Machine‐Learning Clustering Technique Applied to Powder X‐Ray Diffraction Patterns to Distinguish Compositions of ThMn12‐Type Alloys

We applied a clustering technique based on dynamic time warping (DTW) analysis to X-ray diffraction (XRD) patterns in order to identify the microscopic structures of substituents introduced into the main phase of magnetic alloys. The clustering is found to identify the concentrations of the substituents with success rates of around 90%. This performance is attributed to the ability of DTW processing to filter out irrelevant information such as the peak intensities (which vary because the diffraction conditions cannot be controlled in polycrystalline samples) and the uniform shift of peak positions (due to the thermal expansion of the lattice).


Introduction
The concept of materials informatics based on big data science has attracted recent interest in the context of discovering and exploring novel materials. [1] Acquiring data efficiently, namely through experimental measurements and analysis of materials, is necessary to accelerate the cycle of materials exploration. X-ray diffraction (XRD) analysis is commonly used to determine the crystal structure, which dominates material properties. [2] The analysis has been accelerated by the increase in X-ray intensities as well as by improvements in the measurement environment. [3] Typical efforts to improve analysis efficiency include studies that apply machine learning techniques to a series of XRD data from a systematic observation (e.g., dependence on concentration, temperature, etc.) to extract significant information. [4]

Figure 1. The tetragonal (Imm2) crystal structure of SmFe 11 Ti. Note that the labels are the Wyckoff sites of the space group before substitution by Ti (I4∕mmm).
While materials informatics approaches combined with XRD data have recently been used to distinguish different phases (i.e., inter-phase identification), [5][6][7][8][9][10][11] no attempt has been made thus far to address intra-phase identification, that is, distinguishing compositions within a single phase. The present study aims to provide a framework for predicting the concentrations of atomic substituents introduced into the main phase of polycrystalline magnetic alloys.
SmFe 11−x Ti x with the ThMn 12 -type crystal structure (Figure 1) has been regarded as one of the candidates for the main phase in rare-earth permanent magnets. [12] The origin of the intrinsic properties emerging at high temperature, as well as that of the phase stability, has not yet been clarified. Introducing Ti and Zr to substitute for Fe and Sm is found to improve the magnetic properties and the phase stability, as described in detail in the section Samples and Experiments. To clarify the mechanism by which the substitutions improve the properties, it is desirable to identify and quantify the substituted sites, preferably with high throughput to accelerate materials tuning. In this work, we have developed a machine learning clustering technique to distinguish powder XRD patterns and obtain a microscopic identification of the atomic substitutions.
Ab initio calculations are used to generate supervising references for the machine learning of XRD patterns. We prepared several possible model structures with substituents located on different sites over a range of substitution fractions. Geometrical optimization of each model results in a slightly different structure, and we then calculated an XRD pattern from each relaxed structure. We found that dynamic time warping (DTW) analysis can capture the slight shifts in XRD peak positions corresponding to the differences between the relaxed structures, thereby distinguishing the fractions and positions of substituents. We established a clustering technique that combines DTW with Ward's method and is capable of sorting simulated XRD patterns according to this distinction.
The established technique can hence learn the correspondence between XRD peak shifts and microscopic structures with substitutions from many simulated reference data. Since the ab initio simulation can also determine several properties, such as the magnetization, for each structure, the learned correspondence can further predict functional properties of materials when applied to experimental XRD patterns, in addition to distinguishing atomic substitutions. The machine learning technique for XRD patterns developed here is therefore not limited to magnetic materials but applies more broadly to materials whose properties are tuned by atomic substitution.

Results
For our target system, [Sm (1−y) Zr y ] Fe 12−x Ti x , we examined the experimentally accessible ranges of x and y shown in Table 1. For a given concentration, several possible configurations of the substituents exist. They are sorted into subgroups of identical crystalline symmetry, as described in the Experimental Section. Tables 2 and 3 summarize the possible space groups of the substituted alloy structures (used as initial structures for the computations) for given concentrations of Sm/Zr and Fe/Ti, respectively. After relaxing the initial structures by ab initio geometrical optimization, we calculate XRD patterns for the relaxed lattices. The procedure is given in detail in the Experimental Section. In this way we generated simulated XRD patterns, that is, 26 patterns for the Sm/Zr substitution, to be used as data for the clustering by unsupervised learning. We then examine whether the clustering can sort the patterns correctly according to concentration.
The resultant simulated XRD patterns coincide fairly well with the experimental pattern, as shown in Figure 2. We see that the patterns have almost the same overall shape, with slight variations in the inter-peak distances depending on the substituent concentrations. DTW is expected to capture such slight variations well because the method is designed for signals given along an axis (e.g., a time-dependent signal, y(t)), so that it extracts only the shape of the signal while ignoring uniform shifts along the axis. The method scores the dissimilarity between signals i and j in terms of the DTW distance, DTW(i, j). A clustering framework is generally specified as a combination of methods, a ⊗ b, where a scores the dissimilarity and b links elements to form clusters based on the given dissimilarity. In the present work, we employed the framework [normalized constrained-DTW (NC-DTW)] ⊗ [Ward linkage method], using the NC-DTW and Ward methods together with the SciPy package. [13] Descriptions of the linkage and dissimilarity-measure methods used in this work can be found in the SciPy documentation, except for the DTW dissimilarity measures, which were calculated with the fastDTW [14] package. The framework is found to distinguish the concentration of Sm/Zr substitutions with sufficiently high accuracy, 96.2% (one failure among 26 XRD patterns), as shown in Figure 3.

Table 3. Space group of SmFe 12−x Ti x with inequivalent sites of Ti substitutions. SmFe 12 (I4∕mmm) is used as an initial structure. The number given in parentheses represents the number of degenerate configurations within each symmetry of the initial structures for further lattice relaxations. There are therefore 124 configurations in total for generating simulated XRD patterns. Columns: x (space group/number of configurations).
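As an illustration of the dissimilarity a, the following is a minimal sketch of a normalized, band-constrained DTW distance in pure Python. The function name `dtw_distance`, the `window` parameter, and the normalization by the summed sequence lengths are assumptions made for this sketch; they are not taken from the fastDTW package used in this work.

```python
def dtw_distance(a, b, window=None):
    """Dynamic-programming DTW between two 1-D sequences.

    window: half-width of a Sakoe-Chiba band in samples (None = unconstrained).
    The accumulated cost is divided by len(a) + len(b); this normalization
    is an assumption of the sketch, not the paper's stated choice.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, otherwise no path reaches the end.
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


# Toy "patterns": one peak, displaced by a single sample in the second signal.
pattern = [0, 0, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 0, 0]
unwarped = dtw_distance(pattern, shifted, window=0)  # band of zero width: no warping
warped = dtw_distance(pattern, shifted, window=1)    # one-sample warping allowed
```

With `window=0` the warping path is forced onto the diagonal and the measure reduces to a point-by-point comparison; widening the band lets the displaced peak be matched at no cost.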
Regarding the term machine learning, we note that the learning part is not DTW (a) but the clustering (b), which is a form of unsupervised learning. An important factor in general is how many data are used for the learning: here 124 (26) patterns for Fe/Ti (Sm/Zr) substitutions, all of which are used as training data for the clustering in the usual machine-learning sense. The number is small by the standards of machine learning, but this is a common problem in materials informatics, where each datum is too expensive (obtained through costly and laborious experimental observations) to collect a larger data set for learning. One impact of the present work is that the clustering works fairly well even with such a small number of data, because DTW (a), the distance measure used in the present clustering algorithm (b), plays a major role in the feature space as a mapping between two descriptors belonging to different substitution groups. That is, DTW (a) was found to successfully capture the similarity/dissimilarity between patterns belonging to the same/different groups. An XRD pattern forms 12 000-dimensional numerical data, {I(2θ_j)}, j = 1, …, 12 000 (2θ = 0-120 deg., Δ(2θ) = 0.01 deg.), as described in detail in the Experimental Section.
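The linkage step b can be sketched with SciPy as follows, assuming a precomputed symmetric dissimilarity matrix. The six-pattern matrix below is a hypothetical stand-in for DTW(i, j), built from made-up scalar values so that the expected grouping is obvious.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical stand-in for DTW(i, j): absolute differences of six made-up scalars.
values = np.array([0.0, 0.1, 1.0, 1.1, 2.0, 2.1])
dissimilarity = np.abs(values[:, None] - values[None, :])

# linkage() expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(dissimilarity)

# Ward linkage formally assumes Euclidean distances; applying it to a
# DTW-based dissimilarity is a heuristic use of the update formula.
tree = linkage(condensed, method="ward")

# Cut the dendrogram into at most three flat clusters (labels start at 1).
labels = fcluster(tree, t=3, criterion="maxclust")
```

Patterns 0 and 1, 2 and 3, and 4 and 5 end up sharing a label, which is the kind of grouping by concentration that the framework aims for.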

Limitation of DTW-Dissimilarity
When the same method (Ward ⊗ DTW) as in the Sm/Zr case is applied to Fe/Ti, the success rate of the recognition drops to 33.1%. The reason why the success rate for Fe/Ti is worse than that for Sm/Zr can be identified from the dependence shown in Figure 4. Since XRD reflects the lattice constants through its peak positions, we can take the unit cell volume, v, as a representative quantity captured by the clustering recognition in a situation where the cell symmetry is kept unchanged. The DTW dissimilarity, DTW(i, j), can then be regarded as scaling approximately with the difference in v. The recognition can therefore be regarded as an inverse inference from the "difference in v" to the "difference in x" along the dependence v(x), as shown in Figure 4. For Sm/Zr, the "trace-back mapping" from v to x is one-to-one, while for Fe/Ti this is not the case because of the degeneracy of v (many different values of v share the same x). Under such degeneracy, it is impossible to infer the "difference in x" correctly from a given "difference in v." This difficulty leads to the reduced success rate of the Fe/Ti clustering recognition.
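The inverse inference from v to x can be illustrated with a small sketch. All volumes and concentrations below are made-up numbers chosen only to show a one-to-one case and a degenerate case; the function name and the tolerance are likewise hypothetical.

```python
def candidate_concentrations(v_query, volumes_by_x, tol):
    """Return every concentration x that has a simulated volume within tol of v_query."""
    return sorted(
        x for x, volumes in volumes_by_x.items()
        if any(abs(v - v_query) <= tol for v in volumes)
    )


# Hypothetical one-to-one case: each concentration has a single, well-separated volume.
one_to_one = {0.0: [350.0], 0.25: [347.0], 0.5: [344.0]}

# Hypothetical degenerate case: each concentration spans a range of volumes
# (different configurations), and the ranges overlap between concentrations.
degenerate = {0.5: [349.0, 350.2], 1.0: [350.0, 351.5], 1.5: [351.4, 353.0]}

unique = candidate_concentrations(350.1, one_to_one, tol=0.2)
ambiguous = candidate_concentrations(350.1, degenerate, tol=0.2)
```

In the first case the query volume maps back to a single concentration, while in the second it is compatible with two, so no dissimilarity that tracks only v can decide between them.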
The problem can be resolved by exploiting the advantage of ab initio methods that they provide several other quantities in addition to the optimized lattice parameters. Even when v is degenerate, other quantities such as the magnetization M(v) may be non-degenerate (as shown in Figure 5) and hence offer a means of lifting the degeneracy. Using the magnetization is especially practical because both experimental and simulated values are available. We also note that the dependence in Figure 5 (left panel) is consistent with the experimental finding [15] that the magnetization per unit volume increases as the Zr concentration increases. By using a weight, W(i, j), based on the magnetization, we can revise the dissimilarity DTW(i, j) so that the problem caused by the degeneracy is avoided. We have confirmed that the success rate is indeed improved from 33.1% to 99.19%, as shown in Figure 6, by using the magnetization as a weighting.
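Purely as an illustration of how a magnetization-based weight can separate patterns that DTW alone cannot, the sketch below assumes an additive penalty proportional to the squared magnetization difference. The additive form, the function name, and the coefficient `lam` are all hypothetical and should not be read as the weighting actually used in this work.

```python
def weighted_dissimilarity(dtw_matrix, magnetization, lam=1.0):
    """Add lam * (M_i - M_j)^2 to each DTW(i, j) entry (hypothetical weighting form)."""
    n = len(dtw_matrix)
    return [
        [dtw_matrix[i][j] + lam * (magnetization[i] - magnetization[j]) ** 2
         for j in range(n)]
        for i in range(n)
    ]


# Made-up example: patterns 0 and 1 are degenerate (DTW = 0) but differ in magnetization.
dtw_matrix = [
    [0.0, 0.0, 0.4],
    [0.0, 0.0, 0.4],
    [0.4, 0.4, 0.0],
]
magnetization = [1.00, 1.20, 1.00]  # arbitrary units, illustrative only

revised = weighted_dissimilarity(dtw_matrix, magnetization, lam=5.0)
```

After the revision, the formerly indistinguishable pair (0, 1) has a non-zero dissimilarity, while the matrix stays symmetric with a zero diagonal and can be passed to the same linkage step.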

How to Treat Experimental XRD
As shown in Figure 2, the simulated XRD patterns (s) reproduce the experimental ones (e) well. The consistency is sufficient that a direct comparison evaluating the DTW distance, DTW(e, s), is appropriate for the clustering. (It is usual to pre-process the raw data, e or s, into corrected data, ẽ or s̃, and to evaluate DTW(ẽ, s̃) in order to compensate for the difference between the idealized simulation and reality.) By preparing simulated XRD patterns, {s j } (j = 1, …, N), in advance, we can identify the s k that gives the smallest distance, |e − s k |, for a given e. The simulated s k is accompanied by several quantities, {q}, such as the formation energy, the magnetization, and the local geometrical configuration of substituents, evaluated by the ab initio method. The {q} can serve as theoretical predictions for the observed e, providing a machine-learning framework for XRD patterns assisted by ab initio simulations. Figure 7 shows that such a distance, |e − s k |, works fairly well for an example of e at the composition Sm 1.05 Zr 0.0 Fe 10.75 Ti 1.25 . For a general composition, Sm c 1 Zr c 2 Fe c 3 Ti c 4 , we can define a composition similarity between e and each s j in terms of the coefficients c 1 , …, c 4 . In Figure 7, we see that DTW(e, s j ) (ordinate) correlates well with this composition similarity. The closest s k , giving the shortest DTW(e, s k ) (black filled circle in the figure), has a composition closer to Sm 1.0 Zr 0.0 Fe 11.0 Ti 1.0 than any other s j . The prediction accuracy improves as the number of simulation data, N, increases. A straightforward way to increase the amount of simulation data is to use a denser grid in x, but this requires a larger supercell and hence more computational power. For the present grid resolution, which provides the best performance currently possible, the experimental XRD patterns with Zr% = 0.0, 10.4, and 31.8 are identified as closest to Zr% (simulated) = 0, 25, and 37.5, respectively.
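The lookup of the closest simulated pattern and its attached quantities can be sketched as below. The reference library, its labels, patterns, and magnetization values are hypothetical, and a simple sum of absolute differences is used as a placeholder where the DTW distance would be applied in practice.

```python
def l1_distance(a, b):
    """Placeholder dissimilarity (sum of absolute differences); DTW would be used in practice."""
    return sum(abs(x - y) for x, y in zip(a, b))


def closest_reference(experimental, references, distance):
    """Return (label, distance, properties) of the reference nearest to the experimental pattern."""
    scored = [(distance(experimental, ref["pattern"]), ref) for ref in references]
    best_distance, best_ref = min(scored, key=lambda pair: pair[0])
    return best_ref["label"], best_distance, best_ref["properties"]


# Hypothetical library of simulated patterns with made-up ab initio quantities.
references = [
    {"label": "Zr 0%", "pattern": [0.0, 1.0, 0.2, 0.0], "properties": {"magnetization": 1.00}},
    {"label": "Zr 25%", "pattern": [0.0, 0.6, 0.6, 0.0], "properties": {"magnetization": 1.05}},
    {"label": "Zr 37.5%", "pattern": [0.0, 0.2, 1.0, 0.0], "properties": {"magnetization": 1.10}},
]
experimental = [0.0, 0.9, 0.3, 0.0]

label, dist, predicted = closest_reference(experimental, references, l1_distance)
```

Passing the distance as an argument keeps the lookup independent of the particular dissimilarity, so a DTW implementation can be substituted without changing the rest of the sketch.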
In the degenerate case (Figure 4 for Fe/Ti substitution), the DTW distance is not capable of clustering {s j }, and hence is quite unlikely to be capable of identifying the closest s k for a given e based on |e − s k |. The strategy of weighting by W(i, j) (the magnetization) introduced in the previous section will not work in this case because the magnetizations are not always available for e (experimental XRD patterns). A possible solution for distinguishing e would be to use a plucked set, Ã ⊂ A = {s j }, as follows. Since A is generated by simulations, each element is accompanied by quantities such as the magnetization and the formation energy. By referring to the formation energies, we can pluck the degenerate candidates (e.g., P and Q in Figure 8), excluding those with higher energies (P in Figure 8), to form the plucked subset Ã. The degeneracy is thereby removed from Ã, which can then be used as a pool of references from which the closest s k to a given e is identified based on the DTW distance, |e − s k |. The identified s k is accompanied by the physical quantities evaluated by the simulations, which can be used as estimates for the sample corresponding to the experimental XRD pattern, e.

Figure 4. When a value is chosen on the vertical axis (blue arrow), corresponding to that of an XRD pattern, two corresponding points (P and Q) intersect with it, representing two patterns with almost the same lattice parameters but with different internal defect alignments, so that it is not possible to distinguish unambiguously between [1] and [2] as the possible explanatory variable (the concentration in this case). Red arrows beside the error-bar-like symbols indicate that there is a weight, such as the formation energy, that can be applied over the spreading range.
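The plucking step can be sketched as a greedy filter, assuming that degenerate candidates are recognized by nearly equal cell volumes. The tolerance, the field names, and all numbers below are hypothetical.

```python
def pluck_lowest_energy(candidates, volume_tol):
    """Among candidates whose volumes agree within volume_tol, keep only the lowest-energy one."""
    kept = []
    # Visit candidates from lowest to highest formation energy, so that the first
    # member of each degenerate group to be kept is the most stable one.
    for cand in sorted(candidates, key=lambda c: c["energy"]):
        if all(abs(cand["volume"] - k["volume"]) > volume_tol for k in kept):
            kept.append(cand)
    return kept


# Made-up candidates: P and Q share almost the same volume, P has the higher energy.
candidates = [
    {"name": "P", "x": 1.0, "volume": 350.0, "energy": -1.10},
    {"name": "Q", "x": 0.5, "volume": 350.1, "energy": -1.25},
    {"name": "R", "x": 1.5, "volume": 353.0, "energy": -1.05},
]

plucked = pluck_lowest_energy(candidates, volume_tol=0.3)
plucked_names = sorted(c["name"] for c in plucked)
```

Here P is excluded in favor of Q, and R survives because no lower-energy candidate has a comparable volume.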

Significance of Using DTW
The clustering package SciPy [13] used here includes several alternatives to our choice of DTW ⊗ Ward. Their respective performances are compared in Tables 4 and 5, with detailed explanations given in the Experimental Section (Hierarchical Clustering Analysis). Although NC-DTW does not exhibit the best performance, we accept this compromise because the peak-shift tolerance of NC-DTW is required to treat the experimental data. The simulated XRD patterns reflect structures at zero temperature, while the experimental ones are subject to thermal effects at finite temperature. Thermal effects lead to broadening of peaks due to thermal vibrations as well as peak shifts due to thermal expansion. Since we are interested in changes within a phase (not inter-phase changes), the shifts are expected to be almost uniform and not to modify the inter-peak distances significantly, because expansion occurs almost evenly for every degree of freedom of the lattice. Such uniform shifts are not detected by DTW by design, and hence the scoring is not affected by thermal expansion. This feature results in robustness against thermal effects in the experimental data, enabling direct comparison with simulation data at 0 K to evaluate |e − s k |. Based on these observations, we deliberately chose DTW even though it does not achieve the best performance for simulated data, as shown in Tables 4 and 5. Support for this argument is also found in a preceding study [7] in which NC-DTW exhibits the best performance, among various techniques, in sorting the various phases of experimental data. As noted earlier, the method is appropriate because it is insensitive to parallel shifts and therefore responds only to differences in the distances between peaks.
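The contrast between a shift-tolerant and a position-based measure can be seen in a toy comparison. The sketch below builds two synthetic three-peak patterns that differ only by a uniform three-sample shift and scores them with the Euclidean distance and with a plain, unconstrained DTW; the peak positions, widths, and helper names are illustrative assumptions.

```python
import math


def gaussian_pattern(centers, n=80, width=1.5):
    """Synthetic pattern: a sum of unit-height Gaussian peaks on an n-point grid."""
    return [sum(math.exp(-0.5 * ((k - c) / width) ** 2) for c in centers) for k in range(n)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained, unnormalized DTW with absolute-difference local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


reference = gaussian_pattern([20, 35, 55])
shifted = gaussian_pattern([23, 38, 58])  # same peaks, uniformly shifted by 3 samples

euclidean_score = euclidean(reference, shifted)
dtw_score = dtw(reference, shifted)
```

The Euclidean score is large because the peaks no longer overlap, whereas the DTW score is essentially zero. In a constrained variant such as NC-DTW, the band width limits how large a shift can be absorbed in this way.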
If other representative supervised machine-learning methods, such as a support vector machine (SVM) or an artificial neural network (ANN), were used to distinguish only the inter-peak distances, there would be no reason to expect poor performance, except that they require a much larger number of training data than the ∼150 used here. We note that DTW itself is not a machine-learning technique, although the overall framework combining it with clustering is. In this context, DTW is regarded as an excellent mapping (distance measure) between descriptors in the feature space characterizing the XRD shape. This is the reason for the success achieved here even with as few as ∼150 samples. In this sense, DTW is likely to give successful results when combined with supervised methods as well. For example, DTW can be implemented in an SVM kernel function, though a larger amount of data may be required.
Several works are known in which DTW is applied to analyze XRD patterns. [7,16] While these studies applied DTW to distinguish phases (i.e., inter-phase identification), the present study applies it to intra-phase identification. In the former works, DTW is used to distinguish the major differences in peak positions that occur when the phase changes. [7] In this study, on the other hand, we demonstrated a new capability of DTW, namely that it can distinguish even the far smaller changes in inter-peak distances that occur within a target phase. Using this capability, we can explore a new framework that enables identification of the microscopic geometries of the substituents introduced in a target phase, assisted by machine learning techniques.

Further Discussions
It is necessary to point out possible factors that may lead to discrepancies in the predictions of the present scheme when it is applied to other, more general cases. In the present case, we had the good fortune that computational conditions for DFT, especially the choices of exchange-correlation (XC) functional and pseudopotentials, were available that describe experiments well enough to create the reference data. Such good fortune cannot be expected in general. It is well known that DFT usually overestimates or underestimates lattice constants, depending on the conditions. In such cases, the predicted inter-peak distances become incorrect, so the reference XRD patterns generated by the DFT calculation cannot be used as they are. Although the order in which the peaks appear is unchanged, the inter-peak distances deviate from experiment. A remedy that can be expected to work to some extent is to scale the horizontal 2θ axis so that the peak positions coincide with the experimental ones. A few experimental XRD data used as references would make such an adjustment possible, for example by using least-squares optimization to determine the optimal scaling rate.
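The least-squares adjustment mentioned above can be sketched for the simplest case of a single scale factor k applied to the simulated peak positions, for which minimizing Σ (k s_i − e_i)² gives k = Σ s_i e_i / Σ s_i². The peak lists below are made up, and treating the correction as one uniform factor on 2θ is itself an assumption of this sketch.

```python
def optimal_scale(simulated_peaks, experimental_peaks):
    """Closed-form least-squares scale k minimizing sum((k * s_i - e_i)^2) over matched peaks."""
    numerator = sum(s * e for s, e in zip(simulated_peaks, experimental_peaks))
    denominator = sum(s * s for s in simulated_peaks)
    return numerator / denominator


# Hypothetical matched peak positions in degrees (2-theta); the experimental
# peaks are generated here as a 0.5% contraction of the simulated ones.
simulated_peaks = [10.00, 13.48, 20.00]
experimental_peaks = [0.995 * s for s in simulated_peaks]

scale = optimal_scale(simulated_peaks, experimental_peaks)
rescaled_peaks = [scale * s for s in simulated_peaks]
```

Once the factor is determined from a few reference measurements, the same rescaling can be applied to the 2θ grid of every simulated pattern before the dissimilarity is evaluated.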
Another source of discrepancy is that the experimental data may be affected by several factors not taken into account in the simulations of ideal crystals, such as the thermal expansion mentioned above. The effects of grain boundaries in polycrystalline samples would lead to broadening of the peaks. Since the performance of DTW analysis relies on how clearly the peaks can be recognized, such broadening might be regarded as a serious problem leading to inaccurate predictions by the framework. However, this may not be the case because of the insensitivity of DTW to horizontal shifts of signals: as long as the peaks are recognizable (otherwise, XRD analysis does not work in the first place), we can make them sharper by contracting the axis. Although the inter-peak distances are also modified by the contraction, the shortening of distances is uniform and hence does not matter when we measure the similarity between patterns, provided that a common contraction rate is applied to all of them. The contraction also shifts the peak positions, but this is not detected by DTW because of its insensitivity to horizontal shifts.
We should also address the limits of applicability of the present framework. XRD simulations using DFT packages are becoming quite common even among researchers who work mainly in experimental and industrial domains rather than in simulation. However, at least two factors may still limit the applicability for a broader range of researchers. The first is the computational cost of handling models of doped systems, which are the target of the present work. To describe doped systems by conventional DFT, supercell structures with unit cells N times larger than those of the pristine systems are required, for which the cost becomes ∼N^3 times higher. Beyond the computational cost, one has to consider vast numbers of possible configurations of the doped sites even at a fixed concentration, which sometimes exceeds the capacity of available memory. [17] The other factor is the difficulty of choosing DFT computational conditions, as addressed above. We note that the present problem corresponds to a rather rare case in which the conventional GGA works fairly well in capturing the structure. More generally, systems including d and f elements require careful choices or adjustments of the XC treatment, such as DFT+U. In such cases, generating XRD references using DFT itself requires specialized knowledge about XC functionals.

Conclusion
We have developed a clustering framework that can be applied to XRD patterns of alloys to distinguish the concentrations of substituents. The clustering works quite well in identifying the concentrations when applied to the patterns of magnetic alloys based on SmFe 12 . Supercell models for the substitutions, combined with ab initio lattice relaxations, reproduce XRD patterns well enough to coincide with experiment. The implementation of the clustering with [DTW dissimilarity scoring] ⊗ [Ward linkage method] achieves a success rate of around 90% for determining substituent concentrations. The main reason for clustering failure is identified as degeneracy, namely the situation where different concentrations result in almost the same lattice constant. By incorporating quantities predicted by ab initio methods into the weighting function used for the dissimilarity scoring, such degeneracies are lifted and clustering failure is prevented. Sufficiently good coincidence between simulated and experimental XRD patterns enables the framework to be used to predict unknown concentrations of the substituents introduced into the main phase of alloys from their XRD patterns. The established framework is applicable not only to the system treated in this work, but more widely to systems whose properties are to be tuned by atomic substitutions within a phase. Beyond concentrations, the framework has the further potential to predict properties from observed XRD patterns, such as magnetic moments and optical spectra, evaluated from the predicted microscopic local structure (e.g., the positions of substituents).

Experimental Section
Samples and Experiments: The X-ray diffraction (XRD) measurements for the powdered Sm-Fe-Ti samples were performed at beamline BL02B2 of SPring-8 (Proposal Nos. 2016B1618 and 2017A1602). The diffraction pattern of CeO 2 was used to determine the X-ray energy of 25 keV. The diffraction intensities were collected using a sample rotator system and a high-resolution one-dimensional semiconductor detector (multiple MYTHEN system) with a step size of 2θ = 0.006°. [3] The samples were powdered from strip-cast alloys, and the powder was put into a quartz capillary and sealed under a negative pressure of Ar gas.
Computational Details: To determine the structures of the target alloys, [Sm (1−y) Zr y ] Fe 12−x Ti x , a tetragonal (I4∕mmm) crystal structure of SmFe 12 was constructed using the experimental lattice parameters of SmFe 11 Ti, [18] a = 0.856 nm and c = 0.480 nm (a = b), as an initial setting for further optimization. For Zr substitutions replacing Sm sites (ranging from 1 to 4 atoms), a 2 × 2 × 2 supercell of the primitive cell of tetragonal (Imm2) SmFe 11 Ti (Figure 1), containing 104 atoms, was constructed. All possible configurations were considered to cover the randomness of experimental substitutions, and symmetry-equivalent configurations, identified using the FINDSYM software, [19] were discarded. Finally, only the 26 supercells (Table 2) that possessed different space groups and Wyckoff site occupations were considered.
For ab initio calculations, spin-polarized density functional theory (DFT) as implemented in the Vienna ab initio simulation package (VASP) was used. [20][21][22] For systems such as the present target that include transition metals and rare-earth elements, it is generally known that the predictions are critically influenced by the choice of exchange-correlation (XC) potentials used in DFT. [23][24][25][26] For the present case, it has been found that DFT+U is essentially unavoidable if the f orbitals are treated as valence states. [27][28][29][30][31] It has also been found that the generalized gradient approximation (GGA) works well if the 4f electrons are treated as core states described by pseudopotentials. [32][33][34][35][36] The revised Perdew-Burke-Ernzerhof (RPBE) [37] functional was therefore used for the GGA XC, after confirming that RPBE yields optimized lattice parameters closer to experimental values than PBE. [38] Pseudopotentials based on the projector augmented wave (PAW) [39] method were used. The s and p semi-core states are included in the valence states, except for Sm, resulting in 12, 16, and 12 valence states for Zr, Fe, and Ti, respectively. The structural relaxations were performed until the force on each ion was smaller than 0.01 eV/Å. A plane-wave cutoff energy of 400 eV and a 5 × 5 × 5 Monkhorst-Pack grid were used, which were sufficient to converge the energy. Lattice relaxations with the above choices applied to SmFe 11 Ti were confirmed to result in the lowest total energy with Ti at the 8i site, which is consistent with experiment [12,40] and ab initio calculation [41] for RFe 11 Ti-type magnetic compounds. The optimized lattice parameters, a and c, were 0.851 and 0.473 nm, in good agreement with experiment. [18] These comparisons confirm that our model works well.
With Ti substitution at the 8i site, the I4∕mmm space group breaks and becomes Imm2 as shown in Figure 1.
Validation of Simulated XRD Patterns: Simulated XRD patterns were validated by comparison with experimental XRD patterns. The XRD patterns of the optimized structures were calculated with the powder diffraction pattern utility of the software VESTA. [42] The same X-ray wavelength as in the experiment, 0.496 Å, was used. The isotropic atomic displacement parameter (B) was set to 1.00 Å^2. Normalized XRD patterns with 2θ from 1 to 120 degrees at 0.01 degree intervals were obtained.
The simulated XRD pattern of SmFe 11 Ti agrees very well with the experimental XRD pattern of Sm 1.05 Fe 10.75 Ti 1.25 (Figure 2); however, the main-phase peak position of 13.41 deg. in the experiment is shifted to 13.48 deg. in the simulation. A peak shift occurs when the lattice expands or contracts, and the optimized lattice parameters from DFT are underestimated, which accounts for the difference. Underestimated lattice parameters introduce only a systematic shift of the peak positions, while the XRD profiles remain unchanged. When the Zr concentration increases, the main-phase peak position of the simulated XRD patterns shifts to larger 2θ, in accordance with the experimental results.
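The direction of this shift follows from Bragg's law, λ = 2d sin θ, together with the tetragonal d-spacing relation 1/d² = (h² + k²)/a² + l²/c². The sketch below evaluates 2θ for the two sets of lattice parameters quoted in the Experimental Section; the (3 2 1) reflection is chosen here only as an example and is not stated in the text to be the main-phase peak.

```python
import math

WAVELENGTH = 0.496  # X-ray wavelength in angstrom, as quoted in the text


def two_theta_tetragonal(h, k, l, a, c, wavelength=WAVELENGTH):
    """Bragg angle 2-theta (degrees) of reflection (hkl) for a tetragonal cell; a, c in angstrom."""
    inv_d_squared = (h * h + k * k) / (a * a) + (l * l) / (c * c)
    d = 1.0 / math.sqrt(inv_d_squared)
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d)))


# Experimental lattice parameters (0.856 nm, 0.480 nm) and DFT-optimized ones
# (0.851 nm, 0.473 nm), converted to angstrom.
angle_experimental_cell = two_theta_tetragonal(3, 2, 1, a=8.56, c=4.80)
angle_optimized_cell = two_theta_tetragonal(3, 2, 1, a=8.51, c=4.73)
shift = angle_optimized_cell - angle_experimental_cell
```

The smaller optimized cell gives a larger 2θ for the same reflection, which is the sense of the shift described above.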
Hierarchical Clustering Analysis: Hierarchical clustering analysis (HCA) was used to identify the simulated XRD patterns. All clustering analysis was carried out using the SciPy package. [13] Descriptions of the linkage and dissimilarity-measure methods used in this work can be found in the SciPy documentation, except for the DTW dissimilarity measures, which were calculated with the fastDTW [14] package.
The package provides a variety of methods other than the present choice, DTW ⊗ Ward, as shown in Tables 4 and 5. The tables compare the performance achieved by each method for the identification of Sm/Zr and Fe/Ti, respectively. The performance is evaluated in terms of the adjusted Rand index (ARI), which measures the similarity between the true labels and the predicted labels, with maximum and minimum scores of 1 and −1, respectively. The ARI calculations were performed using the scikit-learn package. [43] In the tables, several dissimilarity measures (NC-DTW, Euclidean, Cityblock, Cosine, and Correlation) are compared in combination with various linkage methods (Single, Complete, Average, Weighted, Centroid, Median, and Ward). For the Sm/Zr structures, the ideal score of 1 is attained by all dissimilarity measures except NC-DTW. For the Fe/Ti structures, the best methods are the cosine and correlation dissimilarity measures with Ward linkage, each attaining a score of 0.55. NC-DTW provides lower performance than the other measures for both structures because NC-DTW omits the peak-shift information, whereas the rest are peak-position-based dissimilarity measures. With the NC-DTW dissimilarity measure, the Ward linkage method exhibits good performance, with ARIs of 0.91 and 0.01 for the Sm/Zr and Fe/Ti structures, respectively. In this work, the focus is therefore on NC-DTW with the Ward linkage method.
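For reference, the ARI can be computed directly from the contingency counts of the two labelings. The following is a self-contained sketch of the standard pair-counting formula in pure Python; it is meant to illustrate the quantity being reported, not to replace the scikit-learn routine used in this work, and it assumes at least two samples.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, predicted_labels):
    """Adjusted Rand index from pair counts of the contingency table (needs >= 2 samples)."""
    n = len(true_labels)
    # Pairs of samples that are together in both labelings.
    together_in_both = sum(comb(c, 2) for c in Counter(zip(true_labels, predicted_labels)).values())
    # Pairs together within each labeling separately.
    together_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    together_pred = sum(comb(c, 2) for c in Counter(predicted_labels).values())
    expected = together_true * together_pred / comb(n, 2)
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:  # degenerate case, e.g. every sample in one cluster
        return 1.0
    return (together_in_both - expected) / (maximum - expected)


# The index ignores the names of the clusters: a relabeled but identical
# partition scores 1, while a partition that splits every true group scores below 0.
same_partition = adjusted_rand_index([0, 0, 1, 1], [5, 5, 3, 3])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index is corrected for chance agreement, a score near 0 (such as the 0.01 quoted for Fe/Ti with NC-DTW) indicates a grouping no better than a random assignment.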