Using machine learning to identify factors that govern amorphization of irradiated pyrochlores

Structure-property relationships are a key materials science concept that enables the design of new materials. In the case of materials for application in radiation environments, correlating radiation tolerance with fundamental structural features of a material enables materials discovery. Here, we use a machine learning model to examine the factors that govern amorphization resistance in the complex oxide pyrochlore ($A_2B_2$O$_7$). We examine the fidelity of predictions based on cation radii and electronegativities, the oxygen positional parameter, and the energetics of disordering and amorphizing the material. No one factor alone adequately predicts amorphization resistance. We find that, when multiple families of pyrochlores (with different $B$ cations) are considered, radii and electronegativities provide the best prediction, but when the machine learning model is restricted to only the $B$=Ti pyrochlores, the energetics of disordering and amorphization are optimal. This work provides new insight into the factors that govern amorphization susceptibility and highlights the ability of machine learning approaches to generate that insight.

Designing materials for advanced or next-generation applications requires an understanding of how properties are related to structure, that is, identifying so-called structure-property relationships. Such relationships guide the search for new materials with enhanced performance by identifying regions of structure and composition space that exhibit superior properties. For nuclear energy materials, a key performance metric is tolerance against radiation damage. Pyrochlores ($A_2B_2$O$_7$) have been extensively studied for their potential application as nuclear waste forms [1-10] and have been incorporated into some compositions of the SYNROC waste form [11]. In this context, significant effort has been directed toward understanding how the chemistry of the pyrochlore (the nature of the A and B cations) dictates the amorphization susceptibility of the compound. In particular, several experimental efforts [12][13][14][15][16] have focused on determining the critical amorphization temperature, T C , the temperature at which the recovery rate of the material equals or exceeds the rate of damage, as summarized in Fig. 1. Typically, these experiments were performed in an electron microscope equipped with an ion source, such that samples were simultaneously irradiated with electrons and 1 MeV Kr ions. Though the value of T C is expected to vary with ion irradiation conditions [17], results obtained under 1 MeV Kr ion irradiation should be comparable with one another.
As a consequence, a number of "features" (basic structural and energetic properties) have been identified that provide insight into the radiation response of pyrochlores. These include the radii and electronegativities of the A and B cations [8,13]; the x parameter, which describes how the oxygen sublattice deviates from ideality [4,8,13]; the enthalpy of formation of the pyrochlore [6,18]; and the energy to disorder the pyrochlore to a disordered fluorite structure [1,19]. Further, there has been discussion of the extent of the disordered phase field in the phase diagram and its relationship to amorphization resistance [7]. Most of these features have been only heuristically correlated with amorphization resistance or applied only to a subset of pyrochlore chemistries. We are aware of only one attempt to quantify the relationship between these types of features and a prediction of T C . In that work, Lumpkin and co-workers established a relationship between T C and lattice constants, electronegativities, disordering energetics, and the oxygen positional parameter [8]. While their model provided a significant advance in describing the structure-property relationships of pyrochlores, here we demonstrate how, through the use of machine learning, greater insight can be extracted. In particular, while they considered the disordering energy as one of their features, they used data from atomistic potentials that do not adequately describe all of the chemistries in the experiments. Further, they did not have access to data describing the amorphous state of these compounds. Finally, modern machine learning methods, applied to materials science, offer new avenues to examine the structure-property relationships in these types of systems.
Figure 1: Experimentally measured values of T C , ordered as a function of A cation radius, for several different pyrochlores.
Here, we use machine learning methods to demonstrate how a set of features, for a range of pyrochlore chemistries, can be used to predict T C . We combine structural parameters, such as cation radius and electronegativity, with energetics calculated using density functional theory (DFT) to build a database of features as a function of pyrochlore chemistry.
We analyze this database, building machine learning models that predict T C as a function of pyrochlore chemistry based on a systematic collection of features. We consider the pyrochlore chemistries for which experimental data for T C exist, which include pyrochlores where B=Ti, Zr, Hf, and Sn. We find that, when considering the full range of chemistries, the two features that best predict T C are the ratio of the radii and the difference in electronegativities of the A and B cations. However, to predict the more subtle dependence of T C on pyrochlore chemistry within a given B family, the energies to disorder and amorphize the compound provide a better prediction of T C .
Compared to Ti, Hf, or Zr, Sn is chemically a very different element. Like Ti, it is multivalent but, unlike Ti, it has a much stronger tendency to adopt a charge state other than 4+. Further, as discussed below, it has a significantly higher electronegativity than the other B cations, producing a more covalent bond. This implies that Sn pyrochlores should be less amorphization resistant [20]. However, experiments have shown Sn pyrochlores to be more amorphization resistant than other pyrochlores [5]. All of this suggests that Sn pyrochlores are electronically much more complex than the other pyrochlore families, which is one reason that we use DFT to determine the energetics of disordering and amorphization, as DFT can account for the varied valence of the Sn cations. Further, the inclusion of Sn pyrochlores in this analysis, precisely because their behavior is counter-intuitive, provides a more stringent test of the methodology.

DFT Energetics
Figure 2a shows the energetics of disorder and amorphization of each pyrochlore, as found using DFT, as a function of pyrochlore chemistry. The compounds are ordered by A cation radius. Focusing first on the energetics of disorder, there is a general trend that, as the A cation radius increases, the energy associated with disordering the pyrochlore to a disordered fluorite also increases, consistent with previous DFT results [21]. This is particularly true of the B=Zr, Hf, and Sn families of pyrochlores. For the B=Ti family, there is a peak in the disorder energy for the A=Gd composition, again consistent with previous DFT and empirical potential calculations [19,21].
[...] Ti pyrochlores (which they do) but also that Sn pyrochlores would be less resistant to amorphization than Ti pyrochlores, which they are not. Thus, other factors must also be important. We propose that the energy of the amorphous phase is one of those factors.
The energy differences between ordered pyrochlore and an amorphous structure are also provided in Fig. 2a. In the case of the B=Hf and Zr families, these are again relatively monotonic with increasing A cation radius. However, the behavior of the B=Ti and Sn families is more complex. In particular, for the B=Ti family, the amorphous energy is non-monotonic with A cation radius, but the peak is for a different chemistry than was the disordering energy. In the B=Ti family, the amorphous energy is greatest for A=Y and generally is high for A=Dy and Tb. The B=Sn family exhibits even more complicated behavior. There is a peak in the amorphous energy for A=Gd and a minimum for A=Ho.
Finally, the shaded regions in Fig. 2a highlight the differences between the disordered and amorphous structures.

Correlation of Features with Amorphization Resistance
The DFT results reveal that there are significant differences in the energetics of disorder and amorphization in pyrochlores as a function of both A and B chemistry. We use a machine learning approach to quantify the correlations between these energetics, as well as other features associated with pyrochlores, and the amorphization resistance, as characterized by T C . The features considered here are r A /r B , the ratio of the ionic radii of the A and B cations; ∆X = X B − X A , the difference in electronegativity of the A and B neutral metal atoms (X A and X B , respectively); x, the oxygen positional parameter, which measures the deviation of the oxygen sublattice from an ideal (fluorite-like) simple cubic sublattice; E O→D , the energy difference between the disordered and ordered phases; and E D→A , the energy difference between the amorphous and disordered phases. These features were chosen because (a) they have been shown to correlate to some degree with amorphization resistance in previous studies and (b) our DFT results indicate that the energetics depend strongly on the A and B chemistry of the pyrochlore, suggesting they may provide a strong descriptor of each compound. We did not consider the enthalpy of formation, proposed by other authors as a factor in radiation tolerance [6,18], as a feature because data were not available for all compounds.
Figure 2 (caption, partial): The shaded regions highlight the differences between the disordered and amorphous structures.
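The five features defined above can be assembled from tabulated and computed quantities in a few lines. The following is a minimal sketch, not the procedure used in this work: the class `PyrochloreInputs`, the function `feature_vector`, and the numerical values are hypothetical placeholders, and the sketch assumes that all three total energies refer to the same reference (for example, per formula unit).

```python
from dataclasses import dataclass


@dataclass
class PyrochloreInputs:
    """Raw inputs for one A2B2O7 compound (hypothetical container)."""
    r_a: float           # ionic radius of the A cation
    r_b: float           # ionic radius of the B cation
    chi_a: float         # electronegativity of the neutral A atom
    chi_b: float         # electronegativity of the neutral B atom
    x_oxygen: float      # oxygen positional parameter x
    e_ordered: float     # energy of the ordered pyrochlore
    e_disordered: float  # energy of the disordered fluorite
    e_amorphous: float   # energy of the amorphous structure


def feature_vector(p):
    """Return [r_A/r_B, dX, x, E_O->D, E_D->A] in the order used in the text."""
    radius_ratio = p.r_a / p.r_b
    delta_chi = p.chi_b - p.chi_a            # dX = X_B - X_A
    e_o_to_d = p.e_disordered - p.e_ordered  # ordered -> disordered
    e_d_to_a = p.e_amorphous - p.e_disordered  # disordered -> amorphous
    return [radius_ratio, delta_chi, p.x_oxygen, e_o_to_d, e_d_to_a]


# Placeholder numbers chosen only to exercise the function; NOT data from this study.
example = PyrochloreInputs(r_a=1.0, r_b=0.5, chi_a=1.25, chi_b=1.5,
                           x_oxygen=0.33, e_ordered=-10.0,
                           e_disordered=-9.5, e_amorphous=-9.0)
features = feature_vector(example)
```

With these definitions, the energy difference between the ordered and amorphous structures is simply the sum of the last two entries, E O→D + E D→A .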
However, before we examine the results of the machine learning model, it is instructive to examine how the selected features correlate with T C . Figure 3 provides simple plots of each feature against T C . The values for T C , summarized in Table S1, are taken from Refs. [12,13,15,16].
[...] shows an overall correlation with T C but again the details are lost. E O→D , on the other hand, seems to correlate reasonably well for pyrochlores within a given family but does not describe variations of T C between families. Finally, E O→A , similar to x and ∆X, seems to generally correlate separately for B=Sn pyrochlores and the other families of pyrochlores.
Thus, while there are rough trends indicating some insight from each of these features, there is certainly not enough of a correlation in any case for a quantitative prediction. However, this suggests, as noted by other authors [8], that combinations of these features may provide predictive capability. Hence, we use a machine learning approach to quantify this.
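One simple way to put numbers on the distinction drawn above, between a feature that tracks T C within a family and one that tracks it across families, is to compute a correlation coefficient separately for each group. The sketch below is illustrative only: the Pearson coefficient is our choice of measure here rather than something used in the analysis above, the function names are hypothetical, and the data are synthetic values constructed so that a feature correlates perfectly within each family but poorly overall.

```python
import numpy as np


def pearson_r(u, v):
    """Pearson correlation coefficient between two 1D sequences."""
    du = np.asarray(u, dtype=float) - np.mean(u)
    dv = np.asarray(v, dtype=float) - np.mean(v)
    return float(du @ dv / np.sqrt((du @ du) * (dv @ dv)))


def correlations_by_family(X, t_c, family, names):
    """Pearson r of each feature column with T_C, overall and per B family."""
    X = np.asarray(X, dtype=float)
    t_c = np.asarray(t_c, dtype=float)
    family = np.asarray(family)
    groups = {"all": np.ones(len(t_c), dtype=bool)}
    for fam in sorted(set(family.tolist())):
        groups[fam] = family == fam
    return {g: {n: pearson_r(X[m, k], t_c[m]) for k, n in enumerate(names)}
            for g, m in groups.items()}


# Synthetic illustration (NOT data from this study): two hypothetical families
# "B1" and "B2" share the same feature values but have offset T_C values.
feature = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
t_c = np.array([100.0, 200.0, 300.0, 400.0, 900.0, 1000.0, 1100.0, 1200.0])
family = ["B1"] * 4 + ["B2"] * 4
result = correlations_by_family(feature[:, None], t_c, family, ["feature"])
```

In this synthetic case the within-family coefficients are 1 while the overall coefficient is much smaller, which mirrors qualitatively the behavior described for E O→D .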

Results of the Machine Learning Model
We use a machine learning (ML) approach to quantify the correlations between the five features described in the previous section and T C . More specifically, we employed kernel ridge regression (KRR) [22][23][24], an algorithm that works on the principle of similarity and can efficiently extract complex non-linear relationships from data, with a Gaussian kernel to learn and quantify trends exhibited by T C in the feature space discussed above. A randomly selected 90%/10% training/test split of the available data was used for statistical learning and for testing the performance of the trained model on previously unseen data. Leave-one-out cross validation was used to determine the model hyper-parameters, so as to avoid overfitting the training data, which would lead to poor generalizability. The trained model can subsequently be used to make an interpolative prediction of T C for a new chemistry.
Next, within the KRR ML model, we aim to identify the feature combination that exhibits the highest prediction performance, quantified by its ability to accurately predict T C of the test set compounds. We do this in a comprehensive manner by building KRR ML models using all possible combinations of Ω features with Ω ∈ [2,5]. The performance of each of these models was evaluated separately on the entire data set as well as on a reduced set that only included the Ti pyrochlores. The root mean square (rms) errors for the T C predictions on the training and test sets for the various models are presented in Fig. 4. In order to account for the variability in model predictions associated with randomly selected training/test splits, Fig. 4 reports the rms errors averaged over 100 different randomly selected training/test splits for each of the models. The 2D models that lead to the lowest rms errors on the test set data have been marked in Fig. 4a (when taking the entire data set) and Fig. 4b [...] built on a lower dimensional feature set) should always be preferred over a more complex one.
Therefore, henceforth we focus our attention on the best performing 2D models.
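The workflow described above (Gaussian-kernel KRR, leave-one-out selection of hyper-parameters, repeated random 90%/10% splits, and an exhaustive search over feature combinations) can be sketched compactly with NumPy. This is a sketch under stated assumptions, not the implementation used in this work: the function names, the hyper-parameter grids, the standardization of features, and the centering of T C are all our assumptions, the data are synthetic, and only 3 splits are averaged (rather than 100) to keep the example fast.

```python
import itertools
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def krr_fit(X, y, sigma, lam):
    """Closed-form KRR weights: solve (K + lam * I) w = y."""
    return np.linalg.solve(gaussian_kernel(X, X, sigma) + lam * np.eye(len(y)), y)


def krr_predict(X_train, w, X_new, sigma):
    return gaussian_kernel(X_new, X_train, sigma) @ w


def loo_rmse(X, y, sigma, lam):
    """Leave-one-out rms error by explicit refitting (clear rather than fast)."""
    errs = []
    for k in range(len(y)):
        keep = np.arange(len(y)) != k
        w = krr_fit(X[keep], y[keep], sigma, lam)
        errs.append(krr_predict(X[keep], w, X[k:k + 1], sigma)[0] - y[k])
    return float(np.sqrt(np.mean(np.square(errs))))


def mean_test_rmse(X, y, cols, n_splits, rng,
                   sigmas=(0.5, 1.0, 2.0), lams=(1e-3, 1e-1)):
    """Test rms error for feature columns `cols`, averaged over random 90/10 splits."""
    Xc = X[:, list(cols)]
    n_test = max(1, round(0.1 * len(y)))
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(len(y))
        test, train = perm[:n_test], perm[n_test:]
        # Assumption: standardize features and center the target on the training set.
        mu, sd = Xc[train].mean(axis=0), Xc[train].std(axis=0)
        Xtr, Xte = (Xc[train] - mu) / sd, (Xc[test] - mu) / sd
        y0 = y[train].mean()
        # Hyper-parameters chosen by leave-one-out error on the training set only.
        sigma, lam = min(itertools.product(sigmas, lams),
                         key=lambda p: loo_rmse(Xtr, y[train] - y0, *p))
        w = krr_fit(Xtr, y[train] - y0, sigma, lam)
        pred = krr_predict(Xtr, w, Xte, sigma) + y0
        scores.append(np.sqrt(np.mean((pred - y[test]) ** 2)))
    return float(np.mean(scores))


def rank_feature_sets(X, y, names, sizes, n_splits, seed=0):
    """Evaluate every feature combination of the given sizes; lowest error first."""
    rng = np.random.default_rng(seed)
    results = {}
    for size in sizes:
        for cols in itertools.combinations(range(X.shape[1]), size):
            key = tuple(names[c] for c in cols)
            results[key] = mean_test_rmse(X, y, cols, n_splits, rng)
    return sorted(results.items(), key=lambda kv: kv[1])


# Synthetic demonstration (NOT data from this study): the target depends only on
# the hypothetical features f0 and f1; f2, f3, and f4 are irrelevant.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-1.0, 1.0, size=(40, 5))
y = 500.0 + 200.0 * np.sin(2.0 * X[:, 0]) + 100.0 * X[:, 1]
names = ["f0", "f1", "f2", "f3", "f4"]
ranking = rank_feature_sets(X, y, names, sizes=range(2, 6), n_splits=3)
```

Here `ranking` is a list of (feature set, mean test rms error) pairs sorted from best to worst. Selecting the hyper-parameters using the training portion only keeps the test error an estimate of performance on unseen compounds.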
The superior performance exhibited by the (r A /r B , ∆X) feature pair is not entirely unexpected and can be understood by looking at Fig. 3b and e. As alluded to previously, while r A /r B helps capture the overall T C trends among different chemistries, ∆X allows for an effective separation between different chemistries (especially between the Sn-based compounds and the rest of the dataset), while still capturing relative T C trends between these subgroups. The best performing feature pair for the titanate pyrochlores dataset, however, consists of E O→D and E D→A . While the (r A /r B , ∆X) feature pair performs much more poorly on this subset than on the overall dataset, the performance of the (r A /r B , E O→D ) feature pair is found to be comparable to that of the best 2D feature pair.
While Fig. 4 captures the average performance and variability (taken over 100 different runs) for our best performing 2D models, in Fig. 5a-b we present parity plots comparing the experimental T C with the ML predictions using the best 2D descriptors found for the entire dataset (Fig. 5a) and the titanates (Fig. 5b) [...] in T C versus the feature values and make predictions of T C for new chemistries. Figure 5c shows the best two-feature descriptor for the entire set of pyrochlores considered.
Again, in this case, the two features that best correlate with T C are r A /r B and ∆X. This combination of features is able to distinguish the different T C behavior exhibited by the B=Sn pyrochlores and the other families of pyrochlores, by virtue of the properties of ∆X.
However, as discussed above, this combination of features has an effective uncertainty of ∼ 100 K, indicating that it cannot describe the fine features exhibited by the B=Ti family of pyrochlores. For example, T C is not monotonic with A cation radius (see Fig. 1). As discussed, limiting the model to just the B=Ti pyrochlores results in a different optimal two-feature set, namely E O→D and E O→A , as shown in Fig. 4b. In particular, as shown in Fig. 5d, this set of features can describe the subtle behavior in which the A=Gd compound has the highest value of T C , correlating with the fact that it has the highest value of E O→D , while the A=Y compound, which has a value of E O→D similar to those of the neighboring compounds, exhibits an anomalously low value of T C . This is a consequence of its rather high value of E D→A , which in turn arises because Y is not a rare earth and thus its bonding is subtly different from that of the elements around it.

DISCUSSION AND CONCLUSIONS
Combining experimental results for T C for various pyrochlore compounds, DFT calculations of the energetics of disordering and amorphization, and a machine learning model, we conclude that (a) basic ionic properties such as r A /r B and ∆X have the qualitative capability of predicting trends in T C over a wide range of pyrochlore compounds but that [...]
While the feature set of ∆X and r A /r B has the best predictive capability for distinguishing between the various families of pyrochlores, the reason why Sn pyrochlores are radiation tolerant while exhibiting such high disordering energies is found by examining the amorphization energetics. The gap between the disordering and amorphization energies for the Sn pyrochlores is typically quite large, so that even if, during the course of irradiation, enough energy is deposited into the lattice to disorder the structure, it is not enough to amorphize the material. The gap between the disordering and amorphization energies is much larger in the Sn pyrochlores than in the Ti family and, for some A cations, larger than in the Hf and Zr families as well. Thus, the radiation tolerance of some of the Sn pyrochlores originates in the fact that they are extremely difficult to amorphize.
The insights gained by the machine learning model apply specifically to pyrochlores and, because of the interpolative nature of these models, to the families of pyrochlores considered here. That said, the features identified as being best able to predict T C can be justified physically and thus may be applicable to other classes of complex oxides, such as δ-phase [25], [...] is not possible in pyrochlore [26]. Further, other factors, such as short-range order, which is known to occur in complex oxides [27,28], may also play a role. However, we suspect that treating the disordered state as truly random captures much of the behavior of these materials, given the ability of the disordered fluorite structure to predict order-disorder temperatures in these systems [21,29].
In this work, we have used T C as a metric for relative amorphization resistance. In reality, the value of T C encompasses not only thermodynamic properties such as disordering and amorphization energetics, but also kinetic processes of defect annihilation and defect production. Thus, actually predicting T C from fundamental defect behavior would be a daunting task. However, T C does provide a metric for comparing susceptibility to amorphization, and it has been measured for a range of pyrochlore chemistries.
Finally, this work highlights the utility of machine learning approaches in materials science. In this case, the ML model elucidates those features which provide predictive capability, providing insight into those factors which dictate amorphization resistance in pyrochlores. The model also shows that sets of two features result in optimal predictions; higher-order feature sets do not add significant value. The fact that different combinations of features are optimal for predictions for the entire set of pyrochlores (r A /r B and ∆X) versus the Ti family (E O→D and E O→A ) reinforces the point that the best set of features depends on the level of detail (here, the error in the predicted T C ) required in the prediction.

Density Functional Theory
Density functional theory (DFT) calculations were performed using the all-electron projector augmented wave method [30] within the local density approximation (LDA) with the VASP code [31]. A plane-wave cutoff of 400 eV and dense k-point meshes were used to ensure convergence. The lattice parameters and all atomic positions were allowed to relax, though the cells were constrained to be cubic. The disordered fluorite structure was modeled using the special quasirandom structures (SQS) approach [32]. The SQS structures were generated as described in Ref. [21]. The amorphous structures were created by performing ab initio molecular dynamics at a very high temperature and then quenching the structures to 0 K.
For the B=Zr and Hf families, there is a deviation from true monotonic behavior at A=Tb, in contrast with previous DFT calculations [21] that used the same methodology (pseudopotentials, functional, k-point mesh, and energy cutoff). We assume that the differences from the previously published results are due to changes between versions of VASP.

Machine Learning Model
We used kernel ridge regression (KRR) with a Gaussian kernel for machine learning.
KRR is a similarity-based learning algorithm in which the ML estimate of a target property (in our case the critical temperature T C ) of a new system j is given by a sum of weighted kernel functions (i.e., Gaussians) over the entire training set, as

$$T_C^{\mathrm{ML}}(j) = \sum_{i=1}^{N} w_i \exp\left(-\frac{|d_{ij}|^2}{2\sigma^2}\right),$$

where i runs over the N systems in the training dataset, σ is the width of the Gaussian kernel, and $|d_{ij}|^2 = \lVert d_i - d_j \rVert_2^2$ is the squared Euclidean distance between the feature vectors $d_i$ and $d_j$. The coefficients $w_i$ are obtained from the training (or learning) process built on minimizing the expression