Structural changes during glass formation extracted by computational homology with machine learning

The structural origin of the slow dynamics in glass formation remains to be understood owing to the subtle structural differences between the liquid and glass states. Even from simulations, where the positions of all atoms are deterministic, it is difficult to extract significant structural components for glass formation. In this study, we have extracted significant local atomic structures from a large number of metallic glass models with different cooling rates by utilising a computational persistent homology method combined with linear machine learning techniques. A drastic change in the extended range atomic structure consisting of 3–9 prism-type atomic clusters, rather than a change in individual atomic clusters, was found during the glass formation. The present method would be helpful towards understanding the hierarchical features of the unique static structure of the glass states. In glass formation, the dynamics of extended structures beyond atomic short-range order is yet to be understood. Here, persistent homology, combined with machine learning, reveals superstructures made of 3-to-9 prism-type atomic clusters which undergo drastic changes according to the glass cooling rate.

G lass states are generally formed owing to an abrupt increase in their viscosity upon cooling from the liquid states [1][2][3][4][5] . It is known that they are far from the equilibrium states owing to their long relaxation times that cannot be achieved experimentally. Naturally, the formation of glass states strongly depends on their thermal history, i.e. the cooling rate from the molten liquid to glass 6 . A glass state formed at a slower cooling rate should be more stable, and therefore, closer to the ideal glass state. In general, there is no distinct difference between the atomic-scale structures in the liquid and glass states where only several broad peaks, which indicate a disordered structure, are found in their structure factors, although the spatial heterogeneity of their dynamics increases during glass formation 7,8 . It is even more difficult to distinguish between the different glass states formed at different cooling rates in terms of their static structures. Therefore, the development of analytical methodologies that can reveal the hidden structural features in disordered states such as liquid and glass is required.
For a long time, the cooling rate dependency of glass structures has primarily been examined using molecular dynamics simulations [9][10][11][12][13][14] . Molecular dynamics simulation is a powerful tool that provides a complete set of information of all the experimentally inaccessible atomic positions. A slight structural difference between the glass states simulated by molecular dynamics with faster and slower cooling rates has been observed through a Voronoi polyhedral analysis 12,13 or a bond-orientational order analysis 14 . In the case of densely packed structures, e.g. a fraction of the icosahedral-like clusters decreases with decreasing cooling rate when using the Lennard-Jones potential 9 , whereas this fraction increases when using the embedded atom method potentials 13 . However, the structure units strongly depend on the glass systems, and therefore, the essential structures for glass formation still remain unclear. Furthermore, it is difficult to elucidate the structural features beyond a short range even when using the pair distribution function (PDF) and Voronoi polyhedral analyses.
A computational persistent homology method has been developed as a tool for analysing the topological features of threedimensional objects in the 2000s 15 . This method is useful for surveying complicated objects such as glass structures and has already been applied to the structural analysis of oxide and metallic glasses [16][17][18] . It is possible to visualise the hierarchical features of glass structures via persistence diagrams by detecting the creation and annihilation of holes with changing atomic size. In this study, we applied the persistent homology technique to the analysis of metallic glass structures prepared at different cooling rates and obtained the persistence diagrams for each model. The statistical features of the persistence diagrams were further analysed using machine learning techniques. Machine learning techniques can be useful to search for and extract important information from persistence diagrams. Both important and unimportant information are often mixed in the diagrams, and it is difficult to intuitively identify the important information from the diagrams. The combination of machine learning and persistent homology provides a systematic way to find important geometric patterns hidden behind the data.
In this work, typical local structures related to glass formation were extracted from molecular dynamics simulations through the inverse analysis of persistence diagrams using a combination of linear machine learning models, such as principal component analysis (PCA) and linear regression analysis (LRA) 19,20 . We demonstrate that the persistence diagram/machine learning analysis is a powerful tool to statistically extract the significant extended range local structures of glasses that strongly depend on cooling rates. Consequently, we suggest the changes in extended range local structures including 3-9 prism-type clusters, rather than individual clusters, play a key role in the glass formation.

Results
Structure models prepared by molecular dynamics simulation. The structure models prepared using molecular dynamics simulations in this study are summarised in Fig. 1. For LRA, a model each for 37 different cooling rates (from 1.7 × 10 10 to 1.7 × 10 14 K s −1 ) was prepared. Separately, 30 models each for five different cooling rates (1.7 × 10 10 , 1.7 × 10 11 , 1.7 × 10 12 , 1.7 × 10 13 and 1.7 × 10 14 K s −1 ) were constructed for PCA. Note that the term "structure model" represents a snapshot in a molecular dynamics trajectory. Each model includes 12,000 atoms in total (9600 Pd atoms and 2400 Si atoms). The structure factors of these molecular dynamics models are consistent with the reported diffraction experiments (Supplementary Fig. S1). For the Pd 80 Si 20 models used in this study, all Si atoms were surrounded by Pd atoms. In other words, there are no nearest neighbour Si-Si bonds, as indicated by the PDF (Supplementary Fig. S2) and Voronoi polyhedral analyses ( Supplementary Fig. S3). The shortrange structures are characterised by Si-centred prism-type clusters indexed by, for example, <0 3 6 0>, <0 4 4 0>, <0 5 4 0>, <0 2 8 0> and <0 3 6 1>, which are consistent with previous studies 21,22 . In order to understand the features of the prism-type clusters and their arrangements, we calculated the persistence diagrams for models excluding all Si atoms or all Pd atoms (Fig. 1c, d).
Persistent homology analyses. Figure 2a, b shows PD2 for Pd atoms, while Fig. 2c, d shows the PD1 for Si atoms, obtained from the molecular dynamics structure models formed at the slowest and fastest cooling rates (1.7 × 10 10 and 1.7 × 10 14 K s −1 , respectively). Here, PD2 and PD1 denote the persistence diagrams for the second and first homologies, respectively. The PD2 provides information about a closed hole usually called a void or pore, whereas the PD1 corresponds to an open hole such as a ring or tube. The details regarding the persistence diagrams are described in the literature 17 . Each persistence diagram is averaged over 30 models, each including 12,000 atoms. The horizontal axis in the persistence diagrams represents the birth time, i.e., when a hole is generated, while the vertical axis represents the death time, i.e., when an identical hole is annihilated. The characteristic islands found in PD2 for Pd atoms (indicated by arrows in Fig. 2a, b) correspond to prism-type atomic clusters, which is demonstrated by the PD2 for individual prism-type atomic clusters shown in Supplementary  Fig. S4. The PD1 for Si atoms (Fig. 2c, d), which represent the spatial arrangements of Si-centred prism clusters, are characterised by two horns perpendicular to the horizontal birth axes. Note that the Si-centred local structures most likely belong to prims-type atomic clusters (65-70% atomic clusters belong to <0 3 6 0>, <0 4 4 0>, <0 5 4 0>, <0 2 8 0>, and <0 3 6 1>) based on the statistics of Voronoi polyhedral analysis ( Supplementary Fig. S3). In addition, it is understood that the surrounding atoms of prismtype atomic clusters are only Pd atoms based on the partial PDFs ( Supplementary Fig. S2) as mentioned above. Thus, the structural features of the prism-type atomic clusters and their spatial arrangements can be basically understood from PD2 for Pd atoms and PD1 for Si atoms, respectively.
Machine learning analyses. To investigate the cooling rate dependence of the persistence diagrams, we first conducted an LRA of the molecular dynamics models prepared at 37 different cooling rates (Fig. 1a). The persistence diagrams obtained from 37 molecular dynamics atomic configurations were calculated and divided into N regions (bins), each containing birth-death pairs corresponding to the detected holes. The numbers of birth-death pairs for N regions were smoothed by a Gaussian filter and then converted into N-dimensional vectors as input data for the LRA.
In this case, the N values for the persistence diagrams of the Pd and Si configurations were approximately 12,000 and 7500, respectively. Linear relationships between the number of birth-death pairs for each region and the cooling rate were modelled using LRA. This analysis enables us to determine the cooling rate dependence of each region in the persistence diagrams. The persistence diagrams indicating degrees of dependence on the cooling rate were finally reconstructed using the learned vectors. Figure 3a, b shows the persistence diagrams reconstructed by the learned vectors of Pd and Si atoms from PD2 and PD1, respectively. The colour levels in the persistence diagrams indicate the magnitude of the influence of the cooling rates. The characteristic structures for the slower and fastercooled models are contained in the darker blue and red regions of the persistence diagrams, respectively. The locations of the blue regions are clearly different from those of the red regions, indicating that the local structures strongly depend on the cooling rate. Similarly, we further performed PCA for 30 molecular dynamics models at five different cooling rates (Fig. 1b). The 30 atomic configurations, each for five different cooling rates, were converted into vectors as input files for PCA. The vectorization procedure was the same as in the case of LRA. The N-dimensional vectors, where N is the number of birth-death pairs for each region, were then analysed by PCA and reduced to   two-dimensional projections with respect to the first and second principal components. Note that the first principal component most properly explains the dispersion of the input data. Eventually, the first principal component becomes strongly dependent on the cooling rate, as mentioned below. The persistence diagrams were also reconstructed using the learned vectors by PCA. Figure 3c, d shows the projections of the persistence diagrams on the first and second principal component planes, respectively. Total 150 data (30 molecular dynamics models × 5 cooling rates) are plotted on the projected planes and different colours are assigned to each cooling rate. Clustering of the data points is seen at each cooling rate, and the clusters are observed to be aligned along the horizontal axis. It is apparent that the first principal components strongly correlate with the cooling rates, while the second principal components are almost independent of them. The learned vectors from the first principal components of PCA were also drawn on the reconstructed persistence diagrams, as shown in Fig. 3e, f. Interestingly, the locations of both the blue and red regions in the persistence diagrams of PCA coincide with those in the persistence diagrams of LRA. Thus, it is reasonable to state that the first principal component of PCA also represents the structures depending on the cooling rates. The molecular dynamics datasets for PCA are much more statistically warranted than those for LRA, and therefore, are more suitable for extracting specific local structures.
In order to understand the origin of the signals in the persistence diagrams, local atomic structures from the persistence diagrams were extracted through an inverse analysis process. The structural information of the original molecular dynamics models is initially recorded to the persistence diagrams together with birth-death pairs when computing persistence diagrams. After the machine learning analyses, a local structure corresponding to each birth-death pair can be mathematically reconstructed from the recorded structural information using the inverse analysis process. Figure 4a, Fig. 3e. The area of each face of the prism-type clusters is shown by a different colour and thus, the degree of distortion in these clusters can be immediately known based on their colour. The statistical distribution of the areas of the faces is also shown in Fig. 4c. The distributions for the area of a triangle face were statistically analysed for 710 and 435 atomic clusters in the slowest-and fastest-cooled samples, respectively. The distribution for the slowest-cooled sample appears to be sharper than that for the fastest-cooled sample, indicating that the prism-type clusters in the fastest-cooled sample are more distorted than those in the slowest-cooled sample, although the difference is not large.
The Si networks related to the framework of the arrangement of the prism-type clusters were also extracted from the reconstructed persistence diagrams of Fig. 3f. The change in the Si network is more prominent than that in individual prism-type atomic clusters. Typical arrangements of Si atoms with prismtype clusters (referred to as superclusters) in the slowest-cooled and fastest-cooled samples are depicted in Fig. 4d Fig. 3f and are frequently observed, especially in the slowest-cooled sample (Fig. 4d). In contrast, in the fastestcooled sample, the number of units consisting of several prisms (extracted from the dark red region ([2.01, 2.05] birth × [3.07, 3.11] death ) in the reconstructed persistence diagram of Fig. 3f), is much larger (5-9 prisms), as shown in Fig. 4e. We also calculated the histograms of the numbers of atoms constituting the superclusters for both samples, as shown in Fig. 4f. The distributions were statistically analysed for 460 and 358 atomic configurations in the slowest-and fastest-cooled samples, respectively. The extent of superclusters for the fastest-cooled sample is much larger than that for the slowest-cooled sample, indicating that slower cooling leads to the formation of compact units of prism-type clusters. The structural features of the superclusters will be discussed later.
Conventional pair distribution function and Voronoi polyhedral analyses. The Pd-Pd, Pd-Si and Si-Si partial PDFs of molecular dynamics models formed with cooling rates of 1.7 × 10 10 , 1.7 × 10 11 , 1.7 × 10 12 , 1.7 × 10 13 and 1.7 × 10 14 K s −1 are shown in Supplementary Fig. S2. The profiles for Pd-Pd and Pd-Si pairs do not show any clear differences, whereas subtle changes can be observed in the profiles of the Si-Si pair. This implies that the shorter-range structures remain largely unchanged with respect to pair correlation, although some differences in distortion were detected in the present persistence diagram analyses (Fig. 4a, b). The subtle change in the Si-Si partial PDFs should be related to the drastic rearrangement of the prism-type clusters (Fig. 4c, d). The Voronoi polyhedral analyses were also performed for molecular dynamics models formed with the five different cooling rates ( Supplementary Fig. S3). Although there are some subtle differences due to cooling rates, it was hard to find systematic structural changes from the lists of Voronoi indices and also extract extended range structural changes beyond the nearest neighbour environment. Thus, our persistence diagram analyses have a higher sensitivity in detecting structural changes in disordered structures compared to conventional PDF and Voronoi polyhedral analyses.

Discussion
To understand the extracted local structures shown in Fig. 4, we calculate "local" structure factors S(Q) for those structures and compare them with the experimental S(Q) curve obtained by neutron scattering 23 , as shown in Fig. 5. The experimental S(Q) curve possesses a characteristic second peak splitting, which is generally recognized as a feature of glass structures. The local S(Q) curves obtained from the superclusters (3-4 prisms) frequently found in the slowest-cooled model also exhibit similar second peak splitting (Fig. 5a), whereas the curves of the local structures (5-9 prisms) in the fastest-cooled model do not (Fig. 5b). This fact implies that the superclusters formed in the slowest-cooled model largely contribute to the experimentally observable structural feature of the glass state. In other words, the formation of the superclusters presumably induces the development of a glassy structural order. Although the cooling rate for real glass is beyond the range of molecular dynamics simulations, it is expected that the volume fraction of the superclusters will increase by further decreasing the cooling rate.
As mentioned above, the persistence diagram/PCA analyses enable us to extract changes in the static local structures, particularly those related to the cooling rate. Beyond the local structure changes estimated from this work, there could be a hypothetical ideal glass that corresponds to an inherent structure in the energy landscape description. By obtaining a thorough dataset, it might be possible to predict the plot positions generated by the ideal glass structure on the persistence diagram by LRA. It is presumed that the ideal glass structure is mostly filled with unique non-crystalline local structures without any longrange order. A candidate of such local structures could be a triangular configuration of vertex-sharing less-distorted prisms (Fig. 4c), as found mainly in the slowly cooled models in the present work, which are different from other related crystal structures such as Pd 9 Si 2 and Pd 3 Si (Supplementary Fig. S5).
We have combined a persistent homology analysis with machine learning techniques such as PCA and LRA, to analyse the cooling-rate dependence of the atomic structural order of Pd-Si metallic glasses. Persistent homology analysis enables us to survey the three-dimensional structural features from short-to extended-ranges based on the distributions visualised via persistence diagrams. The machine learning techniques were able to deduce the core local atomic structures that are strongly correlated to the cooling rates of liquid-to-glass transitions. In other words, from the analyses, it is possible to extract the essential features in the static structures during glass formation. In the present case, we found a drastic change in the extended range atomic structure, which corresponds to the spatial arrangement of  Fig. 4d, e, respectively. The experimental neutron S(Q) is replotted from ref. 23 . the prism-type clusters, rather than to a change in individual atomic clusters during glass formation. In particular, the extended range atomic structures with 5-9 atomic clusters leading to larger holes in the persistence diagrams of Si atoms are dominant in the sample cooled rapidly, whereas those with 3-4 atomic clusters (superclusters) leading to smaller holes become more prominent in the sample cooled more slowly. It should be noted that the latter superclusters largely contribute to the second peak splitting in S(Q), which is known as a feature of glass structures, thereby inducing a glassy structural order. In the future, we will apply the present techniques to different types of glass in order to gain a deeper insight into general trends.

Methods
Molecular dynamics simulation. Models of amorphous Pd 80 Si 20 alloys were constructed using molecular dynamics methods. The embedded atom method potentials developed by Sheng (https://sites.google.com/site/eampotentials/) were employed for the simulations. Molecular dynamics simulations were performed under constant isothermic-isobaric (NPT) conditions using a LAMMPS Molecular Dynamics Simulator 24 . First, 9600 Pd and 2400 Si atoms were prepared randomly in cubic molecular dynamics cells with periodic boundary conditions. The initial configurations were kept at 2500 K for more than 50 ps and then cooled to 300 K at different cooling rates (1.7 × 10 10 -1.7 × 10 14 K s −1 ). For stabilisation, the cooled models were further kept at 300 K for 125 ps. The system temperature and pressure were controlled using a Nose-Hoover thermostat and a Nose-Hoover barostat, respectively.
Computational homology analysis. A unified method proposed by Obayashi et al. 25,26 , which combines a linear machine learning method for persistence images with an inverse analysis method, was adopted in this study. The outline is as follows: 1. Persistence diagrams were computed from molecular dynamics simulation datasets with i-th homology, where i = 1 and 2 for Si and Pd, respectively. 2. The diagrams were converted into vectors using the persistence image, a technique proposed by Adams et al. 27 3. Linear regression analysis and principal component analysis were applied to the vectors. L2 regularisation was used for the regression analysis. As a result, we obtained the optimal coefficients and eigenvalues as dual vectors for each analysis. 4. The persistence diagrams were reconstructed using the learned vectors. The important regions that contain a high contribution of the eigenvalues were drawn on the diagrams. From the diagrams, the birth-death pairs for the characteristic holes or cavities were extracted. 5. The geometric structures were extracted from the original atomic configuration by carrying out explicit inverse analysis of the pairs. Here, we utilised the data analysis software "HomCloud" (https://homcloud. dev/index.en.html) based on persistent homology in order to compute the persistence diagrams, carry out vectorisation of the diagrams, and solve the inverse problem. We also used Python's scikit-learn to implement the linear learning method (http://scikit-learn.org/).
In order to compute the histograms from the persistence diagrams, the targeted regions were clipped into rectangles using [1.2, 2.6] × [1.4, 2.8] and [1.8, 4.0] × [2.1, 4.3] windows and were then divided into 140 × 140 and 110 × 110 boxes for Pd and Si, respectively. The clipped regions were chosen to contain all the cells having values more than zero on the persistence diagrams, and the widths of the grids were chosen such that they reproduced the detailed patterns schematically.
Machine learning. To apply LRA and PCA, we converted the histograms into persistence image vectors. The hyper parameters for vectorisation were taken to be (σ, c, p) = (0.02, 1.0, 1.0) for Pd and (σ, c, p) = (0.04, 1.0, 1.0) for Si. Here, σ indicates the standard deviation and c and p are the parameters of the weight function 26 . We applied Ridge regularisation (L2 regularisation) for the linear regression analysis. The regularisation parameter was set to 200 to simplify the learned result and to avoid over-fitting. We also checked these results using LASSO (L1 regularisation). However, the vectors thus obtained were too sparse to extract a sufficient number of characteristic structures from the original datasets.
The above-mentioned parameters were tuned to not only obtain a better R 2 score for LRA and contribution ratio for PCA, but also to obtain the monophasic peaked reconstructed persistence diagrams from the learned vectors for analysis. In other words, complicated structures of the reconstructed persistence diagrams were not preferred to interpret the result of the inverse analysis. The R 2 score and contribution ratio of 0.509 and 0.754 for the second homology of Pd and 0.549 and 0.675 for the first homology of Si, respectively, were obtained.
Under the above setting, for the final calculations, we defined the regions on the persistence diagrams that contributed the most to the cooling rate, namely, [ and 358 rings as the structures contributing most to the slowest and fastest cooling rates, respectively, for Si, and 710 and 435 cavities, respectively, for Pd.

Data availability
All datasets are available from the corresponding author upon reasonable request.