Quantitative estimation of properties from core-loss spectrum via neural network

Localized structures at the nano- and sub-nano-scale strongly affect material properties, and spectroscopic techniques have therefore been used to characterize local atomic and electronic structures. If material properties could be directly ‘measured’ via spectral observations, the atomic-scale understanding of material properties would be dramatically facilitated. In this paper, we attempt to unveil hidden information about material properties directly and quantitatively from core-loss spectra. We predicted six properties, three geometrical and three chemical bonding properties, with a simple feedforward neural network and achieved sufficient accuracy. Moreover, we applied the constructed model to a noisy experimental spectrum and could predict the six properties accurately. This successful prediction implies that the method can pave the way for local measurement of material properties.


Introduction
Local atomic structures and electronic states have crucial effects on material properties. With increasing demand for nanoscale devices, in which peculiar atomic arrangements influence material properties more than they do in bulk, the importance of understanding the local atomic structure is rapidly increasing. Therefore, characterizing atomic and electronic structures on the local scale, i.e. determining atomic structures and revealing their elements and chemical bonding, is indispensable in modern nanomaterials research.
For nanostructure analysis, core-loss spectroscopy, namely, electron energy loss near-edge structure (ELNES) and x-ray absorption near-edge structure (XANES), has been extensively used because the spectral features reflect the atomic and electronic structures [1][2][3][4][5][6], and nano- or sub-nano-scale resolutions have been achieved owing to the development of experimental equipment. It is now possible to investigate light elements [7,8] and the distribution of valence states in real space [9,10]. Moreover, its use goes beyond elemental identification to include analysis of local hybridization of atomic orbitals [11,12], local distortions [13], charge transfer [14,15], and quantification [15,16]. Time-resolved core-loss spectroscopy has also been used to trace chemical reactions and in situ responses of electronic structures [2,17,18].
Although core-loss spectroscopy is powerful for investigating local atomic and electronic structures, a 'direct' determination of material properties using core-loss spectroscopy would be even more attractive because it would enable atomic-resolution property measurements. However, the relationship between core-loss features and the properties is hidden and ambiguous, so extracting it requires expertise and theoretical calculations [3,19]. Consequently, only a few reports have achieved quantification of properties using core-loss spectroscopy [15,16]. If the direct connections between core-loss features and material properties could be revealed, the atomic-scale understanding of material properties at the nano-scale would be dramatically facilitated.
In this study, we developed a new approach to extract local hidden information 'directly and quantitatively' from core-loss spectrum data. We constructed a neural network model that estimates this information using core-loss spectra as input. Furthermore, we applied the constructed model to an experimental spectrum and predicted the local material properties.

Feedforward neural network
A large variety of regression tools, viz. Lasso, support vector machine, random forest, etc., are known in machine learning. Selecting the tool is important because each method has its own advantages and disadvantages. A feedforward neural network, which is the simplest type of neural network, was selected in this study because it can accommodate high-dimensional input data, such as spectra, and incorporate interactions between the inputs in a non-linear manner. The schematic of the constructed neural network for predicting properties from the spectrum is shown in figure 1. The input data consisted of the intensities of the core-loss spectrum, sampled from −1 to 15 eV in increments of 0.1 eV, i.e. 160-dimensional inputs, and the output data were the objective properties. The focus was on the properties of individual atomic sites, because the core-loss spectrum can be obtained (calculated) from each site. We used backpropagation, based on the Adam scheme [29], to optimize all the learning parameters in the network, thereby minimizing the mean absolute error between the model outputs and the training targets. ReLU was used as the activation function, and the dropout rate was fixed at 0.5 in hidden layers not linked with the output layer. The hyperparameters, namely the number of hidden layers, the number of nodes in the hidden layers, and the regularization parameters, were tuned by fivefold cross validation, using the validation datasets described in section 2.2.
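As a rough illustration of the input layout and the forward pass described above, the following sketch builds the 160-point energy grid and evaluates a small ReLU network with a mean-absolute-error loss. The hidden-layer width, the single output node, the random weights, and the helper names (`energy_grid`, `forward`, `mean_absolute_error`) are illustrative assumptions; the tuned hyperparameters, dropout, and Adam updates used in this study are not reproduced here.

```python
import random


def energy_grid(start=-1.0, stop=15.0, step=0.1):
    """Energies in eV from start (inclusive) to stop (exclusive)."""
    n_points = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n_points)]


def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights, bias):
    """Fully connected layer; weights holds one row per output node."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    # He-style scaling is an assumption, not taken from the paper.
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(spectrum, layers):
    """ReLU on hidden layers, linear output layer."""
    hidden = spectrum
    for weights, bias in layers[:-1]:
        hidden = relu(dense(hidden, weights, bias))
    weights, bias = layers[-1]
    return dense(hidden, weights, bias)


def mean_absolute_error(predicted, target):
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


rng = random.Random(0)
grid = energy_grid()
# Hypothetical architecture: 160 inputs -> 32 hidden nodes -> 1 property.
layers = [init_layer(len(grid), 32, rng), init_layer(32, 1, rng)]
spectrum = [max(0.0, e) for e in grid]  # placeholder intensities only
prediction = forward(spectrum, layers)
```

Excluding the end point at 15 eV is the choice that yields exactly 160 inputs; including it would give 161.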

Construction of datasets
In this study, to avoid accidental noise and errors, the spectrum datasets were constructed by simulation. We selected the oxygen-K (O-K) edges of silicon oxide polymorphs. This selection is suitable for the present study for the following three reasons: (1) The O-K edge contains much information (sometimes more information than cation edges). (2) The O-K edge can be correctly calculated based on a simple density functional theory-generalized gradient approximation (DFT-GGA) [30]. (3) Many polytypes and compositions are available for Si-O based materials in the database.
We selected 188 different silicon oxides from the Materials Project database [31], some of them having multiple oxygen sites; a total of 1171 O-K edge spectra were calculated. In addition to the spectrum datasets, a dataset of local properties was also prepared. Since the core-loss spectrum was obtained from an atomic site, local geometrical and bonding properties were selected for prediction. The bonding properties and core-loss spectra, but not the geometrical properties, were calculated by a first-principles plane-wave pseudopotential method with the CASTEP code [32]. GGA-PBE was selected as the exchange-correlation functional, and the cut-off energy was set at 500 eV. To introduce core-hole effects when calculating the core-loss spectra, an on-the-fly pseudopotential based on the CASTEP database was applied to the excited oxygen atom in the supercell. To minimize interactions among excited atoms under periodic boundary conditions, supercells larger than 8 Å were used in all cases.
All the calculated spectra were aligned at their thresholds, which were set at 0 eV. The spectra were then broadened by a Gaussian with a deviation of 0.5 eV. The dataset was randomly shuffled and divided into two subsets, a remainder subset and test data, in a ratio of 9:1; the remainder subset was then randomly divided again into training and validation data such that the size of the validation data equaled that of the test data.
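The preprocessing described above can be sketched as follows. This is a minimal illustration, assuming that the stated 0.5 eV deviation is the Gaussian standard deviation, that the energy grid is uniform, and that the test size is obtained by truncating 10% of the dataset to an integer; the function names `gaussian_broaden` and `split_dataset` are hypothetical.

```python
import math
import random


def gaussian_broaden(energies, intensities, sigma=0.5):
    """Convolve a spectrum with a normalized Gaussian (sigma in eV)."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    step = energies[1] - energies[0]  # assumes a uniform energy grid
    broadened = []
    for e in energies:
        total = 0.0
        for e0, i0 in zip(energies, intensities):
            total += i0 * norm * math.exp(-0.5 * ((e - e0) / sigma) ** 2)
        broadened.append(total * step)
    return broadened


def split_dataset(n_samples, test_fraction=0.1, seed=0):
    """Shuffle indices, then split into training, validation, and test sets.

    The validation set has the same size as the test set, as in the text.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)  # truncation is an assumption
    test = indices[:n_test]
    validation = indices[n_test:2 * n_test]
    training = indices[2 * n_test:]
    return training, validation, test


training, validation, test = split_dataset(1171)
```

Under these assumptions, the 1171 spectra would be divided into 937 training, 117 validation, and 117 test spectra.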

Results and discussion
Six kinds of local properties were selected: average bond length, average bond angle, Voronoi volume, bond overlap population, Mulliken charge, and excitation energy. Average bond length, average bond angle, and Voronoi volume are geometrical features; the first two provide short-range information, and the third provides short- and middle-range information. The bonding properties, bond overlap population, Mulliken charge, and excitation energy, are related to the valence states and the core states. Although prediction of the excitation energy is not necessary because it is measured together with the experimental spectrum, its prediction solely from the spectral features was attempted to demonstrate the ability of the present method to predict a core-electron related property.
Using these six properties as the objective variables, regression analysis was conducted via the neural network. The best hyperparameters were determined by grid search. The details are provided in the supplementary information (S1) available online at stacks.iop.org/JPMATER/2/024003/mmedia. The correct and predicted values of the training and test data are plotted in figures 2(a)-(f). Gray and colored circles represent training and test data, respectively. Circles lying on the diagonal gray line indicate that the predicted values equal the actual values. Since most of the colored circles are on the gray line in figures 2(a)-(f), our neural network model can predict those local properties accurately, indicating that information about both geometrical and bonding properties is implicitly contained in the spectral features of the core-loss spectrum. Moreover, the prediction model works well not only for bond lengths and bond angles but also for Voronoi volume. Core-loss spectra are often believed to reflect mainly very localized information (such as bond lengths and angles); however, the present results imply that middle-range information (such as Voronoi volume) is also included implicitly and can be extracted by machine learning.
The same argument applies to the bonding properties as well as the geometrical properties. The neural network can correctly predict all bonding properties (figures 2(d)-(f)). It is notable that the present method can correctly predict the excitation energy using 'only' the spectral features. In core-loss spectra, the spectral features and the excitation energy have often been discussed separately; for instance, the excitation energy, i.e. the chemical shift, is related to the oxidation state, whereas the spectral profile reflects the more detailed fine structure of the partial density of states (PDOS) of the conduction band. However, the present results indicate that the spectral features themselves contain information on the excitation energy, which can therefore be predicted from the spectral features alone. Furthermore, the present method can correctly predict the valence band information at the ground state, viz. the Mulliken charge and bond overlap population. The core-loss spectrum is known to reflect the PDOS of the conduction band at the excited state and has no direct relationship with the valence states; however, the present results indicate that the spectral features do contain information on the valence states. Thus, by using the neural network, both geometrical and bonding information were extracted from the spectral features of the core-loss spectrum.
On the other hand, a detailed inspection of figure 2 reveals some circles located away from the diagonal lines, indicating that the prediction by the neural network failed for some materials/sites. On careful examination of the spectra corresponding to those circles, it was found that spectra with large errors for one property tended to have similarly large errors for other properties. Specifically, thirteen spectra ranked in the worst ten for more than two properties, and six spectra did so for more than four properties. The spectra with large errors are marked by yellow arrows in figures 2(a)-(f).
On analyzing the data in detail, two causes of misprediction were found: (1) no similar spectrum was included in the training data, and (2) although similar spectra were included in the training data, the corresponding properties differed from each other. Both of these causes can be attributed to unique electronic structures, which cannot be learned from the training data. Figure 3 shows the representative cases. The two spectra were obtained from one site each of the materials with IDs mp-555823 and mp-557076 in the Materials Project database, and are hereafter referred to as Spectrum A and Spectrum B, respectively. In addition to Spectra A and B (blue lines), the three most closely resembling spectra in the training data (gray lines) are shown for each in the same figure. The spectral similarity was measured by Euclidean distance. Spectrum A has two characteristic peaks, indicated by yellow arrows, whereas the same features cannot be found in the reference spectra even though they are judged to be the most similar to Spectrum A, indicating that Spectrum A is the only spectrum to exhibit these features in the present datasets. Like the spectrum, its atomic structure is also distinctive: the characteristic features of Spectrum A are related to a characteristic Si site that has Si-Si bonding. Such Si-Si bonding is quite rare in crystalline silicon oxides, resulting in no similar features in the training data. As there were no similar spectral features in the datasets, the prediction model could not be constructed correctly for Spectrum A.
In contrast to Spectrum A, Spectrum B was considerably similar to the three closest resembling spectra in Euclidean distance (figure 3(b)). They all have one sharp peak around the threshold. To find the origin of the error, the properties for Spectrum B were compared with those of the closely resembling spectra (table 1). Because the spectra resemble one another, the predicted values (gray shaded rows in table 1) are also close for all four properties, and the predictions for the 1st, 2nd, and 3rd closest spectra work well. However, the prediction for Spectrum B failed because the correct properties for Spectrum B differ from those of the other spectra, even though their spectral features are similar. This discrepancy between the similarity of the spectra and the difference in properties is the origin of the misprediction.
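The counting of spectra that repeatedly rank among the worst ten can be sketched as below. This is only an illustration of the bookkeeping; the function names and the toy error values are hypothetical, and the per-property absolute errors would in practice come from the trained models.

```python
from collections import Counter


def worst_indices(errors, n_worst=10):
    """Indices of the n_worst largest absolute errors for one property."""
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return order[:n_worst]


def count_worst_appearances(errors_by_property, n_worst=10):
    """Count, per spectrum, how many properties place it in the worst n."""
    counts = Counter()
    for errors in errors_by_property.values():
        counts.update(worst_indices(errors, n_worst))
    return counts


# Toy example with four spectra, two properties, and a worst-two ranking.
toy_errors = {
    "bond_length": [0.1, 0.9, 0.2, 0.8],
    "mulliken_charge": [0.7, 0.95, 0.1, 0.2],
}
toy_counts = count_worst_appearances(toy_errors, n_worst=2)
repeated = [index for index, n in toy_counts.items() if n >= 2]
```

In the toy data, only spectrum 1 appears in the worst two for both properties, which mirrors how spectra with large errors across several properties were identified.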
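The similarity search used to find the three closest training spectra can be sketched as a nearest-neighbor lookup under Euclidean distance. The function names and toy spectra below are illustrative; the sketch assumes that all spectra are sampled on the same energy grid.

```python
import math


def euclidean_distance(spectrum_a, spectrum_b):
    """Euclidean distance between two spectra on a common energy grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spectrum_a, spectrum_b)))


def closest_spectra(query, training_spectra, k=3):
    """Indices of the k training spectra closest to the query spectrum."""
    order = sorted(
        range(len(training_spectra)),
        key=lambda i: euclidean_distance(query, training_spectra[i]),
    )
    return order[:k]


# Toy two-point "spectra" used only to exercise the functions.
toy_training = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.0], [3.0, 3.0]]
toy_neighbors = closest_spectra([0.0, 0.0], toy_training, k=3)
```

A small distance only means that the intensities are close point by point; as the cases of Spectra A and B show, it does not guarantee that characteristic peaks or the underlying properties are shared.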
The electronic structure of Spectrum B is actually different from that of the closest spectra. Figure 4 shows the PDOS at the ground state of the corresponding materials. Each PDOS is aligned to the threshold at 0 eV. The PDOS of Spectrum B has peaks in the lower energy region and differs from the other three PDOS. However, their core-loss spectra are very similar to each other, as shown in figure 3(b). This difference can be ascribed to a difference in the core-hole effect. A strong core-hole effect shifts the DOS toward lower energy through the attraction of the positively charged core hole and produces a sharp peak in the core-loss spectrum [33]. The first peaks of the closest spectra are ascribed to this strong core-hole effect. On the other hand, the PDOS for Spectrum B already has strong intensity at lower energies at the ground state, and its features do not change significantly upon core-hole introduction. Although the spectral features are similar to each other, their origins are different; in particular, the core-hole effect is strong for the closest spectra and not strong for Spectrum B. To reduce errors caused by the core-hole effect, as in the case of Spectrum B, we incorporated the excitation energy into the input data because it is directly correlated with the core-hole effect. We aligned the spectral data according to the excitation energy to incorporate the information on the core-hole effect. Intensities in the energy region below each excitation energy were set to 0, namely, zero padding. Figure 5 shows the mean absolute errors without and with the excitation energy. The errors for four properties, average bond length, average bond angle, Mulliken charge, and excitation energy, were clearly reduced. In particular, the errors for average bond length and Mulliken charge decreased by approximately 30%. Furthermore, the errors for Spectrum B were also reduced, as shown in the supplementary information (S2).
Therefore, we can conclude that including the excitation energy in the input data can reduce the errors originating from differences in the core-hole effect.
Finally, we applied our model to an experimental spectrum. We measured the O-K edge of a crushed α-quartz sample. We used an aberration-corrected STEM (JEM-ARM200F, JEOL Ltd) equipped with a monochromator with 30 meV energy resolution. Since the oxygen-K edge is broadened by the core-hole lifetime, by approximately 0.1 eV, the energy resolution of the instrument can be safely neglected. We retrained the model on the database broadened by a Gaussian of 0.1 eV and predicted the properties of the experimental spectrum. Figures 6(a) and (b) show the experimental and theoretical spectra and the properties for both. The experimental spectrum has a profile similar to that of the calculated spectrum; however, it clearly contains some noise. Nevertheless, the properties predicted from the experimental spectrum are very close to the correct values. Therefore, we can conclude that the model constructed from the simulated spectra was considerably robust to experimental noise and could predict the properties accurately even from the experimental spectrum.
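One possible reading of the alignment with zero padding is sketched below: each threshold-aligned spectrum is shifted onto a common energy axis that starts at a shared reference energy, and the points below its own excitation energy are filled with zeros. The function name, the reference energy, the fixed output length, and the assumption that the input spectrum starts at its threshold are all illustrative; the text does not specify the exact procedure.

```python
def align_with_zero_padding(intensities, excitation_energy, reference_energy,
                            step=0.1, total_length=None):
    """Place a spectrum on a common axis starting at reference_energy (eV).

    Grid points below excitation_energy are set to zero. If total_length
    is given, the result is zero-extended or truncated to that length.
    """
    n_pad = round((excitation_energy - reference_energy) / step)
    if n_pad < 0:
        raise ValueError("excitation energy lies below the reference energy")
    padded = [0.0] * n_pad + list(intensities)
    if total_length is not None:
        padded = (padded + [0.0] * total_length)[:total_length]
    return padded


# Hypothetical example: a threshold 0.2 eV above the reference energy
# becomes two zero-valued points in front of the spectrum.
aligned = align_with_zero_padding([1.0, 2.0], 530.2, 530.0,
                                  step=0.1, total_length=5)
```

With this layout, the position of the first non-zero intensity encodes the excitation energy, so a network taking the padded vector as input receives both the spectral shape and the chemical shift.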

Conclusion
The extraction of hidden information from core-loss spectra via a feedforward neural network was attempted. As a result, three geometrical properties, average bond length, average bond angle, and Voronoi volume, were predicted. This implies that the spectral features reflect not only short-range information but also middle-range information. Furthermore, our method could also predict three bonding properties: bond overlap population, Mulliken charge, and excitation energy. Although those chemical bonding properties mainly originate from the valence band, the present method can correctly predict this bonding information, indicating that the core-loss spectrum, which reflects the conduction band, contains information pertaining to the valence bands at the ground state.
We also investigated the limitations of the present method by analyzing the mispredicted cases. From the analysis, we found two cases of misprediction in the present method: (1) no similar spectrum was present in the training data, and (2) similar spectra were included in the training data, but the corresponding properties were different. Fortunately, spectrum calculations can generate spectra from virtual structures (materials), such as hypothetical crystal structures, which allows larger databases to be constructed. We believe that such large databases can correct the errors in the first case. Furthermore, detailed analysis revealed that the second case was attributable to the core-hole effect, and we could correct the misprediction by adding the excitation energy, which is correlated with the core-hole effect, to the input data.
Despite these limitations, the present method correctly predicted material properties from the spectral features in most cases and could be applied to an experimental spectrum containing some noise. Therefore, we believe that this method has enormous potential to pave the way for local measurement of material properties.