Nuclear masses in extended kernel ridge regression with odd-even effects

The kernel ridge regression (KRR) approach is extended to include the odd-even effects in nuclear mass predictions by remodulating the kernel function without introducing new weight parameters and inputs in the training network. By taking the WS4 mass model as an example, the mass for each nucleus in the nuclear chart is predicted with the extended KRR network, which is trained with the mass model residuals, i.e., deviations between experimental and calculated masses, of other nuclei with known masses. The resultant root-mean-square mass deviation from the available experimental data for the 2353 nuclei with $Z\ge8$ and $N\ge8$ can be reduced to 128 keV, which provides the most precise mass model from machine learning approaches so far. Moreover, the extended KRR approach can avoid the risk of worsening the mass predictions for nuclei at large extrapolation distances, and meanwhile, it provides a smooth extrapolation behavior with respect to the odd and even extrapolation distances.


Introduction
Nuclear mass is of fundamental importance not only for various aspects of nuclear physics [1], but also for astrophysics [2,3]. It can be used to extract a lot of nuclear structure information, e.g., nuclear deformation [4,5], shell structure [6,7], and effective interactions [8,9,10]. It is also a key nuclear physics input in understanding the energy production in stars [11] and the origin of elements in the universe [2] by determining the reaction energies of all involved nuclear reactions.
Great achievements in nuclear mass measurements have been made during recent decades with the development of radioactive ion beam (RIB) facilities, and about 2500 nuclear masses have been measured to date [12,13]. Nevertheless, the masses of a large number of neutron-rich nuclei involved in the r-process remain unknown experimentally and cannot be measured even with the next-generation RIB facilities. The local mass relations, such as the isobaric multiplet mass equation (IMME) [14], the Garvey-Kelson (GK) relations [15], and the residual proton-neutron interactions [16], can be used to predict the masses of nuclei very close to the experimentally known region, but they are not sufficient for the demands of r-process simulations. Therefore, theoretical predictions for nuclear masses are imperative at the present time. Such predictions can be traced back to the von Weizsäcker mass formula based on the famous liquid drop model (LDM) [17]. Tremendous efforts have been made in pursuing different possible extensions of the LDM, which are known as the macroscopic-microscopic models, such as the finite-range droplet model (FRDM) [18] and the Weizsäcker-Skyrme (WS) model [19]. The microscopic mass models based on the nonrelativistic and relativistic density functional theories (DFTs) have also been developed (see, e.g., Refs. [20,21,22,23,24,25,26] and references therein). They are usually believed to extrapolate more reliably [27], although their precision in reproducing experimentally known masses is currently poorer than that of the macroscopic-microscopic models.
The root-mean-square (rms) deviations between theoretical mass models and the available experimental data [12,13] range from about 3 MeV for the BW model [28] to about 300 keV for the WS ones [19], which is still not accurate enough for studies of exotic nuclear structure and astrophysical nucleosynthesis. In particular, for neutron-rich nuclei far away from the experimentally known region, the differences among the predictions of different mass models can be as large as several tens of MeV. In recent years, machine learning approaches have been employed to further improve the accuracies of nuclear models, such as the radial basis function (RBF) approach [29,30,31,32,33,34], the Bayesian neural network (BNN) approach [35,36,37,38], and the kernel ridge regression (KRR) approach [39]. By training the machine learning network with the mass model residuals, i.e., deviations between experimental and calculated masses, machine learning approaches can reduce the corresponding rms deviations to about 200 keV. However, the RBF and the BNN approaches predict quite different masses for nuclei far away from the region with known masses [40]. This indicates that the extrapolation abilities of these two machine learning tools are not yet properly understood. In contrast, the KRR approach with the Gaussian kernel can automatically identify the limit of the extrapolation distance and avoid the risk of worsening the mass descriptions for nuclei at large extrapolation distances [39].
The rms deviations can be further reduced with more effects being taken into account, e.g., the odd-even effects, pairing effects, and shell effects. The odd-even effects are included in the RBF approach by building and training additional networks [32], while the pairing and shell effects are included in the BNN approach [37] by including additional inputs in the network. The numbers of weight parameters are significantly increased in comparison with the original networks in both approaches. The accuracies of the mass predictions for known nuclei are further improved, but the RBF and the BNN approaches still predict quite different masses for nuclei far away from the known region [40].
In the present work, the KRR approach is extended to include the odd-even effects for nuclear mass predictions. The odd-even effects are included solely by remodulating the KRR kernel function. Therefore, in the present extended KRR approach, the number of weight parameters is not increased and the inputs of the network remain unchanged. The hyperparameters in the extended KRR network are optimized by careful cross-validation. The performance and reliability of the extrapolated mass predictions are analyzed in detail.

Theoretical framework
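The KRR approach is a powerful machine learning approach for nonlinear regression and has been successfully applied in nuclear mass predictions [39]. In order to include the odd-even effects, the KRR function is extended to be
$$S(\mathbf{x}_j) = \sum_{i=1}^{m} K(\mathbf{x}_j, \mathbf{x}_i)\,\alpha_i + \sum_{i=1}^{m} K_{oe}(\mathbf{x}_j, \mathbf{x}_i)\,\beta_i, \tag{1}$$
where $\mathbf{x}_i \equiv (Z_i, N_i)$ are the locations of nuclei in the nuclear chart, $m$ is the number of training nuclei, $\alpha_i$ and $\beta_i$ are weights to be determined, and $K(\mathbf{x}_j, \mathbf{x}_i)$ and $K_{oe}(\mathbf{x}_j, \mathbf{x}_i)$ are kernels, which measure the similarity between nuclei. The closer two nuclei are on the nuclear chart, the larger their similarity. This is measured by the Gaussian kernel
$$K(\mathbf{x}_j, \mathbf{x}_i) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right). \tag{2}$$
In addition, the odd-even kernel $K_{oe}(\mathbf{x}_j, \mathbf{x}_i)$ is introduced to enhance the correlations between nuclei that have the same parity of proton and neutron numbers, which reads
$$K_{oe}(\mathbf{x}_j, \mathbf{x}_i) = \delta_{oe}(\mathbf{x}_j, \mathbf{x}_i)\,\exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma_{oe}^2}\right). \tag{3}$$
Here, $\delta_{oe}(\mathbf{x}_j, \mathbf{x}_i) = 1$ if the two nuclei have the same parity of proton and neutron numbers, and $\delta_{oe}(\mathbf{x}_j, \mathbf{x}_i) = 0$ otherwise. The Gaussian widths $\sigma$ and $\sigma_{oe}$ set the length scales over which the kernels act. The weights $\alpha_i$ and $\beta_i$ are determined by minimizing the loss function defined as
$$L(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \sum_{i=1}^{m} \left[S(\mathbf{x}_i) - y(\mathbf{x}_i)\right]^2 + \lambda\,\boldsymbol{\alpha}^{T} \mathbf{K} \boldsymbol{\alpha} + \lambda_{oe}\,\boldsymbol{\beta}^{T} \mathbf{K}_{oe} \boldsymbol{\beta}, \tag{4}$$
where $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_m)$, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_m)$, and $\mathbf{K}$ and $\mathbf{K}_{oe}$ are the kernel matrices with elements $K(\mathbf{x}_i, \mathbf{x}_j)$ and $K_{oe}(\mathbf{x}_i, \mathbf{x}_j)$. The first term is the variance between the data $y(\mathbf{x}_i)$ and the KRR prediction $S(\mathbf{x}_i)$, and the second and third terms are regularizers that penalize large weights to reduce the risk of overfitting. The hyperparameters $\lambda$ and $\lambda_{oe}$ determine the regularization strength. Minimizing Eq. (4), we can obtain
$$\boldsymbol{\beta} = \frac{\lambda}{\lambda_{oe}}\,\boldsymbol{\alpha}, \tag{5}$$
$$\boldsymbol{\alpha} = \left(\mathbf{K} + \frac{\lambda}{\lambda_{oe}}\,\mathbf{K}_{oe} + \lambda \mathbf{I}\right)^{-1} \mathbf{y}, \tag{6}$$
where $\mathbf{I}$ is the identity matrix and $\mathbf{y} = (y(\mathbf{x}_1), \ldots, y(\mathbf{x}_m))$. With Eq. (5), the extended KRR function (1) can be rewritten as a standard KRR function with the remodulated kernel $K' = K + (\lambda/\lambda_{oe})\,K_{oe}$,
$$S(\mathbf{x}_j) = \sum_{i=1}^{m} K'(\mathbf{x}_j, \mathbf{x}_i)\,\alpha_i. \tag{7}$$
Here, the weights $\alpha_i$ are determined by Eq. (6). Note that the number of weight parameters in the present extended KRR approach is the same as in the original KRR approach, owing to the relation between $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ in Eq. (5).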
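The fit described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the regularizers are weighted by the kernel matrices, so that the two weight sets are proportional and a single linear solve with the remodulated kernel K' = K + (λ/λ_oe) K_oe gives the weights. The function names and the synthetic residuals are hypothetical; only the four hyperparameter values are taken from the text.

```python
import numpy as np

def gaussian(xa, xb, width):
    """Gaussian kernel exp(-|xa - xb|^2 / (2 width^2)) for rows of (Z, N)."""
    d2 = np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * width**2))

def same_parity(xa, xb):
    """delta_oe: 1.0 where Z and N both have matching parity, else 0.0."""
    return np.all(xa[:, None, :] % 2 == xb[None, :, :] % 2, axis=-1).astype(float)

def remodulated_kernel(xa, xb, sigma, lam, sigma_oe, lam_oe):
    """K' = K + (lam / lam_oe) * K_oe (assumed form, see lead-in)."""
    k_oe = same_parity(xa, xb) * gaussian(xa, xb, sigma_oe)
    return gaussian(xa, xb, sigma) + (lam / lam_oe) * k_oe

def fit(x_train, y_train, sigma, lam, sigma_oe, lam_oe):
    """Solve (K' + lam * I) alpha = y for the weights alpha."""
    k_prime = remodulated_kernel(x_train, x_train, sigma, lam, sigma_oe, lam_oe)
    return np.linalg.solve(k_prime + lam * np.eye(len(y_train)), y_train)

def predict(x_new, x_train, alpha, sigma, lam, sigma_oe, lam_oe):
    """Evaluate S(x) = sum_i K'(x, x_i) alpha_i."""
    return remodulated_kernel(x_new, x_train, sigma, lam, sigma_oe, lam_oe) @ alpha

# Synthetic residuals (MeV) on a made-up 10 x 10 patch of the chart:
# a smooth trend plus terms that depend on the parity of Z and N.
# These numbers are NOT real mass data.
z, n = np.meshgrid(np.arange(20, 30), np.arange(20, 30), indexing="ij")
x_train = np.stack([z.ravel(), n.ravel()], axis=1)
y_train = (0.3 * np.sin(0.3 * x_train[:, 0])
           + 0.2 * (x_train[:, 0] % 2) - 0.1 * (x_train[:, 1] % 2))

hyper = dict(sigma=1.25, lam=0.05, sigma_oe=2.65, lam_oe=0.15)  # values quoted in the text
alpha = fit(x_train, y_train, **hyper)
s_train = predict(x_train, x_train, alpha, **hyper)               # reconstruction on the training set
s_far = predict(np.array([[25, 60]]), x_train, alpha, **hyper)    # a nucleus far from all training data
```

In a real application, `y_train` would hold the residuals between experimental and model masses, and the predicted mass would be the model mass plus `predict(...)`. The far-away evaluation `s_far` is essentially zero, because both Gaussian factors vanish at large distances.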

Numerical details
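The extended KRR function (7) is trained to reconstruct the mass residuals, i.e., the deviations between the experimental masses $M^{\rm exp}$ and the theoretical masses $M^{\rm th}$. In the present work, the 2353 experimental masses $M^{\rm exp}$ for nuclei with $Z \ge 8$ and $N \ge 8$ are taken from AME2012 [12]. The theoretical masses $M^{\rm th}$ are taken from the mass table WS4 [19], which is one of the most accurate nuclear mass tables.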
The leave-one-out cross-validation is adopted to determine the hyperparameters ($\sigma$, $\lambda$, $\sigma_{oe}$, $\lambda_{oe}$). With a given set of hyperparameters, the mass of each of the 2353 nuclei is predicted with the extended KRR network trained on the other 2352 nuclei. The optimized hyperparameters ($\sigma = 1.25$, $\lambda = 0.05$, $\sigma_{oe} = 2.65$, $\lambda_{oe} = 0.15$) are obtained when the rms deviation between the experimental and predicted masses of the 2353 nuclei attains its minimum. The mass prediction $M^{\rm EKRR}$ for every nucleus can then be calculated by the extended KRR approach with the determined hyperparameters, in which the network weights $\alpha_i$ are trained on the set consisting of all other nuclei with experimentally known masses.
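The leave-one-out procedure can be sketched as follows. This is a hedged illustration on synthetic residuals, not the authors' implementation: the helper names, the toy data, and the small hyperparameter grid are all hypothetical, and the kernel is assumed to have the remodulated form K + (λ/λ_oe) K_oe with kernel-weighted regularizers.

```python
import itertools
import numpy as np

def remodulated_kernel(x, sigma, lam, sigma_oe, lam_oe):
    """Kernel matrix K + (lam / lam_oe) * K_oe over all pairs of rows (Z, N) in x."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    same = np.all(x[:, None, :] % 2 == x[None, :, :] % 2, axis=-1)
    return (np.exp(-d2 / (2 * sigma**2))
            + (lam / lam_oe) * same * np.exp(-d2 / (2 * sigma_oe**2)))

def loo_rms(x, y, sigma, lam, sigma_oe, lam_oe):
    """rms of the leave-one-out prediction errors for one hyperparameter set."""
    k_full = remodulated_kernel(x, sigma, lam, sigma_oe, lam_oe)
    m = len(y)
    errors = np.empty(m)
    for j in range(m):
        keep = np.arange(m) != j                      # train on every nucleus except j
        k_train = k_full[keep][:, keep]
        alpha = np.linalg.solve(k_train + lam * np.eye(m - 1), y[keep])
        errors[j] = y[j] - k_full[j, keep] @ alpha    # error on the held-out nucleus
    return float(np.sqrt(np.mean(errors**2)))

# Synthetic residuals on a made-up 8 x 8 patch (not real mass data).
z, n = np.meshgrid(np.arange(30, 38), np.arange(40, 48), indexing="ij")
x = np.stack([z.ravel(), n.ravel()], axis=1)
y = 0.3 * np.cos(0.4 * x[:, 1]) + 0.15 * (x[:, 0] % 2) + 0.15 * (x[:, 1] % 2)

# A deliberately tiny, hypothetical search grid; a real search would be much finer.
grid = {"sigma": [1.0, 2.0], "lam": [0.05, 0.5],
        "sigma_oe": [2.0, 3.0], "lam_oe": [0.15, 1.5]}
scores = {}
for values in itertools.product(*grid.values()):
    scores[values] = loo_rms(x, y, **dict(zip(grid.keys(), values)))

best = min(scores, key=scores.get)
best_params = dict(zip(grid.keys(), best))
best_rms = scores[best]
```

The explicit loop refits the weights once per held-out nucleus, which mirrors the description in the text. For 2353 nuclei and a fine grid this is costly, and a faster evaluation may be preferable in practice.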

Results and discussion
The mass differences between the extended KRR predictions and the experimental data are shown in Fig. 1 for the even-even (e-e), even-odd (e-o), odd-even (o-e), and odd-odd (o-o) nuclei, in comparison with those of the WS4 [19] and the KRR predictions [39]. It can be clearly seen that the extended KRR approach improves the predictive power of the WS4 mass model further than the KRR approach does. The significant improvement of the extended KRR approach is mainly due to the consideration of the odd-even effects, which eliminates the staggering behavior of mass deviations with respect to the even and odd numbers of nucleons in the KRR approach [39]. Quantitatively, the rms deviation $\Delta_{\rm rms}$ = 298 keV of the 2353 nuclei for the WS4 mass model is reduced to 199 keV by the KRR approach, and is further reduced to 128 keV by the extended KRR approach with the odd-even effects. This indeed provides, so far, the most precise mass model from machine learning approaches.
The mass deviations between the data and the predictions are relatively larger for light nuclei. This may be due to the large individual differences among the masses of light nuclei, which are difficult to reproduce with global mass models. On the other hand, for the applications of nuclear masses in r-process simulations, the masses of nuclei heavier than iron are important. In Fig. 2(a), the rms deviations $\Delta_{\rm rms}$ between the experimental data and the predictions of the WS4, the KRR, and the extended KRR models are shown for light ($Z \le 26$), heavy ($Z > 26$), and the whole set of nuclei. The extended KRR approach significantly improves the descriptions of masses for both light and heavy nuclei. Moreover, for the heavy nuclei with $Z > 26$, the rms deviation between the experimental data and the predictions of the extended KRR model is as small as 100 keV, which approaches the chaos-related unpredictability limit for the calculation of nuclear masses [41,34]. Figure 2(b) depicts the number of nuclei whose mass deviations from the data, ∆M, fall in various intervals, such as 0 < ∆M ≤ 100 keV, 100 < ∆M ≤ 200 keV, etc. This gives a detailed analysis of the distribution of the mass deviations. For the KRR model, the mass deviations from data are smaller than 200 keV for most nuclei, and the number of these nuclei is 1721, which is about 73% of the whole set. For the extended KRR model, however, the mass deviations from data are smaller than 100 keV for more than 70% of the nuclei, and they are smaller than 200 keV for more than 90% of the nuclei. This means that there are only 200 nuclei with mass deviations larger than 200 keV, and most of them are light nuclei. Improving the descriptions of masses for these nuclei would be a challenging task for the future.
To examine the extrapolation power of the extended KRR approach for neutron-rich nuclei, similar to Ref. [39], for each isotopic chain with $Z > 26$, the eight most neutron-rich nuclei are removed from the training set and classified into eight test sets, corresponding to the different extrapolation distances from the remaining training set in the neutron direction. In Fig. 3(a), the rms deviations $\Delta_{\rm rms}$ of the calculated masses for the eight test sets from the WS4 mass model, the extended KRR extrapolations, the KRR ones, the RBF ones, and the RBF ones with odd-even effects (RBFoe) [32] with respect to the experimental masses are shown as functions of the extrapolation distance. An even clearer comparison is shown in Fig. 3(b), where the rms deviations $\Delta_{\rm rms}$ are scaled to the corresponding ones for the WS4 mass model. One can see that all four approaches improve the mass descriptions of nuclei with extrapolation distances smaller than 4. The corresponding rms deviations are reduced by up to ≈ 40% in comparison with the WS4 mass model. However, the KRR type of approaches and the RBF type of approaches behave distinctly differently at extrapolation distances larger than 4. The rms deviations of the RBF and the RBFoe extrapolations are larger than those of the WS4 mass model, and they increase rapidly with the extrapolation distance. For the KRR and the extended KRR extrapolations, the rms deviations increase slowly with the extrapolation distance and are similar to or even smaller than the WS4 ones at large extrapolation distances. The distinct features are due to the different behaviors of the Gaussian kernel used in the KRR type of approaches and the linear kernel used in the RBF type of approaches, which have been discussed in detail in Ref. [39].
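The construction of the eight test sets can be sketched as below. This is an illustrative reading of the procedure, with two labeled assumptions: the extrapolation distance of a removed nucleus is taken as its rank among the removed isotopes (1 for the one closest to the remaining training set), and chains with too few isotopes to remove eight are left entirely in the training set. The function name and the toy chart are hypothetical.

```python
def split_by_extrapolation_distance(nuclei, n_removed=8, z_above=26):
    """Split (Z, N) pairs into a training list and test sets keyed by distance 1..n_removed."""
    chains = {}
    for z, n in nuclei:
        chains.setdefault(z, []).append(n)

    training = []
    test_sets = {d: [] for d in range(1, n_removed + 1)}
    for z, neutron_numbers in sorted(chains.items()):
        ns = sorted(neutron_numbers)
        # Assumption: chains with Z <= z_above, or too short to remove
        # n_removed isotopes, stay entirely in the training set.
        if z <= z_above or len(ns) <= n_removed:
            training.extend((z, n) for n in ns)
            continue
        kept, removed = ns[:-n_removed], ns[-n_removed:]
        training.extend((z, n) for n in kept)
        # Assumption: distance = rank among the removed isotopes,
        # counted from the neutron-rich edge of the remaining chain.
        for distance, n in enumerate(removed, start=1):
            test_sets[distance].append((z, n))
    return training, test_sets

# Toy chart (not real data): one chain at Z = 26, one long chain, one short chain.
toy_chart = ([(26, n) for n in range(28, 41)]
             + [(28, n) for n in range(28, 46)]
             + [(30, n) for n in range(30, 36)])
training, test_sets = split_by_extrapolation_distance(toy_chart)
```

Each test set can then be predicted with a network trained only on `training`, and the rms deviation evaluated separately for each distance.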
Obvious odd-even staggering along the extrapolation distance can be seen in the rms deviations for the RBF and KRR predictions, where the odd-even effects have not been taken into account explicitly. When the odd-even effects are considered with the RBFoe approach, however, the odd-even staggering remains, and even has an opposite phase with respect to the RBF approach. This means that the RBFoe approach may overestimate the odd-even effects. On the contrary, for the extended KRR approach, as shown in Fig. 3(b), the rms deviations vary smoothly with the extrapolation distance. This indicates that the odd-even effects have been well handled in the extended KRR approach. Moreover, with increasing extrapolation distance, the extended KRR approach converges to the original WS4 mass model. This keeps the key merit of the KRR approach for mass extrapolations, i.e., it avoids worsening the mass descriptions for nuclei at large extrapolation distances.
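The convergence to the underlying mass model can be made concrete by looking at how the kernel weight of a single training nucleus decays along an isotopic chain. The sketch below assumes the remodulated kernel K + (λ/λ_oe) K_oe and uses the hyperparameter values quoted in the text; the variable names are hypothetical. Along a chain Z is fixed, so the parity of Z and N matches only when the neutron distance is even.

```python
import numpy as np

sigma, lam, sigma_oe, lam_oe = 1.25, 0.05, 2.65, 0.15  # values quoted in the text

distances = np.arange(1, 13)                 # neutron distance from a training nucleus
k = np.exp(-distances**2 / (2 * sigma**2))   # ordinary Gaussian kernel
same_parity = distances % 2 == 0             # same Z, so parity matches only for even distance
k_oe = same_parity * np.exp(-distances**2 / (2 * sigma_oe**2))
k_prime = k + (lam / lam_oe) * k_oe          # remodulated kernel (assumed form)

for d, value in zip(distances, k_prime):
    print(f"distance {d:2d}: K' = {value:.3e}")
```

Because every kernel value shrinks toward zero, the learned correction to the model mass vanishes far from the training data, and the prediction falls back to the underlying mass model rather than drifting away from it.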

Summary
In summary, the kernel ridge regression approach has been extended to include the odd-even effects in nuclear mass predictions by remodulating the kernel function. The resulting extended KRR approach does not introduce new weight parameters or inputs in the training network. The rms mass deviation from the experimental data is reduced from 298 keV for the WS4 mass model to 128 keV for the extended KRR approach, which provides the most precise mass model from machine learning approaches so far. For nuclei heavier than iron, in particular, the rms deviation is as small as 100 keV, which approaches the chaos-related unpredictability limit for the calculation of nuclear masses. A comparative study has been carried out to examine the extrapolation power of the extended KRR approach in the neutron-rich region. It reveals that the extended KRR approach can avoid the risk of worsening the mass predictions for nuclei at large extrapolation distances, and meanwhile, it provides a smooth extrapolation behavior with respect to the odd and even extrapolation distances.