A Kohn-Sham Scheme Based Neural Network for Nuclear Systems

A Kohn-Sham scheme based multi-task neural network is elaborated for the supervised learning of nuclear shell evolution. The training set is composed of the single-particle wave functions and occupation probabilities of 320 nuclei calculated by Skyrme density functional theory. The deduced density distributions, momentum distributions, and charge radii are found to be in good agreement with the benchmark results for untrained nuclei. In particular, learning the shell evolution leads to a remarkable improvement in the extrapolation of nuclear density. After a further charge-radius-based calibration, the network acquires a stronger predictive capability. This opens the possibility of inferring correlations among observables by combining experimental data for complex nuclear systems.

Introduction-The last decade has seen independent machine-learning studies of different nuclear observables aiming to reproduce the experimental values, such as nuclear masses [1][2][3], charge radii [4][5][6][7], excited states [8,9], α-decay half-lives [10], β-decay half-lives [11], fission yields [12,13], etc. Traditionally, these basic nuclear properties are evaluated and discussed using various theoretical approaches with different interactions, among which density functional theory, based on the Hohenberg-Kohn theorem, has been widely recognized for its accurate and universal calculations. On this basis, many extended methods have been developed to cope with various behaviors of nuclei, such as introducing a collective Hamiltonian to handle collective vibrations and rotations [14], combining random-phase-approximation methods to describe giant resonances [15][16][17], and using Woods-Saxon basis expansions to explain halos [18,19]. Nevertheless, the high theoretical computational cost makes the search for a universal functional difficult. Amid the research on density functionals, attention has also been paid to the long-standing structure problem of the correlations among different physical quantities, where recent breakthroughs have occurred in the physics of electronic systems and condensed matter by combining machine-learning techniques. Notably, artificial intelligence has reconstructed the Hohenberg-Kohn map of density functional theory [20], which is the bijection between the local density and the ground-state many-body wave function. Meanwhile, self-consistent charge densities calculated with different exchange-correlation functionals were used by an extensive deep neural network to predict the correlation, exchange, external, kinetic, and total energies simultaneously [21].
From a fundamental perspective, the effectiveness of density functionals stems in part from the fact that Kohn and Sham introduced explicitly noninteracting particle systems to calculate the shell evolution including exchange and correlation effects [22], which is instrumental in consolidating intrinsic connections among observables. Such connections have been effectively exploited in our previous work [23], in which the influences on densities and binding energies were examined after the charge-radius residuals between experimental and theoretical values were eliminated by a series of neural networks.
Inspired by these studies, we recognize that accessing different physical quantities via a single neural network provides an opportunity to analyze nuclear properties physically and to propagate residuals accurately. In this work, we construct a Kohn-Sham scheme based multi-task neural network that directly generates the nuclear auxiliary single-particle states characterizing the density functional, which we denote as the Kohn-Sham network (KSN).
Kohn-Sham network-To achieve a high-precision neural network, the single-particle wave functions and occupation probabilities are first computed using the Skyrme Hartree-Fock (SHF) approach with the SkM* interaction [24] including Bardeen-Cooper-Schrieffer (BCS) pairing. The Kohn-Sham equation reads

(h_kin + h_u + h_ls) ϕ_i(r) = ε_i ϕ_i(r),   (1)

where h_kin, h_u, and h_ls denote the kinetic, potential, and spin-orbit terms, respectively, and ϕ_i and ε_i correspond to the single-particle wave function and energy of state S_i. The occupation probability w_i is obtained by solving the BCS equation. Spherical symmetry is adopted in the present calculations. In particular, the conservation of particle number should be obeyed during the reconstruction,

Σ_i d_i w_i = A_τ,   (2)

where d_i is the degeneracy of state S_i and A_τ is the proton number Z (τ = p) or the neutron number N (τ = n). Difficulties occur in how to encapsulate these properties as well as the BCS equation in a neural network. Facing these difficulties, we note the advantage of taking √(w_i) ϕ_i(r) as the predictions, which allows us to consider ϕ_i as normalized and exempts us from modelling nonlinear changes in the pairing effect. In this case, one only needs to be rigorous about particle-number conservation and orthogonality among different single-particle states. The structure of the KSN designed on the basis of the above discussion is shown in Fig. 1. The input of the network is x = {Z, N} and the output is a matrix with elements √(w_i) ϕ_i(r_g) (i = 1, 2, ..., 30), where r_g = 0.1 fm × g (g = 0, 1, ..., 149). In detail, a multi-task fully connected (FC) neural network is selected, where the l-th FC layer is written as

x_j^(l+1) = g(Σ_{k=1}^{C_l} a_jk x_k^(l) + a_j0),   (3)

with a_jk being the trainable weights and biases, and C_l corresponding to the number of neurons. The nonlinear activation functions g(x) = ReLU(x) and g(x) = tanh(x) are employed for different layers. Firstly, {Z, N} is fed into a five-layer FC neural-network trunk cell (C1) to produce latent features.
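As an illustration of this trunk-plus-branches architecture, a minimal numpy sketch is given below. The layer widths, weight initialization, and input scaling are placeholder assumptions (the actual hyperparameters are listed in the Supplemental Materials), and only a single hidden trunk layer is shown instead of five.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, W, b, act):
    # One fully connected layer: g(W x + b)
    return act(W @ x + b)

relu = lambda z: np.maximum(z, 0.0)

class KSNSketch:
    """Trunk cell C1 (shared) feeding 30 identical branch cells C2."""
    def __init__(self, hidden=64, n_states=30, n_grid=150):
        self.n_states, self.n_grid = n_states, n_grid
        # trunk: 2 -> hidden (widths are illustrative, not the paper's)
        self.Wt = rng.normal(0, 0.1, (hidden, 2)); self.bt = np.zeros(hidden)
        # one (W, b) pair per branch: hidden -> n_grid, tanh output
        self.branches = [(rng.normal(0, 0.1, (n_grid, hidden)), np.zeros(n_grid))
                         for _ in range(n_states)]
    def __call__(self, Z, N):
        latent = fc(np.array([Z, N], float) / 100.0, self.Wt, self.bt, relu)
        # rows: sqrt(w_i) * phi_i(r_g) for each single-particle state S_i
        return np.stack([fc(latent, W, b, np.tanh) for W, b in self.branches])

net = KSNSketch()
out = net(54, 82)   # e.g. Z = 54, N = 82 (136Xe)
print(out.shape)    # (30, 150)
```

The tanh output layer bounds each √(w_i) ϕ_i(r_g), which is why the excess nucleon number Δ discussed below stays small.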
Subsequently, the latent features are entered into each of the 30 FC neural-network branch cells with the same structure (C2), corresponding to the 30 single-particle states, for which the typical shell ordering according to the nuclear oscillator shell model is taken for the states S_i listed in the Supplemental Materials. Evidently, the latent features contain the associations among different shells. This process is recorded as

√(w_i) ϕ_i(r_g) = C2_i(C1({Z, N})).   (4)

In practice, we train on Eq. (4) as the first step with the objective function

Loss_1 = 1 fm² × Σ_i d_i Σ_g [√(w_i) ϕ_i(r_g) − y_i(r_g)]²,   (5)

where y_i = √(w_i,tar) ϕ_i,tar(r) is the target obtained by SHF+BCS and the degeneracy d_i weights each shell. The feasibility and generalizability of this training have been discussed in Ref. [25]. After 1500 epochs (about 20 GPU minutes), the trainable parameters of the pre-trained model are determined initially. Considering the conservation of particle number (Eq. (2)) and the normalized single-particle wave functions, the excess nucleon number Δ = A_τ − Σ_i d_i w_i, which is small owing to the activation function tanh(x) in C2 and the pre-trained model, is superimposed on the outermost occupied shell I according to the typical shell ordering, which is referred to as normalization and denoted by

w_I → w_I + Δ/d_I.   (6)

In implementation, the normalization operator and the corresponding back-propagation are achieved with the help of Autograd in PyTorch [26]. The pre-trained parameters are loaded into the calibrated network as the second step for fine-tuning, i.e., another 500 epochs (about 15 GPU minutes) are performed under the objective function

Loss_2 = Loss_1 + (µΔ)²,   (7)

where (µΔ)² punishes the mass-number deviation, µ = 0.05 balances the two terms on the right in magnitude, and the factor 1 fm² makes Loss_1 and Loss_2 dimensionless. In fact, the normalization substantially complicates the gradient, which prolongs the back-propagation and can even cause gradient vanishing.
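The two objective functions can be sketched schematically as follows; the grid weighting and units are simplified, and only the value µ = 0.05 is taken from the text.

```python
import numpy as np

MU = 0.05  # balances the particle-number penalty against Loss_1 (value from the text)

def loss1(pred, target, d):
    """Pre-training objective: degeneracy-weighted squared error summed over
    the 30 states i and the 150 grid points g (schematic units)."""
    # pred/target: (30, 150) arrays of sqrt(w_i) phi_i(r_g); d: (30,) degeneracies
    return float(np.sum(d[:, None] * (pred - target) ** 2))

def loss2(pred, target, d, delta):
    """Fine-tuning objective: Loss_1 plus the (mu * Delta)^2 penalty on the
    excess nucleon number Delta remaining after normalization."""
    return loss1(pred, target, d) + (MU * delta) ** 2

d = np.arange(1, 31, dtype=float)                # stand-in degeneracies
pred = np.zeros((30, 150)); target = np.zeros((30, 150))
print(loss2(pred, target, d, delta=0.64))        # only the penalty term survives
```

With identical prediction and target, only the particle-number penalty (µΔ)² remains, making the role of the second training stage explicit.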
To make this training feasible, we decompose the whole process into the two steps above and vary the learning rate according to the loss value of stochastic gradient descent, with Adaptive Moment Estimation (Adam) [27] as the optimizer. The carefully modulated hyperparameter sets of the FC neural-network cells, the normalization details, and the varying learning rate are listed in the Supplemental Materials.
Results-In application, we select only 320 nuclei as the training set, amounting to ∼10% of the nuclei discovered so far, yet the network shows a remarkable generalization capability. Taking the untrained nucleus 136Xe as an example, the properties of the proton single-particle states, densities, and momenta are exhibited in Fig. 2.

FIG. 2. (Color online) Upper panels: The √(w_i)ϕ_i(r) in coordinate space for each single-particle orbital for the untrained nucleus 136Xe. Middle and lower panels: The corresponding proton densities and momentum distributions as well as the contributions from each orbital. The predictions by KSN are compared with the calculations by SHF+BCS.

Panels (a) and (b) show the √(w_i)ϕ_i(r) in coordinate space for each single-particle orbital calculated by SHF+BCS and predicted by KSN, respectively. It is obvious that the intrinsic orbital information from the theoretical calculations and the neural-network predictions is indistinguishable by eye. As a more direct test, we present the density distributions deduced by (c) SHF+BCS and (d) KSN. Each curve is obtained by superimposing, in the given ordering, the single-particle densities from the state i = 1 up to a corresponding state I,

ρ_I(r) = (1/4π) Σ_{i=1}^{I} d_i w_i |ϕ_i(r)|².

The outermost curve indicates the total density of the nucleus, and the area between states I−1 and I represents the contribution of state S_I to the total density. Similar to the √(w_i)ϕ_i(r), the total densities and the contributions of each state from SHF+BCS and KSN cannot be distinguished either. In addition, the single-particle wave functions ϕ̃_i(k) in momentum space, derived by Fourier transform, are utilized to calculate the momentum distributions and the contributions of each state, displayed in panels (e) and (f) with the formula

n_I(k) = (1/4π) Σ_{i=1}^{I} d_i w_i |ϕ̃_i(k)|²,

which yields the same conclusion as the density. It is clear that other currents and densities, such as the spin-orbit current and the kinetic energy density, would have the same quality of outcomes. The plots show that the network can accurately predict all of the wave-function components without significant deviations in magnitude, gradient, or integration. Achieving a high degree of accuracy on examples absent from the training set indicates that the network has not been overfitted, and has indeed captured the underlying physical content of the Kohn-Sham equation and the auxiliary noninteracting particle systems.
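The Fourier transform to momentum space can be illustrated for an l = 0 radial wave function as below; the spherical Bessel transform convention and the 1s-like test function are illustrative assumptions, used here only to verify numerically that the norm is preserved.

```python
import numpy as np

trapz = getattr(np, "trapezoid", None) or np.trapz  # NumPy 2.x renamed trapz

def momentum_wf_l0(phi_r, r, k):
    """phi~(k) = sqrt(2/pi) * ∫ j_0(kr) phi(r) r^2 dr, with j_0(x) = sin(x)/x."""
    j0 = np.sinc(np.outer(k, r) / np.pi)   # np.sinc(x) = sin(pi*x)/(pi*x)
    return np.sqrt(2.0 / np.pi) * trapz(j0 * phi_r * r**2, r, axis=1)

# 1s-like test wave function, normalized so that ∫ phi^2 r^2 dr = 1
r = np.linspace(0.0, 20.0, 2000)
k = np.linspace(0.0, 10.0, 2000)
phi = 2.0 * np.exp(-r)
phi_k = momentum_wf_l0(phi, r, k)

norm_r = trapz(phi**2 * r**2, r)
norm_k = trapz(phi_k**2 * k**2, k)
print(round(float(norm_r), 3), round(float(norm_k), 3))  # 1.0 1.0
```

The equality of the coordinate- and momentum-space norms (Parseval's theorem for the spherical Bessel transform) is the property that lets the same occupation probabilities weight both the density and the momentum distribution.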
The charge radius R_ch is one of the important experimental observables, phenomenologically formulated as [28]

R_ch² = R_p² + ⟨r²⟩_p + (N/Z) ⟨r²⟩_n,

where R_p indicates the radius of the proton distribution, and the second and third terms account for the finite proton size and neutron size, respectively. Note that spin-orbit effects also contribute to the charge radius [29], but such contributions are relatively small for nuclei with Z > 40. The calculations for the Pb and Sn isotopes are shown in Figs. 3(a) and 3(b), respectively, where the shading marks the nuclei in the training set. Most of the radii calculated by SHF+BCS deviate from experiment by less than 0.01 fm, which demonstrates that the theoretical calculations are quite accurate. The KSN reaches almost the same accuracy as the theoretical calculations after training on only four nuclei within the displayed region. A step beyond traditional computing is that neural networks allow the direct involvement of experiment [23], i.e., the single-particle states can in principle be calibrated by the large amount of experimental charge-radius and binding-energy data, which will facilitate the realization of a realistic non-parametric Hohenberg-Kohn map for complex nuclear systems. As a further in-depth analysis, the superiority of the KSN extrapolation is illustrated in Fig. 4, taking the Pb isotopes as examples: the most neutron-rich trained nucleus is 202Pb, and we extrapolate to 212Pb, 222Pb, and 232Pb. It is evident from Figs. 4(a)-(f) that KSN reproduces the theoretical density distributions almost identically on both linear and logarithmic scales, even when extrapolating by an additional 30 neutrons. In contrast, the extrapolation of the density generator (DG) of Ref. [25], which simply learns the densities, shows an obvious deviation for 232Pb. Neural networks are often considered excellent interpolators but unreliable extrapolators.
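As a numerical illustration of this formula, the sketch below evaluates R_ch from a point-proton radius. The nucleon mean-square charge radii are representative literature values, not necessarily the constants adopted in Ref. [28], and the 208Pb input radius is hypothetical.

```python
import math

# Representative nucleon mean-square charge radii (fm^2); illustrative values only
R2_PROTON = 0.770
R2_NEUTRON = -0.116

def charge_radius(R_p, Z, N):
    """R_ch = sqrt(R_p^2 + <r^2>_p + (N/Z) * <r^2>_n), in fm."""
    return math.sqrt(R_p**2 + R2_PROTON + (N / Z) * R2_NEUTRON)

# e.g. a hypothetical point-proton radius of 5.45 fm for Z = 82, N = 126
print(round(charge_radius(5.45, 82, 126), 3))  # 5.504
```

The proton-size term enlarges R_ch while the negative neutron term, weighted by N/Z, slightly reduces it, so the two corrections partially cancel for neutron-rich nuclei.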
Our results demonstrate that the extrapolation performance of networks naturally improves, with a reduced number of degrees of freedom, as more physical features are learned and more constraints are imposed. The present extrapolation is still far from the drip line [30]; therefore we randomly select 2400 nuclei found in laboratories to extrapolate farther for Pb, as shown in the Supplemental Materials. More trained nuclei make the predictions more robust, while the training time increases only linearly.
To make sense of the subtle discrepancies, the neutron occupation probabilities in the valence space are shown in Figs. 4(g)-(i), where the short solid lines correspond to the occupation probabilities of the single-particle states (3p 1/2 , 1i 13/2 , 1i 11/2 , 2g 9/2 , 2g 7/2 , 4d 5/2 , 4d 3/2 , 5s 1/2 , and 1j 15/2 ) in order from left to right, and the blue vertical line separates the valence space from the rest. The occupation probabilities for protons and deeply bound neutron states are not shown because the corresponding shells are completely filled. For 212Pb and 222Pb, KSN succeeds in describing the occupation probabilities of the single-particle states for valence nucleons, in both ordering and magnitude. In the case of 232Pb, the states 2g 7/2 and 4d 5/2 deviate significantly from the theoretical calculations, which originates from the excess nucleon number Δ increasing with extrapolation. The excess nucleon number Δ = 0.64 is superimposed on 2g 7/2 in accordance with the normalization and the given typical shell ordering. Other than that, the network still accurately predicts the other states, which ensures that the overall deviation of the density is not too large. We can conclude that the shell evolution from the single-particle perspective is indeed encompassed by the network, which is the foundation of the strong extrapolation performance.
Additionally, the orthogonality of the generated single-particle wave functions is investigated for 212,222,232Pb. By definition, the angular orthogonality is provided by the spinor spherical harmonics, so only the radial part needs to be discussed. We specify

O_ik = ∫ ϕ_i(r) ϕ_k(r) r² dr,

where O_ik should be zero in principle as long as (i, k) belongs to P = {(i, k) | n_i ≠ n_k, l_i = l_k, j_i = j_k}. As an inspection, the root-mean-square

Ō = [Σ_{(i,k)∈P} O_ik² / card(P)]^{1/2}

is calculated for each nucleus, where card(P) is the cardinality of the set P. The calculated values indicate that the orthogonality is preserved to a large extent. To go beyond the theoretical calculations, we include a calibration to the experimental charge radius in KSN by further considering the objective function

Loss_3 = Loss_2 + (ΔR_ch / fm)²,

where ΔR_ch is the deviation in charge radii between the calibrated-KSN (denoted KSN*) predictions and the experimental data. In this case, the objective function aims to make the smallest possible correction to the theoretical calculations. More than 600 nuclides with Z > 40 are used for calibration, whereas the Sn isotopes are completely excluded in order to test the calibration; the results are presented in Fig. 5. Evidently, around the stable Sn isotopes, where the experimental data are more accurate, the predictions gain a significant improvement. Meanwhile, the heavy Sn isotopes still exhibit a bias toward the theoretical results, which can be attributed to two factors: (1) the network fails to capture the necessary calibration information from the training set, possibly owing to the experimental precision; and (2) the theoretical calculations still carry considerable weight in the calibration. We conclude that the calibrated network has a stronger predictive capability even for these untrained nuclei, although it still partly relies on the theoretical calculations. In other words, the theoretical calculations and the KSN-based calibrations are complementary to each other.
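The overlap test can be sketched as follows; the hydrogen-like 1s/2s pair is an illustrative stand-in for the generated nuclear wave functions, chosen because its orthogonality is known analytically.

```python
import numpy as np

trapz = getattr(np, "trapezoid", None) or np.trapz  # NumPy 2.x renamed trapz

def orthogonality_rms(phis, r, qn):
    """RMS of the radial overlaps O_ik = ∫ phi_i phi_k r^2 dr over the pairs
    P = {(i, k) | n_i != n_k, l_i = l_k, j_i = j_k}; qn holds (n, l, j)."""
    O = [trapz(phis[i] * phis[k] * r**2, r)
         for i in range(len(phis)) for k in range(i + 1, len(phis))
         if qn[i][0] != qn[k][0] and qn[i][1:] == qn[k][1:]]
    return float(np.sqrt(np.mean(np.square(O))))

# sanity check with hydrogen-like 1s/2s radial functions (orthogonal analytically)
r = np.linspace(0.0, 60.0, 6000)
phi_1s = 2.0 * np.exp(-r)
phi_2s = (1.0 / np.sqrt(2.0)) * (1.0 - r / 2.0) * np.exp(-r / 2.0)
rms = orthogonality_rms([phi_1s, phi_2s], r, [(1, 0, 0.5), (2, 0, 0.5)])
print(rms < 1e-6)  # True: exact eigenstates give vanishing overlaps
```

Applied to the network output, a small RMS value signals that the generated states remain mutually orthogonal even though orthogonality is never imposed as an explicit constraint.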
The neural-network method, benefiting from current nuclear structure models, has ample potential to tackle nuclear physics problems, and can in turn be employed to improve those models. The calibrated single-particle wave functions allow for the natural derivation of new spin-orbit and kinetic densities. New nuclear effective interactions could then be constructed within a neural-network framework, where the back-propagation principle can be leveraged to incorporate experimental binding energies into the functionals efficiently. These functionals can be transferred to current nuclear structure models, which may lead to better descriptions and predictions of nuclear properties.
Conclusion-A novel supervised deep multi-task learning of the nuclear ground-state shell evolution from the Kohn-Sham single-particle perspective has been successfully constructed on a 1D lattice. The carefully designed KSN and its training process take the conservation of particle number and the orthonormality of the single-particle wave functions into account, whereby the physics features embedded in the nuclear density functional are preserved to a large extent. The success of decomposing the network so as to train it step by step is illuminating for tackling the problem of barren plateaus in large complex networks.
With only 320 nuclei trained, KSN generates proton wave-function components, densities, and momentum distributions for the untrained nucleus 136Xe that are indistinguishable from the SHF+BCS results, besides achieving a highly accurate description of the charge radii of the Pb and Sn isotopes. Successfully portraying the densities and occupation probabilities of valence nucleons for 212,222,232Pb serves as proof of the reliable extrapolation performance of KSN, which significantly outperforms a network that simply learns the densities, demonstrating that the extrapolation of networks naturally improves once shell evolution is taken into account. Our conclusions strengthen the evidence that machine-learning tools provide a suitable framework to represent density functionals, analyze physical correlation effects, and calibrate theoretical calculations with experimental data. After a further charge-radius-based calibration, the network indeed acquires a stronger predictive capability, which is confirmed by untrained nuclei, e.g., the Sn isotopes. Furthermore, the calibrated single-particle wave functions and densities would facilitate research on finding general non-parametric functionals, which would improve the description of nuclear properties and decrease computational costs.

I. DETAILS OF NEURAL NETWORK STRUCTURE AND MACHINE LEARNING PROCESS
In addition to Fig. 1, the 30 FC neural-network branch cells with the same structure (C2) correspond to the 30 single-particle states, where the typical shell ordering according to the nuclear oscillator shell model is taken for the states S_i listed in Table S1. Even though the ordering of states within two shell closures may differ from model to model, the objective functions guide the network to learn the correct sequence.

TABLE S1. Typical shell ordering according to the nuclear oscillator shell model for protons as well as for neutrons. The "degeneracy" is the maximum number of nucleons in the given single-particle state.

For the normalization module, considering the conservation of particle number and the normalized single-particle wave functions, the excess nucleon number Δ is superimposed on the outermost shell I according to the typical shell ordering, denoted by

w̃_i = w_i + δ_{iI} Δ/d_I,   (S1)

with δ_{iI} the Kronecker delta. In implementation, the normalization operator is achieved via the following three steps.
1. Extracting and activating w_i: The ϕ_i is considered normalized, so the occupation probability can be extracted as

w_i = ∫ [√(w_i) ϕ_i(r)]² r² dr,

whose value domain is then truncated to [0, 1] by a novel activation function.

2. Calculating the excess nucleon number Δ: Owing to the limitations of the activation function tanh(x) in C2 and the pre-training process, the excess nucleon number

Δ = A_τ − Σ_i d_i w_i

is a small value.

3. Calibrating the occupation probabilities: As a simple treatment, Δ is added to the outermost occupied shell I according to the typical shell ordering (see Table S1), i.e., I is the last state with w_I > 0, and the final occupation probabilities, which strictly obey the conservation of particle number, are given by Eq. (S1).

To make this training feasible, we decompose the whole process into the two parts and vary the learning rate (lr) according to the loss value. The carefully modulated hyperparameter sets of the fully connected (FC) neural-network cells and the varying learning rate are listed in Table S2.
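The three steps can be sketched as follows; the plain clip operation stands in for the activation function above, the choice of I as the last state with w_i > 0 follows step 3, and the numbers in the toy example are hypothetical.

```python
import numpy as np

def normalize_occupations(w, d, A_tau):
    """Three-step normalization sketch: (1) truncate w_i to [0, 1] (a plain clip
    stands in for the activation function of the text), (2) compute the excess
    nucleon number Delta = A_tau - sum_i d_i w_i, (3) add Delta to the
    outermost occupied shell I as w_I -> w_I + Delta / d_I."""
    w = np.clip(np.asarray(w, dtype=float), 0.0, 1.0)   # step 1
    delta = A_tau - float(np.sum(d * w))                # step 2
    I = int(np.max(np.nonzero(w > 0)[0]))               # step 3: last w_i > 0
    w[I] += delta / d[I]
    return w, delta

# toy example with four states of degeneracies d = 2j + 1 and A_tau = 8
d = np.array([2.0, 4.0, 6.0, 8.0])
w, delta = normalize_occupations([1.0, 1.0, 0.33, 0.0], d, 8.0)
print(round(float(np.sum(d * w)), 6))  # 8.0: particle number strictly conserved
```

After the correction, Σ_i d_i w̃_i = A_τ holds exactly by construction, which is the property enforced during fine-tuning via the (µΔ)² penalty.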
In application, the 320 randomly selected nuclei shown in Fig. S1 are adopted to train the Kohn-Sham network (KSN); the trained nuclei are marked by red symbols and the other nuclei discovered in laboratories are indicated by grey symbols.

II. DETAILED INFORMATION ON THE THEORETICAL CALCULATIONS
In this section, the detailed parameters of theoretical calculations are presented in Table S3. We employed the program HAFOMN [2] to calculate the nuclear ground-state wave functions in the Skyrme Hartree-Fock framework including the Bardeen-Cooper-Schrieffer pairing (SHF+BCS). The parameters in the table correspond to the Skyrme density functional we used, SkM* [3]. All other parameters not listed are set to their default values in the program.

III. EXTRAPOLATION RESULTS FOR PB ISOTOPES
In this section, in addition to Fig. 4, we randomly select the 2400 nuclei found in laboratories for training and extrapolate farther for Pb. Figures S2(a)-(j) show the nuclear densities predicted by KSN and the density generator (DG) [4] for the nuclei 212Pb, 222Pb, 232Pb, 242Pb, and 252Pb on linear and logarithmic scales. Meanwhile, the corresponding deviations Δρ between the predictions of the neural networks (KSN/DG) and the calculations of SHF+BCS are presented in panels (k)-(o). Figure S2 shows that an obvious deviation appears at 232Pb for DG, while for KSN no deviation appears until 242Pb. Consistent with Fig. 4, the predictions of KSN are much better than those of DG despite the deviation increasing with extrapolation, which again proves the key conclusion of the main text, i.e., the extrapolation of neural networks naturally improves as more physical properties are learned and more constraints are imposed.
As a comparison, the nuclear radii R of the Pb isotopes predicted by KSN and DG are shown in Fig. S3 with (a) 320 nuclei trained and (b) 2400 nuclei trained. Remarkably, the extrapolations of KSN in nuclear radii are steadier and more effective than those of DG: KSN does not deviate appreciably until 242Pb, while DG already fails at 232Pb. As the shell evolution is captured, KSN obtains more accurate predictions from less data, which clearly illustrates the importance of physics in a data-driven approach.