Taming nucleon density distributions with deep neural network method

We investigate the density distributions of finite nuclei employing a well-designed deep neural network method. The target nucleon density distributions, used to train the networks, are calculated with Skyrme density functional theories. We find that training with only about 10% of the nuclei (300-400) is sufficient to describe the nucleon density distributions across the entire nuclear chart within a 2% relative error. The relative error rises to 5% when only about 200 proton (neutron) density distributions are used for training. We obtained very similar results for different Skyrme density functional theories; the trainability of the networks is therefore only weakly dependent on the theoretical model. Moreover, during the machine learning there is a turning point marking the transition from a Fermi-like distribution to the realistic Skyrme distribution, which reveals significant properties of the convergence process.


I. INTRODUCTION
Since the discovery of the atomic nucleus [1], density information has played a crucial role in the study of nuclear structure, especially in explaining the nature of the nuclear force and the fundamental properties of nuclear matter [2]. For example, in studies of the high-momentum tail (HMT) induced by short-range correlations (SRC), it has been found that the percentage of the tail depends strongly on the density [3][4][5]. The asymmetry potential is density dependent and is influenced by the Pauli blocking of the ∆-decay and the thickness of the neutron skin [6]. Moreover, the equation of state (EoS) of nuclear matter is used to describe neutron star (NS) mergers [7].
In the 1950s, Hofstadter first measured the charge density of protons using electron scattering experiments and, on that basis, described the density distributions of some nuclei [8]. Thereafter, two-parameter Fermi (2pF) / three-parameter Fermi (3pF) distributions [9][10][11] or Fourier-Bessel expansions [11,12] were used to roughly describe the shape of the nuclear density profile. Density functional theories (DFTs), e.g., the Skyrme-Hartree-Fock (SHF) method, are among the most efficient and widely used models for studying the bulk properties of nuclei, especially the density distribution [13][14][15]. With the creation of Bardeen-Cooper-Schrieffer (BCS) theory [16], the understanding of superconductivity was transformed and new life was injected into nuclear physics. Bogoliubov introduced the generalized Bogoliubov transformation to include both the particle-hole and particle-particle parts of the nuclear force [17], which, of course, allowed a more refined description of the density. Currently, almost all models describing nuclear structure can calculate density distributions. In 2016, density information for the short-lived nucleus 34Si was also obtained by comparing experimental measurements for the removal of an l = 0 proton with the results of reaction theories [18]. Some simulations based on transport models also contribute positively to exploring the density profile. For example, using the density profile as an initialization condition, the bubble structure was studied in heavy-ion collision (HIC) simulations of p+48Si [19]. The back-propagation neural network (BPNN) method [20][21][22][23][24][25][26][27], one of the most popular and powerful machine learning tools, has developed remarkably in computer vision (CV) [28,29] and natural language processing (NLP) [30,31] in recent years. Its application to nuclear physics is also becoming prevalent.

* yangzuxing16@impcas.ac.cn
† zuowei@impcas.ac.cn
By training neural networks with the X-ray radiation from neutron stars, the relationship between pressure and mass density has been estimated [32]. Combining the Hartree-Fock-Bogoliubov (HFB) and multilayer neural network methods, nuclear complexity has been tamed, with excited-state energies, nuclear deformations, etc. predicted [33]. Excellent results have been obtained using Bayesian neural network (BNN) methods for the extrapolation of binding energies from stable nuclei to the drip-line region [34]. The artificial neural network method has been used to extrapolate the ground-state properties of nuclei calculated by ab-initio approaches to large basis spaces [35,36]. More recently, Google proposed a hybrid quantum-classical machine learning model for training beyond classical data types [37]. Inspired by these successful applications of the machine learning method, we slightly modify the traditional deep neural network (DNN) model by using the conserved particle number as a physical condition, and tame the nucleon density distributions with this neural network method by training the networks with the density distributions of a very limited set of nuclei. This is a regression problem in supervised learning, which statistically can be seen as a maximum likelihood estimation (MLE) [38],

w_MLE = arg max_w P(D|w) = arg max_w Π_i P(y_i | x_i, w),    (1)

where w and w_MLE denote the weights of the neural networks during learning and the weights for which the mapping is realized with the greatest probability, respectively. x_i and y_i represent the input and output, respectively. In this work, the input is {Z_num, N_num, τ}, where Z_num and N_num denote the proton and neutron numbers, respectively, and τ labels the proton (τ = 1) or neutron (τ = 2). The output is the nucleon (τ = 1 for proton and τ = 2 for neutron) density distribution of the nucleus (Z_num, N_num). That is, the neural network produces the proton or neutron density distribution of a nucleus for a given set {Z_num, N_num, τ}. The BPNN, proposed by Rumelhart and McClelland in 1986, is adopted to achieve Eq. (1).
The method adjusts a multilayer feedforward neural network with the error back-propagation algorithm, by which the weights of the network converge toward the stated goal w_MLE.
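As a one-parameter illustration of this convergence (a toy sketch, not the paper's network), gradient descent on a squared error drives a weight w toward its target value:

```python
# One-dimensional sketch of gradient descent toward w_MLE: minimizing a
# squared error L(w) = (w - w_mle)^2 by repeated back-propagated gradient
# steps. Purely illustrative; the real network updates ~860,000 weights
# in the same fashion.
w_mle = 1.5          # the (a priori unknown) maximum-likelihood weight
w = 0.0              # initial weight
eta = 0.1            # learning rate

for _ in range(200):
    grad = 2.0 * (w - w_mle)   # dL/dw
    w -= eta * grad

# w has now converged to w_mle to high accuracy.
```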
Network structure

Neural networks typically contain a large number of interconnected neurons in an input layer, an output layer, and several hidden layers. We use about 860,000 parameters located in eight layers (l ∈ [1,8]), where w^(l) and b^(l) are the weights and biases of the l-th layer and a^(l) denotes its output. In the input layer and each hidden layer (l ∈ [1,7]), the nonlinear activation function g(x) = ReLU(x) = max(0, x), which has been shown to have extraordinary effects [39,40], is employed. In the output layer (l = 8), the activation function g(x) = Sigmoid(x) = 1/(1 + e^(-x)) is adopted to smooth the results.
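Concretely, the forward pass just described can be sketched in plain NumPy (the paper trains with Keras/TensorFlow; the layer widths below are assumptions, chosen only so that the parameter count is of the same order as the quoted ~860,000):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Eight parameterized layers (l in [1, 8]). The widths are assumptions,
# not taken from the paper; they merely give a parameter count of the
# same order as the quoted ~860,000. Input is {Z_num, N_num, tau},
# output is the density on 150 radial grid points.
sizes = [3, 128, 256, 512, 512, 512, 256, 128, 150]

rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
n_params = sum(w.size + b.size for w, b in zip(weights, biases))

def forward(x):
    """ReLU in layers l = 1..7, sigmoid in the output layer l = 8."""
    a = np.asarray(x, dtype=float)
    for l, (w, b) in enumerate(zip(weights, biases), start=1):
        z = a @ w + b
        a = sigmoid(z) if l == len(weights) else relu(z)
    return a

# Example input: neutron (tau = 2) profile of a nucleus with Z = 49, N = 76.
rho_grid = forward([49.0, 76.0, 2.0])
```

Because the last layer is a sigmoid, every grid value lies strictly between 0 and 1; the physical density scale is restored by the normalization built into the loss function described next.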
The loss function is a key component in network regulation. The mean squared error (MSE) is widely used for regression problems; it is obtained naturally by taking a normal distribution for P(D|w) in Eq. (1). However, the physical constraint

∫_0^∞ 4π ρ(r) r² dr = X,    (2)

with X = Z_num or N_num and ρ(r) the corresponding nucleon density distribution, has to be taken into account in this work. We therefore introduce a correction to the MSE and define the normalized mean squared error (NMSE) as

NMSE = (1/M) Σ_α (1/N) Σ_i [λ_α ρ^α_pre(r_i) - ρ^α_tar(r_i)]²,  λ_α = X_α / [4π Σ_i ρ^α_pre(r_i) r_i² Δr],    (3)

where α runs over all the nuclei adopted to train the networks, M is the number of nuclei in the training set, and i runs over all the radial grid points of a given nucleus α. In this work we consider the radial range from r = 0 fm to r = 15 fm, divided into N = 150 grid points for all nuclei. ρ_pre(r_i) = a^(8)_i refers to the nucleon density distribution obtained with the current neural network during the learning process, and ρ_tar denotes the target nucleon density distribution of the same nucleus, calculated with the Skyrme DFTs. With the correction in Eq. (3), the absolute density scale becomes irrelevant; the conserved particle number allows our networks to focus on capturing scale features. The networks can hence be well converged even with a very limited training dataset. Combining Eq. (3) with Eq. (1), mapping out the density distributions of a set of nuclei in a training set is equivalent to solving

w_MLE = arg min_w NMSE(w).    (4)

During the machine learning, we employ adaptive moment estimation (Adam) [41] as the optimizer to perform gradient descent, and we adjust the learning rate according to the variation of the error at different epochs. We use Keras with a TensorFlow backend [42] to train the networks. The training results become well converged within 2000 epochs; one epoch means that all the training samples have been used once.

Training process

About 3400 nuclei have been discovered in laboratories to date.
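Going back one step, the NMSE-style loss with its particle-number rescaling can be sketched in NumPy (a minimal sketch; the rectangular quadrature and the explicit rescaling factor λ are implementation assumptions consistent with the constraint described above):

```python
import numpy as np

# Radial grid: r from 0 to 15 fm in N = 150 points, as in the text.
N = 150
r = np.linspace(0.0, 15.0, N)
dr = r[1] - r[0]

def particle_number(rho):
    """Discrete version of the integral of 4*pi*rho(r)*r^2 dr."""
    return np.sum(4.0 * np.pi * rho * r**2) * dr

def nmse(rho_pre, rho_tar, X):
    """Sketch of the normalized MSE for one nucleus: rescale the
    predicted density so it integrates to the required particle number X
    before comparing with the target."""
    lam = X / particle_number(rho_pre)
    return np.mean((lam * rho_pre - rho_tar) ** 2)

# Toy Fermi-like target (illustrative parameters, not SHF output).
rho_tar = 0.08 / (1.0 + np.exp((r - 5.5) / 0.55))
X = particle_number(rho_tar)

# A prediction off by a constant factor is fully repaired by lambda,
# which is exactly why the absolute density scale becomes irrelevant.
loss = nmse(2.0 * rho_tar, rho_tar, X)
```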
Various theoretical approaches can be used to calculate the nucleon density distributions of these nuclei. In this section, the target nucleon density distributions are obtained by the SHF+BCS approximation with the SKM* interaction, which we refer to as SKM*-SHF. For a given training set, we take a fraction of the nucleon density distributions of these nuclei to train the neural networks. We calculate the density using the following equation [43]:

ρ(r) = (1/4π) Σ_β w_β (2j_β + 1) [R_β(r)/r]²,    (5)

where w_β and R_β denote the pairing weight and radial wave function of each single-particle state, respectively. The total wave function reads

ψ_β(r, θ, φ) = [R_β(r)/r] Y_{j_β l_β m_β}(θ, φ),    (6)

where the functions Y_{j_β l_β m_β}(θ, φ) are spinor spherical harmonics.
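As a numerical sanity check of such a spherical construction, a toy single-level density (the 1s1/2 oscillator-like radial function and its length parameter b are illustrative assumptions, not taken from the paper) reproduces its particle number under the 4π∫ρr²dr integral:

```python
import numpy as np

r = np.linspace(1e-6, 15.0, 3000)   # avoid division by r = 0
dr = r[1] - r[0]

# Toy radial wave function R(r): a 1s oscillator-like state with an
# assumed oscillator length b, normalized so that the integral of
# R(r)^2 dr equals 1.
b = 2.0
u = r * np.exp(-r**2 / (2.0 * b**2))
u /= np.sqrt(np.sum(u**2) * dr)

def density(states):
    """Assumed standard spherical form: rho(r) = (1/4pi) * sum over
    states of w_beta * (2 j_beta + 1) * [R_beta(r)/r]^2."""
    rho = np.zeros_like(r)
    for w, j, u_beta in states:
        rho += w * (2.0 * j + 1.0) * (u_beta / r) ** 2
    return rho / (4.0 * np.pi)

# One fully occupied 1s1/2 level (w = 1, j = 1/2) holds two nucleons.
rho = density([(1.0, 0.5, u)])
n = np.sum(4.0 * np.pi * rho * r**2) * dr
```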
In our first application, we train the networks with 10% of the entire dataset, which consists of about 6800 neutron/proton density distributions of about 3400 nuclei. We present the locations of the corresponding nuclei in Fig. 1. Note that this 10% of the dataset is chosen randomly. Therefore both the proton and neutron density distributions of some nuclei (blue symbols in Fig. 1) are adopted, whereas for other adopted nuclei only one of the two nucleon density distributions, e.g., the proton one (yellow symbols in Fig. 1), is employed. The red symbols in Fig. 1 correspond to the nuclei which are not used in the training process.
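The random 10% selection over proton and neutron profiles can be sketched as follows (the rectangular (Z, N) block below is only a stand-in for the real chart of roughly 3400 known nuclei):

```python
import random

# Build the full index of density distributions: one proton (tau = 1)
# and one neutron (tau = 2) profile per nucleus. The (Z, N) ranges here
# are placeholders giving roughly 3400 "nuclei".
nuclei = [(z, n) for z in range(8, 60) for n in range(8, 74)]
dataset = [(z, n, tau) for (z, n) in nuclei for tau in (1, 2)]

random.seed(1)
random.shuffle(dataset)

n_train = len(dataset) // 10                 # 10% for training
train = dataset[:n_train]
validation = dataset[n_train:2 * n_train]    # a further 10% for validation
```

Because each (Z, N, tau) entry is drawn independently, some nuclei contribute both profiles to the training set while others contribute only one, exactly the pattern described for Fig. 1.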
In Fig. 2 we present the loss functions NMSE and MSE as functions of the epoch, for both the training set and the validation set. For the training set, we use 10% of the entire dataset to train the networks, as shown in Fig. 1. We find that the NMSE and MSE decrease simultaneously with the epoch during the training process, indicating that the results converge as the networks are trained sufficiently. Meanwhile, another 10% of the entire dataset is evaluated after each training epoch as the validation test. It is clear that, for both the NMSE and the MSE, the errors on the training and validation sets are highly consistent. This shows that there is no under-fitting or over-fitting during the training process, and illustrates that the neural network method is able to describe the nucleon density distributions of the entire nuclear chart when trained with only 10% of the dataset. Note that the NMSEs are much smaller than the MSEs in Fig. 2 since we train the networks with the NMSE rather than the MSE; we therefore treat the MSE as an amplifier for the NMSE. With the help of this amplifier, we notice that two significant convergence processes emerge during training: the first at about 50 epochs and the second from 500 to 700 epochs.
What do the two convergences represent? In Fig. 3 we show the evolution of the neutron density distribution of a randomly chosen nuclide, 125In, from the test set during the training process of Fig. 2. The first significant convergence brings the density distribution from chaos to order and is achieved rapidly, within around 50 epochs. The red block at the 100th epoch, which is Fermi-like, is an exemplary distribution after the first significant convergence. The second significant convergence emerges from about the 500th epoch, after which the Fermi-like distribution evolves toward that calculated by the SKM*-SHF method. In order to make the value of the loss function as small as possible, the network naturally arrives at the Fermi-like distribution first, much like the evolution of our understanding of the nuclear density profile over the past 70 years, from naive to refined. Epoch 500 is a crossover point, the point at which the loss function starts to fall again after a long period of steady state. We therefore refer to the 500th epoch as a wisdom point (W-point).
Regarding the W-point, several interesting conclusions emerge: i. the W-point appears earlier (later) as the amount of training data increases (decreases); ii. the W-point does not appear if the training dataset is too small; iii. the appearance of the W-point is due to the particle-number condition we added to the loss function.
After the second convergence, the model is still fine-tuned. By the 1400th epoch, the distribution is almost identical to the target distribution, which implies that the neural network method is well suited to the current application. We retain the network from the epoch, within 2000, at which the validation-set error is minimal.
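The retention rule, i.e., keeping the network from the epoch with the lowest validation error, can be sketched as follows (the validation function and its toy error curve are placeholders, not the paper's training loop):

```python
# Minimal sketch of the model-selection rule: track the epoch (within a
# fixed budget) at which the validation error is smallest; in a real run
# one would snapshot the network weights at that epoch.

def select_best(epochs, validate):
    best_err, best_epoch = float("inf"), -1
    for epoch in range(epochs):
        err = validate(epoch)   # validation NMSE after this epoch
        if err < best_err:
            best_err, best_epoch = err, epoch
    return best_epoch, best_err

def toy_validation_error(epoch):
    """Toy curve: error decreases, then slowly rises (overfitting)."""
    return (epoch - 1400) ** 2 / 1e6 + 0.01

best_epoch, best_err = select_best(2000, toy_validation_error)
```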

III. RESULTS AND DISCUSSIONS
In the following applications, we use two Skyrme DFTs to calculate the target nucleon density distributions in order to investigate the dependence of the neural network predictions on the theoretical model. One of the Skyrme DFTs is the SKM*-SHF mentioned in the above section. The other is the Hartree-Fock-Bogoliubov method with the SLy4 interaction, which we refer to as SLy4-HFB. For the SKM*-SHF model, nine neural network models, trained with 2%, 5%, and 10%-70% of the dataset, are used to investigate the convergence pattern. For the SLy4-HFB model, we take 10% of the dataset, consisting only of even-even nuclei, to train the networks. Moreover, we use identical inputs for the training processes with SKM*-SHF and SLy4-HFB to guarantee their consistency.
We present in Fig. 4 the error distributions obtained with the nine neural network models trained on the SKM*-SHF data. The error distribution is the standard deviation between the result of the neural network method, λρ_pre(r)_i, and the result obtained by the SKM*-SHF method, ρ_tar(r)_i, written as

σ(r) = [ (1/A) Σ_i (λρ_pre(r)_i - ρ_tar(r)_i)² ]^(1/2),    (7)

where the subscript i runs over all 100 nucleon density distributions (A = 100), which are randomly selected. We find in Fig. 4 that the maximum value of the error distribution is within 0.0015 fm^-3 for the training set with 10% of the dataset.
Compared with the saturation density of protons or neutrons, about 0.08 fm^-3 (half of the nuclear saturation density, 0.16 fm^-3), the central relative error (CRE) is less than 2%. For training sets with less than 10% of the dataset, the error increases dramatically. For the case with 5% of the dataset, the predicted CRE is between 5% and 6%. However, even with only 2% of the dataset used for training, the predicted CRE is less than 10%. The predicted CRE decreases as more data are included in the training set. Fig. 5 shows four random proton density distributions and four random neutron density distributions predicted by the neural networks trained with different fractions (2%, 5%, 10%, and 20%) of the dataset. It is clear that the results predicted by the models trained with 10% and 20% of the dataset overlap almost exactly with the curves obtained from SKM*-SHF calculations. Of particular note is the green curve, which is almost a standard Fermi distribution. There was no second convergence during the training for this curve, i.e., the W-point did not occur. This is acceptable because the training data in this case are rather limited. This curve corresponds to a CRE of approximately 10%, a value which can also be considered the CRE at the W-point for training with more than 2% of the dataset. Naturally, the corresponding error curve (the 2% one) in Fig. 4 can also be viewed as the error curve at the W-point. The predictions of the model trained with 5% of the dataset are intermediate between the 2% and the 10% ones: it has partly undergone the second significant convergence and yields an error of only about 5%.
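A minimal NumPy sketch of the error measures used in this section, the radial error distribution over A = 100 nuclei and the central relative error against the single-species saturation density of about 0.08 fm^-3, might look as follows (the toy densities are random placeholders, not SKM*-SHF output):

```python
import numpy as np

RHO_SAT_SINGLE = 0.16 / 2.0   # proton or neutron saturation density, fm^-3

def error_distribution(rho_pre, rho_tar, lam):
    """Standard deviation, over the A sampled nuclei, between the
    rescaled prediction lam*rho_pre and the target rho_tar at each
    radial grid point."""
    diff = lam[:, None] * rho_pre - rho_tar
    return np.sqrt(np.mean(diff**2, axis=0))

def central_relative_error(sigma_r):
    """CRE: central error relative to the single-species saturation."""
    return sigma_r[0] / RHO_SAT_SINGLE

# Toy data: A = 100 nuclei on N = 150 radial grid points, prediction
# offset from the target by noise of ~0.001 fm^-3.
rng = np.random.default_rng(0)
rho_tar = rng.uniform(0.0, 0.08, size=(100, 150))
rho_pre = rho_tar + rng.normal(0.0, 1e-3, size=(100, 150))

sigma_r = error_distribution(rho_pre, rho_tar, np.ones(100))

# The quoted maximum error of 0.0015 fm^-3 corresponds to a CRE just
# below 2%: 0.0015 / 0.08 = 0.01875.
cre_quoted = 0.0015 / RHO_SAT_SINGLE
```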
In Fig. 6 we present the quantity σ_tot, the normalized total standard deviation, written as

σ_tot = [ (1/N) Σ_i σ(r_i)² ]^(1/2),    (8)

where N denotes the number of radial grid points. For the results in the right part of Fig. 6, σ_tot decreases as the amount of training data increases, consistent with the results in Fig. 4. In the left part of Fig. 6 we investigate the model dependence of the current neural network method. We randomly selected 340 nucleon density distributions of even-even nuclei to train the networks, using the target nucleon density distributions calculated by the SKM*-SHF and SLy4-HFB methods. We notice that the σ_tot values obtained with the SKM*-SHF and SLy4-HFB methods are of the same order of magnitude. Therefore the predictive power of the current neural network method is only weakly dependent on the theoretical models and interactions. Comparing the data for 10% of the dataset in the left part of Fig. 6 with those in the right part, we find that σ_tot for even-even nuclei is smaller than the result which does not distinguish the odd/even status of the proton or neutron numbers. This can be explained in terms of information complexity: a dataset containing only even-even nuclei has lower complexity than the other case. During the training process, lower complexity means fewer features, which are easier to capture. Therefore, the neural network calculations with the even-even-only dataset lead to a smaller σ_tot.
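The total deviation σ_tot can be read as an RMS of σ(r) over the N radial grid points; under that assumption, a minimal sketch is:

```python
import numpy as np

def sigma_tot(sigma_r):
    """Assumed reading of the normalized total deviation: the RMS of the
    radial error distribution sigma(r) over the N grid points."""
    return np.sqrt(np.mean(np.asarray(sigma_r) ** 2))

# A flat toy error curve of 1e-3 fm^-3 on N = 150 grid points yields a
# sigma_tot equal to that same value.
total = sigma_tot(np.full(150, 1e-3))
```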

IV. SUMMARY
In this work we proposed a method to describe the nucleon density distributions of finite nuclei using a neural network. Owing to the normalization condition in the loss function, the nucleon density distribution was always Fermi-like in the early stage of the machine learning and then tended toward the target distribution after the second significant convergence. We therefore defined a W-point at which the second significant convergence emerged. We found that the W-point emerged earlier when more data were adopted in the training process, and we did not observe the W-point for the machine learning with 2% of the dataset.
We found that machine learning with 10% of the dataset was sufficient to predict the nucleon density distributions of all the nuclei with high precision. The corresponding CRE was less than 2%, which suggests that the neural network method adopted in this work is rather successful in the current application. By comparing the results using the target distributions calculated with two different Skyrme DFTs, SKM*-SHF and SLy4-HFB, we found that the predictive power of the neural network is only weakly dependent on the adopted theoretical model.
So far, the ground-state density distributions of spherical nuclei have been adequately studied. Compared to traditional methods, the neural network method not only reduces the complexity of the study, allowing one to avoid the complicated many-body problem, but also significantly improves the predictive power at a lower computational cost.