Machine learning the deuteron

We use machine learning techniques to solve the nuclear two-body bound state problem, the deuteron. We use a simple one-layer, feed-forward neural network to represent the deuteron wave functions of the S and D states in momentum space, and solve the problem variationally using ready-made machine learning tools. We benchmark our results with exact diagonalisation solutions. We find that a network with $6$ hidden nodes can provide a faithful representation of the ground state wave function, with a binding energy that is reproduced to better than 0.1 %. This first proof-of-principle simulation paves the way for a potential solution of the nuclear many-body problem using variational artificial neural network techniques.


Introduction
Machine learning (ML) techniques are ubiquitous within and outside the scientific domain. They are used in a variety of contexts and can be exploited to classify information; to compress it; to interpolate or extrapolate data, and to solve a variety of optimisation problems [1]. In physics, artificial neural networks (ANNs) have been extensively used in the past to analyse data, particularly in particle physics experiments and theory [2,3]. In nuclear physics, early applications of ANNs to nuclear systematics [4,5] have been recently extended to exotic mass domains [6], fission yields [7], α−decay half-lives [8] and nuclear deformation and spectroscopic properties [9]. In nuclear ab initio theory, ANNs are used to extrapolate results of otherwise costly first-principles calculations from restricted model spaces [10,11].
A more recent development of ML techniques is their application to solve specific physics problems in the quantum domain [12,13,14]. In particular, a series of recent ML applications to solve quantum many-body problems form first principles have shown promising results. A pioneering application involved spin systems that used a restricted Boltzmann machine as a wave function ansatz [15]. These simulations gave access to both the ground state and the dynamics of systems with different dimensions, and extensions to excited states have also been formulated [16]. The solution of discrete [17] and real space [18] many-body bosonic systems followed shortly after. More sophisticated techniques based on deep neural networks have been recently developed to tackle realistic quantum chemistry problems [19,20]. In all these cases, the problem was set up as a variational one, and the solution was fully ab initio.
There are two key reasons that make ANNs particularly attractive in the quantum many-body domain. First, ANNs can encapsulate and compress information. If this compression is efficient enough, the complex content of many-body wave functions may be codified into manageable, specifically tailored and, possibly, deep ANNs [21]. Second, ML techniques are particularly suited to solve optimisation problems. In a physics setting, with the energy as a cost function, these can be easily mapped into variational problems. The expectation is that these variational artificial neural network (VANNs) are superior to traditional trial wave functions, due to their ability to express features flexibly.
By providing direct access to the many-body wave function, ML techniques open a series of interesting possibilities to find nuclear ground state, operator expectation values and dynamics. Whether or not one can actually implement VANN algorithms efficiently in nuclear many-body systems is at present an open question. Here, we present a proof-of-principle calculation of a nuclear system, the deuteron, using ready-made, available ML resources. The deuteron is a natural starting point to explore the feasibility of ab initio methods, which already includes some of the difficulties encountered in heavier systems at a much lower computational cost [22]. Our aim is not so much to present new insights, but rather to explore the quality of a simple ANN ansatz to the deuteron wave function, and to demonstrate that a variational setting is useful in this context. In solving the deuteron using VANNs, we also identify a series of issues that may be helpful in future developments of fully-fledged nuclear many-body ML Hidden layer

Input layer
Output layer Figure 1: ANN architecture used in this work. The input is a single value of momentum, q, and the wave functions are modelled in terms of a single-layer network. In the example above, the number of hidden nodes is N hid = 4. The ANN has two outputs, one for the S and one for the D state. solvers.

Methods
Our solution for the deuteron is variational. The trial wave function is parametrised in terms of a single-layer ANN with N hid hidden nodes and two output nodes, one for the L = 0 (S) and one for the L = 2 (D) state. The architecture of the ANN is shown in Fig. 1. The weights W (1) connect the input and the hidden layer, whereas W (2) connect the hidden and output layer. We also use a bias between the input and the hidden layer, b (1) . For a given N hid , there are a total of 4N hid parameters (including the biases), which we succinctly denote as a vector, W. These are used as variational parameters in a minimisation problem for the energy, where the wave functions Ψ W ANN are arbitrary admixtures of the S− and D− states, ψ L=0,2 ANN . For the hidden layer, we use the soft-plus activation function, which is continuous and less prone to be affected by the vanishing gradient problem [23]. The output layer is a weighted linear sum of the values of the hidden nodes. We note that we could have dedicated a single layer to each of the two states, at the expense of increasing the number of parameters. In this sense, our ansatz ANN is a minimal representation of the deuteron wave function.
We solve the problem explicitly in momentum space [24]. This is helpful for three practical reasons. First, in momentum space the kinetic term in the Hamiltonian of Eq. (1) is a continuous function. In contrast, in real space, the kinetic term would involve numerically costly derivatives on the ANN wave functions. Second, for the deuteron, the separation between center-of-mass and relative motion can be implemented straightforwardly. We thus solve the problem in relative momentum and, rather  than sampling for the wave functions in a probabilistic way [18], we discretise them using a Gaussian quadrature. With this, we avoid any potential sign problem. Third, a momentum space approach allows us to employ directly the numerical routines associated to the N3LO Entem-Machleidt nucleon-nucleon force, our interaction of choice [25]. We have tested the method with other momentumspace potentials, and have found similar levels of agreement with the corresponding benchmarks. We use the same momentum quadrature in all our integrals. A change of variables involving a tangent is performed on a Gauss-Legendre quadrature of N k = 64 points to extend the integration range from 0 to k max = 500 fm 1 . This approach provides a dense mesh at low momenta, while sparsely covering the high-momentum region. We use the same quadrature to solve the exact ground state eigenvalue problem, to set a benchmark for the VANN solution. The deuteron binding energy with this numerical method is E GS = −2.2267 MeV.
The choice of a continuous momentum basis, as opposed to a discrete basis, is further motivated by an important result on ANNs. The Universal Approximation Theorem guarantees that a network with a single layer can provide a faithful representation of any continuous function within a given domain, provided N hid is large enough [26,27]. In this sense, working in continuous momentum space, rather than in a discrete basis, may be advantageous. One naively expects that ANNs should mimic the shape of any wave function, if given enough nodes to do so. We stress here that the ANN ansatz can explore arbitrary shapes in momentum space in an unbiased way. The ansatz is thus not constrained by any prescriptive analytical shape, like traditional trial wave functions would be. This explains why the variational problem works here with a single wave function. In contrast, variational calculations with trial wave functions with 2 − 6 parameters may not able to bind the deuteron [24].
We solve the variational problem in three different steps, implemented using the ready-made PyTorch framework [28]. First, we initialise the network using random weight values. We sample from uniform distributions with W (1) ∈ [−1, 0), b (1) ∈ [−1, 1) and W (2) ∈ [0, 1). This differs from the traditional Xavier initialisation scheme, which has a poor performance in minimisation for this problem [29]. After this random initialisation, the wave functions are featureless and have no bearing to physical ones. In a second step, we therefore follow Ref. [18] and train the ANN to reproduce physically inspired, but arbitrary, target wave functions for each of the two states. These target wave functions should be a closer starting point to the variational problem. We use a functional form with ξ = 1.5 fm −1 , which provides target wave functions with an extent in momentum space which is similar to the exact solutions.
We train the ANN wave function to match the target wave function in a supervised manner. The cost function, C = C S + C D , is the sum of the individual contributions for each state, C L = (K L − 1) 2 , where we introduce the overlap .
The RMSprop scheme is used to minimise C for 10 5 iterations [30]. We use a learning rate (step size) α = 10 −2 , a smoothing constant β = 0.9, and a numerical stability constant = 10 −8 as the hyperparameters of the scheme. As the network calculates an unnormalised wave function for each partial wave, the normalisation constant effectively modulates W (2) by a factor of L=0,2 ψ L ANN |ψ L ANN . This allows for a relative large learning rate during the minimisation process. The resulting overlap is within 1 − 5% of the desired value of K L = 1.
The third and final step is the actual variational energy minimisation. We use ANN wave functions trained to the target S− and D−state wave functions as a starting point. We note that these have an unphysically large 50 % admixture of the two states, but this does not hinder the convergence of the VANN. We then let the network evolve to readjust the wave functions while minimising the energy. We use RMSprop again to minimise the energy cost function in Eq. (1), with the same set of hyperparameters α, β, and as in the initial supervised training stage. A typical energy minimisation curve for the case with N hid = 10 is shown in Fig. 2. Within the first few thousands of iterations, the network is able to bind the deuteron. In the first few thousands of iterations (not shown in the Figure for clarity), the descent is fast and smooth. After about 10, 000 iterations, fluctuations appear, due to the relatively large learning rate. This allows for the energy to be overshot at times, but the minimisation algorithm eventually corrects for that. At 50, 000 iterations, the binding energy is already within 10% of the benchmark value (dashed line). We stop our runs at

Results
We explore the bias and variance of the model, particularly the out-of-sample error, in two different ways. First, we change the number of hidden layer nodes from N hid = 2 to 20, in steps of 2. This provides an idea of how model predictions change with an increase in the number of variational parameters. Second, we initialise the model, train it to target wave functions and minimise the energy with 50 different random seed configurations. The results shown here are obtained as the means and standard deviations of these 50 individual runs. This helps identify weight initialisation effects.
When a ground state ANN wave function has been obtained, we quantify its quality by comparing it to the benchmark wave function from exact diagonalisation using a partial-wave fidelity, F L . This is akin to the overlap defined in Eq. (2) with the replacement ψ L targ → ψ L GS [18]. The closer F L is to one, the closer our wave function reproduces the exact diagonalisation results.
The main results are reported in the panels of Fig. 3 and, in a tabular form, in Table 1. We exclude the results of the N hid = 2 runs from the Figure for clarity, but report them in the Table. We note that with an overly Table 1: VANN results for the overlaps F S and F D (columns 1 and 2); binding energy E ANN (3) and D−state probability P D ANN (5) as a function of N hid . For completeness, we provide the benchmark binding energy, E GS , and probability, P D GS , in columns 5 and 7, respectively. simple N hid = 2 model, the deuteron is bound, but the energy is relatively far away from the benchmark. With N hid = 4, the quality of the ANN ansatz improves substantially, with fidelities within 0.1% of F L = 1 (central panel), and a binding energy (top panel) that is already within about ≈ 1% of the benchmark value, albeit with a significant standard deviation. As N hid increases, the energy approaches the benchmark, and stabilises around N hid ≈ 10. We find a relative agreement of the order of ≈ 1 keV in energies. The error in the fidelity also decreases initially, but remains relatively constant above N hid ≈ 10. We find that this error is below ≈ 0.005% across all the models from N hid = 10 to 20. Importantly, the variance obtained from the 50−run standard deviation remains relatively constant in all quantities in this region, indicating that the bias-variance trade-off is good up to this point. Having access to the wave functions, we can compute structural properties of the deuteron. The D−state probability, P D ANN = ψ D ANN ψ D ANN , is correlated with the strength of the tensor force. With as little as 4 hidden nodes the admixture between the S and D states is off by just over 1%. As N hid increases, the values quickly approach the benchmark P D GS = 4.51. The bottom panel of Fig. 3 indicates that the network is able to predict the admixture between the S− and D− state within about 0.1%, as long as N hid > 4. Here, again, we do not find a significant rise in variance as N hid increases.
Despite a relatively low variance and a very small error in the fidelities, the energy associated to the VANN wave function never quite reaches the benchmark value as N hid increases. Figure 4  region where a significant discrepancy remains is close to the origin, q < 0.05 fm −1 . The VANN wave functions tends to overshoot the exact wave functions with a linear low-momentum asymptotic behaviour. We conjecture that this low-momentum mismatch is largely responsible for the differences in energy. All the integrals in momentum space, like those shown in Eq. (2), carry a q 2 phase-space factor. The energy in Eq. (1) is no exception and, as a consequence, there is no energetic penalty for the VANN to miss the correct shape at zero momentum. This results in small mismatches in the energy, which the network cannot overcome without additional information in the ansatz or in the cost function. We believe that the correct ψ L ≈ k L low-momentum asymptotics may have to be explicitly incorporated. This technical development lies beyond the exploratory scope of this work.

Conclusions
Our results show, for the first time, that VANN techniques can be used successfully in solving bound-state nuclear physics problems. We find that networks with a single layer and as little as N hid = 6 nodes provide a deuteron binding energy within a few keV of microscopic benchmarks. The agreement generally improves as N hid increases and saturates around N hid ≈ 10. Wave functions and structural properties are a by-product of the calculation and have similar, if not superior, quality measures with respect to benchmarks. The variance of the models remain rather constant for a wide range of N hid . We speculate that small discrepancies in binding energy, of the order of a fraction of a percent, arise due to the missing wave function asymptotics at low momentum.
For the deuteron, these results are not yet competitive in terms of computing time. Our results however indicate that very simple architectures with a small number of nodes are already good starting points, yielding accurate results. If this simplicity can be exploited, the scaling in computing time of VANN techniques as the number of particles increase may remain relatively mild. If this is the case, one will be able to tackle heavier systems with this variational ab initio approach, as already demonstrated in quantum chemistry [19,20].
We foresee some bottlenecks before extending the reach of VANN techniques to higher mass numbers. First, techniques to explicitly include antisymmetrisation in the manybody wave function need to be developed. Recent results exploiting permutation-equivariant ANNs can provide a way forward [19]. Second, the network will have to deal with several configurations, as well as two-and three-nucleon interactions. This first application has already shown that VANNs can deal with more than one nuclear state at once. Third, and more important, it remains to be seen whether a generic extension of VANNs to incorporate arbitrary spin and isospin is possible. This may require specifically tailored deep ANN architectures. Only after these issues have been tackled, it will become clear whether ML is a competitive tool for ab initio nuclear physics.

Acknowledgments
This work is supported by the UK Science and Technology Facilities Council (STFC) through grant ST/P005314/1. We thank Pierre Arthuis and Mehdi Drissi for a careful reading of the manuscript and for useful discussions.