Bootstrapping the statistical uncertainties of NN scattering data

We use the Monte Carlo bootstrap method to simulate pp and np scattering data below pion-production threshold, starting from a set of over 6700 mutually $3\sigma$-consistent experimental data. We compare the results of the bootstrap, with 1020 statistically generated samples of the full database, with the standard covariance-matrix method of error propagation. No significant differences in scattering observables and phase shifts are found. This suggests alternative strategies for propagating the errors of nuclear forces into nuclear structure calculations.


I. INTRODUCTION
The modern era of high-quality NN interactions started when the long-term studies of the Nijmegen group culminated in a successful least-squares fit with a statistically significant $\chi^2/\nu \sim 1$ [1], after the implementation of many small but crucial effects and the exclusion of $3\sigma$-inconsistent data. Since then, subsequent analyses with $\chi^2/\nu \sim 1$ have been carried out [2-10], with the purpose of being used in ab initio nuclear structure calculations. As is well known [11], any least-squares fit corresponds to the $\chi^2$ minimization

$$\min_{\mathbf p} \chi^2(\mathbf p) = \min_{\mathbf p} \sum_{i=1}^{N} \left[ \frac{O_i^{\rm exp} - O_i(\mathbf p)}{\Delta O_i^{\rm exp}} \right]^2, \qquad (1)$$

where $O_i^{\rm exp}$ is a fitted observable, $\Delta O_i^{\rm exp}$ the corresponding statistical error bar, and $O_i(\mathbf p)$ the theoretical model depending on the fitting parameters $\mathbf p = (p_1, \ldots, p_P)$. The procedure assumes that the statistical uncertainties of the fitted data can be modeled by a probability distribution, namely independent normally distributed data $N(O_i^{\rm exp}, \Delta O_i^{\rm exp})$, an assumption based on the counting of a large number of events in NN scattering experiments.
The assumption of a finite number of normally distributed data is an indispensable prerequisite both for meaningful uncertainty estimates from a phenomenological fit and for any subsequent and reliable error propagation. Fortunately, normality can be checked a posteriori in probabilistic terms and within a given confidence level by applying a variety of statistical tests, which naturally become more stringent as the number of data grows. Of course, individually checking the probability distribution of over 6700 data points, involving over 300 experiments, some dating back more than 60 years, is rather impractical. However, if a model fitted to the data is flexible enough to reproduce them accurately, the normality of the experimental data implies that the discrepancies between theory and experiment, known as residuals,

$$R_i = \frac{O_i^{\rm exp} - O_i(\mathbf p)}{\Delta O_i^{\rm exp}},$$

must follow a standard normal distribution, i.e.

$$R_i \sim N(0, 1). \qquad (2)$$
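As a minimal illustration of this residual check, the following Python sketch builds a toy data set (invented numbers, not the actual NN database) and verifies that the residuals carry the first four moments of a standard normal distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the scattering database: "experimental" values
# scattered around a known model with Gaussian noise (invented numbers).
n = 5000
model = np.linspace(0.5, 2.0, n)               # O_i(p) at the fitted parameters
sigma = 0.05 * model                           # error bars Delta O_i^exp
data = model + sigma * rng.standard_normal(n)  # O_i^exp

# Residuals R_i = (O_i^exp - O_i(p)) / Delta O_i^exp should be ~ N(0, 1).
residuals = (data - model) / sigma

# Quick moment-based check: mean ~ 0, variance ~ 1,
# skewness ~ 0 and excess kurtosis ~ 0 for a standard normal sample.
mean = residuals.mean()
var = residuals.var()
skew = ((residuals - mean) ** 3).mean() / var ** 1.5
ex_kurt = ((residuals - mean) ** 4).mean() / var ** 2 - 3.0
print(mean, var, skew, ex_kurt)
```

In an actual analysis one would of course supplement these moments with formal tests (e.g. Pearson's $\chi^2$ or Kolmogorov-Smirnov) at a stated confidence level.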
Once a fit has been made, testing Eq. (2) is straightforward. Despite its simplicity, normality testing has not been a common practice in the fitting of nuclear interactions (an early discussion on normality was, however, conducted in Refs. [12, 13]). In a recent publication [9] we presented a new phenomenological nucleon-nucleon (NN) potential that accurately describes 6713 scattering data from 1950 to 2013, upgrading much of the previous work and increasing the statistics. This was done with an eye on the determination of the uncertainties in the fitted NN interaction itself and their consequences in nuclear physics, about which little is still known (see however [6, 14, 15]) and for which statistical methods offer the most natural framework. On a more general level, a growing concern with the statistical analysis of nuclear theory and its predictive power has emerged (see e.g. [16, 17] for general and instructive overviews and references therein). We have applied some of the well-known normality tests to three of our NN potentials, namely the delta-shell potential with one-pion exchange (DS-OPE) [8, 9], with (chiral) two-pion exchange (DS-χTPE) [10], and a Gaussian potential with OPE [18], and found the normality condition to hold in all of them [18]. A lack of normality would clearly signal an inconsistency in the fitting analysis and might be used as a guide to unveil systematic errors both in the data and in the model. It is thus foreseeable that normality tests will be regarded as an important ingredient in the design of NN interactions statistically inferred from scattering data (see e.g. [19] for an a posteriori analysis of [7]).
In our previous works the covariance matrix method was used to propagate errors. In the present note we discuss the robustness of our results using Monte Carlo techniques and the bootstrap method [20]. While these methods have been successfully exploited elsewhere (see e.g. [21, 22] for related studies within ππ scattering error analyses), to our knowledge they have never been implemented within the context of the NN force, so our presentation will be intentionally pedagogical.

arXiv:1407.3937v1 [nucl-th] 15 Jul 2014
We also outline interesting consequences regarding strategies for error propagation in nuclear physics.

II. COVARIANCE MATRIX METHOD
In the standard covariance method one starts with the least-squares fit of Eq. (1). Once the condition of normality, Eq. (2), has been checked [18], and assuming normality of the errors in the fitting parameters, we are in a position to propagate the statistical uncertainties into the potential parameters and into any calculation that takes these potential parameters as input. The error matrix $E_{ij}$ of the potential parameters $\{p_1, p_2, \ldots, p_P\}$ can be calculated by inverting the Hessian matrix,

$$E_{ij} = \left( H^{-1} \right)_{ij}, \qquad H_{ij} = \frac{1}{2} \left. \frac{\partial^2 \chi^2}{\partial p_i \, \partial p_j} \right|_{\mathbf p = \mathbf p_0}, \qquad (3)$$

which can be used to obtain confidence intervals for the parameters and correlations among them. Any quantity that can be calculated as a function of the potential parameters, $F(p_1, p_2, \ldots, p_P)$, can be provided with a statistical error bar $\Delta F$ via the customary expression

$$(\Delta F)^2 = \sum_{i,j} \frac{\partial F}{\partial p_i} \, E_{ij} \, \frac{\partial F}{\partial p_j}. \qquad (4)$$

A good approximation to Eq. (3) can be found in [9]; it has been used together with Eq. (4) to estimate statistical uncertainties of phase shifts, scattering amplitudes, deuteron properties, form factors, matrix elements and Skyrme parameters [8-10, 18, 23, 24]. However, the derivatives in Eq. (4) may be hard to calculate analytically, depending on the functional form of F. If one contemplates numerical evaluation instead, repeated evaluations of F at several values of the fitting parameters are required, which for a large number of parameters (typically 30-40 [8-10]) may also be a costly procedure¹. The calculation of derivatives can be avoided by drawing random parameter vectors from the multivariate normal distribution determined by the covariance matrix E,

$$\mathbf p_\alpha \sim N(\mathbf p_0, E). \qquad (5)$$

This generates a family of potential parameters, and F is calculated with each resulting potential. This Monte Carlo method propagates uncertainties directly; however, a multivariate normal probability distribution is assumed for all the parameters, which may not always match their true distribution.
1 It can also be an inaccurate procedure, since the corresponding finite-difference step h must be smaller than the statistical uncertainties ∆p, which are usually quite small. For instance, evaluating the Hessian numerically for our fits [8-10] requires computing crossed derivatives, which turned out to be highly unstable for a large number of parameters. This is why we preferred to compute the derivatives analytically and, in passing, use the highly efficient Levenberg-Marquardt minimization algorithm, where a stable (positive definite) approximation to the Hessian is exploited [25].
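To make the two propagation schemes of this section concrete, the following toy sketch (a two-parameter linear model with invented numbers, not the NN fit) computes ∆F once with the covariance-matrix formula of Eq. (4) and once with the derivative-free Monte Carlo sampling of Eq. (5):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-parameter model O_i(p) = p0 + p1*x_i fitted to noisy "data";
# all numbers are illustrative, not taken from the NN database.
x = np.linspace(0.0, 1.0, 50)
sigma = np.full_like(x, 0.1)
y = 1.0 + 2.0 * x + sigma * rng.standard_normal(x.size)

# For a linear model the Hessian of Eq. (3) is H = J^T W J exactly.
J = np.column_stack([np.ones_like(x), x])
W = np.diag(sigma ** -2)
H = J.T @ W @ J
p_fit = np.linalg.solve(H, J.T @ W @ y)
E = np.linalg.inv(H)            # error matrix of the parameters, Eq. (3)

# Derived quantity F(p) = p0 + 0.5*p1 (the model evaluated at x = 0.5).
g = np.array([1.0, 0.5])        # gradient dF/dp
dF_cov = np.sqrt(g @ E @ g)     # covariance-matrix propagation, Eq. (4)

# Derivative-free alternative: sample parameters from N(p_fit, E), Eq. (5),
# and read the error off the spread of F over the samples.
M = 20000
p_mc = rng.multivariate_normal(p_fit, E, size=M)
dF_mc = (p_mc @ g).std()

print(dF_cov, dF_mc)
```

For a linear F the two numbers agree up to the $1/\sqrt{2M}$ sampling noise; the Monte Carlo route only becomes advantageous when the gradient of F is expensive or unstable to compute.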

III. THE BOOTSTRAP METHOD
The bootstrap is a Monte Carlo technique that allows one to find the most likely probability distribution of the parameters and to propagate statistical uncertainties and correlations into any function F [20] (see also [25]). In our case, where the deviations between the theoretical model and the experimental data are normal statistical fluctuations, the procedure amounts to generating replicas of the observed data, meant to simulate fictitious experiments. Thus, for every experimental datum one generates synthetic data

$$O^{\rm synth}_{i,\alpha} = O^{\rm exp}_i + \Delta O^{\rm exp}_i \, \xi_{i,\alpha}, \qquad \alpha = 1, \ldots, M, \qquad (6)$$

where $\xi_{i,\alpha} \sim N(0,1)$ are standard normal and independent variables, $\langle \xi_{i,\alpha} \rangle = 0$ and $\langle \xi_{i,\alpha}\, \xi_{j,\beta} \rangle = \delta_{ij}\, \delta_{\alpha\beta}$. This generates M independent databases with the same number of data as the original one. Each synthetic database represents a snapshot of the random fluctuations inherent to the experimental processes. A least-squares fit to every generated database $O^{\rm synth}_{i,\alpha}$ ($\alpha = 1, \ldots, M$), featuring a maximum-likelihood estimate, can be made,

$$\chi^2_\alpha(\mathbf p_\alpha) = \min_{\mathbf p} \sum_{i=1}^{N} \left[ \frac{O^{\rm synth}_{i,\alpha} - O_i(\mathbf p)}{\Delta O^{\rm exp}_i} \right]^2, \qquad (7)$$

and a family of parameters $p_{1,\alpha}, \ldots, p_{P,\alpha}$ is obtained; the most likely theory parameters for replica α are $\mathbf p_\alpha$. The corresponding joint or marginal probability distributions can be obtained by binning the resulting parameter samples. This allows one to compute any function of the theoretical model parameters $F(\mathbf p)$ at a set of points $F_\alpha \equiv F(\mathbf p_\alpha)$. Thus, for large M the mean and variance can be computed as usual,

$$\bar F = \frac{1}{M} \sum_{\alpha=1}^{M} F_\alpha, \qquad (8)$$

$$(\Delta F)^2 = \frac{1}{M} \sum_{\alpha=1}^{M} \left( F_\alpha - \bar F \right)^2. \qquad (9)$$

The correlation coefficient of two different observables $F_1$ and $F_2$ is

$$C_{F_1 F_2} = \frac{\overline{F_1 F_2} - \bar F_1 \, \bar F_2}{\Delta F_1 \, \Delta F_2}, \qquad (10)$$

which, when evaluated for the fitting parameters themselves, is the correlation matrix. For asymmetric or skewed distributions, it may be better to define the 1σ asymmetric coverage by excluding 16% of the upper and lower values of the distribution instead of using the variance definition, Eq. (9). At any rate, we always check this possibility before errors are quoted.
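A minimal, self-contained sketch of the bootstrap cycle of Eqs. (6)-(9), on a hypothetical one-parameter toy model rather than the 6713-point database, could read:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "experimental" dataset for a one-parameter model O_i(p) = p * x_i
# (invented numbers used purely for illustration).
x = np.linspace(0.1, 1.0, 40)
sigma = np.full_like(x, 0.05)
data = 1.5 * x + sigma * rng.standard_normal(x.size)

def fit(y):
    # Weighted least squares for the single slope parameter p.
    w = 1.0 / sigma ** 2
    return (w * x * y).sum() / (w * x * x).sum()

p_central = fit(data)

# Bootstrap: O_synth_{i,a} = O_exp_i + Delta O_i * xi_{i,a}, Eq. (6),
# followed by a refit of every synthetic database, Eq. (7).
M = 2000
p_boot = np.empty(M)
for a in range(M):
    synth = data + sigma * rng.standard_normal(x.size)
    p_boot[a] = fit(synth)

# Mean and spread of the refitted parameters, Eqs. (8)-(9).
print(p_boot.mean(), p_boot.std())
```

Binning `p_boot` would give the marginal distribution of the parameter; in the real analysis the single scalar refit is replaced by the full minimization over all potential parameters.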
While the bootstrap method requires performing M repeated fits, it is a competitive alternative for determining errors and correlations when the covariance matrix itself is not directly available or not used in the minimization method [26]. Again, we stress that this way of generating snapshots of the statistical fluctuations is justified, since the condition of Eq. (2) has been checked to a significant confidence level.

FIG. 1: (Color online) Correlation matrix of the potential parameters [8]. The points $r_i = \Delta r(i+1)$ are grouped within every partial wave. We show the results obtained with the covariance matrix (left panel) and the Monte Carlo bootstrap simulation of experimental data (right panel). The color grades gradually from 100% correlation, $C_{ij} = 1$ (red), through 0% correlation, $C_{ij} = 0$ (yellow), to 100% anti-correlation, $C_{ij} = -1$ (blue).

IV. NUMERICAL RESULTS
We apply the different methods to the 3σ self-consistent database presented in Ref. [9], where $\chi^2/\nu = 1.04$. The potential used for this analysis has the form

$$V(r) = V_{\rm short}(r) + V_{\rm long}(r). \qquad (11)$$

The long-range piece $V_{\rm long}(r)$ contains a charge-dependent (CD) one-pion-exchange (OPE) interaction with a fixed $f^2 = 0.075$ [13] and electromagnetic (EM) corrections, all of which are kept fixed throughout the fitting process. The short-range component was inspired by Avilés [27] (see also [28, 29]) and reads

$$V_{\rm short}(r) = \sum_n \hat O_n \sum_i V_{i,n} \, \delta(r - r_i), \qquad (12)$$

where the $\hat O_n$ are the set of operators in the extended AV18 basis [3, 14, 15, 23], the $V_{i,n}$ are fitting parameters, and $r_i = \Delta r\,(i+1)$ with $\Delta r = 0.6$ fm. The fit is carried out more effectively in terms of a few low and independent partial-wave contributions to the potential, $(\lambda_i)^{JS}_{l,l'}$, from which all other higher partial waves are consistently deduced (see Refs. [8, 9]). The delta-shell potential reduces the computational effort enormously, so a large number of fits can easily be undertaken.
For the bootstrap analysis we took M = 1020 samples of the N = 6713 data and refitted the parameters of the DS-OPE potential (denoted by $(\lambda_{i,\alpha})^{JS}_{l,l'}$) which was used to determine the database. This produces M independent sets of most likely parameters, one for each synthetic database $O^{\rm synth}_{i,\alpha}$, $\alpha = 1, \ldots, M$. From these, any function of the fitted parameters, together with the inherent correlations, can be determined.
In Figure 1 we show the correlation matrix of the DS-OPE potential parameters obtained with the standard covariance matrix method and with the bootstrap method. It is not obvious, though most welcome, that both methods give fairly similar results, although small correlations are overestimated by the covariance matrix. The main difference between the two methods lies in the $^3S_1$-$^3D_1$ coupled channel. The Monte Carlo bootstrap simulation results in very small correlations between the $^3S_1$ and $\varepsilon_1$ partial-wave parameters and stronger correlations between $\varepsilon_1$ and $^3D_1$; the covariance matrix method gives the opposite pattern. These discrepancies could be related to the fitting of the deuteron binding energy, where the approximation used for the Hessian matrix might be outside its range of validity. In fact, the Monte Carlo generated $^3S_1$, $\varepsilon_1$ and $^3D_1$ parameters show large asymmetries, as can clearly be seen in Figure 2.
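As an illustration of how a correlation matrix like that of Figure 1 is assembled from parameter samples, the sketch below (with an invented three-parameter covariance, not the DS-OPE one) estimates $C_{ij}$ from samples and compares it with the analytic $C_{ij} = E_{ij}/\sqrt{E_{ii}E_{jj}}$:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical covariance of three fitted parameters (illustrative
# values, with a strong anti-correlation between parameters 1 and 3).
E = np.array([[1.0e-2,  2.0e-3, -6.0e-3],
              [2.0e-3,  4.0e-2,  1.0e-3],
              [-6.0e-3, 1.0e-3,  9.0e-3]])
p0 = np.array([0.5, -1.2, 2.0])

# Stand-in for the bootstrap-fitted parameter sets p_alpha.
M = 50000
samples = rng.multivariate_normal(p0, E, size=M)

# Correlation matrix estimated from the samples ...
C_boot = np.corrcoef(samples, rowvar=False)

# ... and the analytic one from E: C_ij = E_ij / sqrt(E_ii * E_jj).
d = np.sqrt(np.diag(E))
C_cov = E / np.outer(d, d)

max_dev = np.abs(C_boot - C_cov).max()
print(max_dev)
```

With normally distributed parameters the two matrices agree within sampling noise; in the real analysis the interesting deviations arise precisely where the parameter distributions are not normal.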
We compare in Fig. 3 the propagation of statistical uncertainties into phase shifts by three methods: i) the standard covariance matrix method, ii) the equivalent Monte Carlo implementation of the covariance matrix using the multivariate normal distribution of Eq. (5) with M = 1020, and iii) the bootstrap method, also with M = 1020 samples. The first and second methods should produce the same results for a sufficiently large number of parameter samples, so the agreement between Eq. (4) and the Monte Carlo sampling of parameters of Eq. (5) with the same E reflects the large value of M. Although the bootstrap method tends to give slightly larger error bars, the difference with respect to the other two methods is not significant. As mentioned above, one potential advantage of the bootstrap method is that it relaxes the assumption of normally distributed fitted parameters, a feature which proves relevant for asymmetric or skewed distributions. We find that the asymmetries seen in Figure 2 do not significantly propagate to the corresponding phase shifts.

FIG. 2: (Color online) Distribution of the $^3S_1$-$^3D_1$ coupled-channel delta-shell parameters: $^3S_1$ partial wave (upper row), $\varepsilon_1$ mixing angle (middle row) and $^3D_1$ partial wave (lower row). The blue bars give the normalized histogram from the 1020 fits to the Monte Carlo generated databases. The red line is the normal distribution assigned to each parameter by the covariance matrix method.
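The asymmetric 1σ coverage mentioned in Sec. III (excluding the 16% upper and lower values) can be sketched as follows, on a deliberately skewed toy sample:

```python
import numpy as np

rng = np.random.default_rng(4)

# Deliberately skewed toy sample standing in for a bootstrap parameter
# distribution (a log-normal, chosen only to exhibit asymmetry).
sample = rng.lognormal(mean=0.0, sigma=0.5, size=50000)

# Symmetric error from the variance definition, Eq. (9).
sym_err = sample.std()

# Asymmetric 1-sigma coverage: drop the lowest and highest 16% of the
# values, i.e. take the 16th and 84th percentiles around the median.
center = np.median(sample)
lo, hi = np.percentile(sample, [16.0, 84.0])
err_minus, err_plus = center - lo, hi - center
print(sym_err, err_minus, err_plus)
```

For a right-skewed sample the upper error clearly exceeds the lower one, information that a single symmetric ±∆F would hide.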
As a matter of principle the Monte Carlo simulation of the data gives the most reliable uncertainty propagation, but considering that performing a large number of full-length fits to the data can be computationally expensive, the covariance matrix methods provide a fairly good and extremely useful approximation, which will be exploited in future work.

V. CONCLUSIONS
The propagation of the statistical errors of nuclear forces, stemming from the finite precision and finite number of experimental NN scattering data, requires in the first place passing a normality test. However, even in this favourable case the actual calculation may be computationally demanding because of the practical need to repeat large-scale computations. It is thus important to explore methods where the number of calculations can be kept to a minimum. The standard covariance matrix method requires the evaluation of the Hessian as well as the derivatives, with respect to the theoretical model parameters, of the objective function whose uncertainties are being evaluated. As an alternative, the Monte Carlo method based on explicit knowledge of the Hessian can profitably be used, as it avoids the computation of derivatives (analytical or numerical) and automatically implements in every snapshot the inherent correlations among the fitting parameters. Both of these methods assume a multivariate normal distribution of the fitting parameters.
We have therefore also analyzed the more elaborate bootstrap method, which likewise rests on the normality test and is based on repeated minimizations over synthetic data sets generated from the distribution of the experimental data around the most likely estimate of the model. While this approach assumes normality of the experimental data, it does not assume normality of the fitting parameters, and thus allows one to handle possible skewness in the parameter distributions. Our bootstrap analysis confirms the errors and correlations already found with the covariance method.

This work is supported by the Spanish DGI (grant FIS2011-24149) and Junta de Andalucía (grant FQM225). R.N.P. is supported by a Mexican CONACYT grant.