Multi-level Monte Carlo computation of the hadronic vacuum polarization contribution to $(g_\mu-2)$

The hadronic contribution to the muon anomalous magnetic moment $a_\mu=(g_\mu-2)/2$ has to be determined at the per-mille level for the Standard Model prediction to match the expected final uncertainty from the ongoing E989 experiment. This is 3 times better than the current precision from the dispersive approach, and 5-15 times smaller than the uncertainty on the purely theoretical determinations from lattice QCD. So far the stumbling-block is the large statistical error in the Monte Carlo evaluation of the required correlation functions which can hardly be tamed by brute force. Here we propose to solve this problem by multi-level Monte Carlo integration, a technique which reduces the variance of correlators exponentially in the distance of the fields. We test our strategy by computing the Hadronic Vacuum Polarization on a lattice with a linear extension of 3 fm, a spacing of 0.065 fm, and a pion mass of 270 MeV. Indeed the two-level integration makes the contribution to the statistical error from long-distances de-facto negligible by accelerating its inverse scaling with the cost of the simulation. These findings establish multi-level Monte Carlo as a solid and efficient method for a precise lattice determination of the hadronic contribution to $a_\mu$. As the approach is applicable to other computations affected by a signal-to-noise ratio problem, it has the potential to unlock many open problems for the nuclear and particle physics community.


I. INTRODUCTION
The current experimental value of the muon anomalous magnetic moment a µ = 11659208.9(6.3) × 10 −10 by the E821 experiment has the remarkable precision of 0.54 parts per million (ppm) [1], while the on-going E989 experiment at FNAL is expected to reach the astonishing precision of 0.14 ppm by the end of its operation [2]. The Standard Model (SM) prediction includes contributions from five-loop Quantum Electrodynamics, two-loop Weak interactions, the Hadronic leading-order Vacuum Polarization (HVP) and (the much smaller) Hadronic Light-by-Light scattering (HLbL), see Ref. [3] and references therein. The overall theoretical uncertainty is dominated by the hadronic part. So far 1 , lacking precise purely theoretical computations, the hadronic contributions have been extracted (by assuming the SM) from experimental data via dispersive integrals (HVP & HLbL) and low-energy effective models supplemented with the operator product expansion (HLbL). This leads to a µ = 11659181.0(4.3) × 10 −10 (0.37 ppm) [3], which deviates by 3 − 4 standard deviations from the E821 result, a difference persisting for a decade which may be a hint for a New Physics signal.
State-of-the-art lattice Quantum Chromodynamics (QCD) determinations of the HVP are becoming competitive. At present, quoted uncertainties range between 0.6% to roughly 2% [5][6][7][8][9][10][11] corresponding to an overall error on a µ which is still 5-15 times larger than the anticipated uncertainty from E989. Taken at face value, the 1 A recent proposal for an independent determination of the HVP from the muon-electron elastic scattering has been put forward in Ref. [4].
most recent lattice determination of the HVP [11] differs from the dispersive result by more than 3 standard deviations, and generates tensions with the global electroweak fits [12]. All these facts call for an independent theoreticallysound lattice computation of the hadronic contribution to a µ at the per-mille level from first principles. The main bottleneck toward this goal is the large statistical error in the Monte Carlo evaluation of the required correlation functions, see Ref. [3] and references therein. The aim of this letter is to solve this problem by a novel computational paradigm based on multi-level Monte Carlo integration in the presence of fermions [13][14][15]. With respect to the standard approach, this strategy reduces the variance exponentially with the temporal distance of the fields, thus opening the possibility of making negligible the contribution to the statistical error from longdistances. Here we focus on the HVP, but the strategy is general and can be applied to the HLbL, the isospinbreaking and electromagnetic contributions as well.

II. THE SIGNAL-TO-NOISE PROBLEM
The HVP can be written as [16] a HVP where α is the electromagnetic coupling constant, K(x 0 , m µ ) is a known analytic function which increases quadratically at large x 0 [17], m µ is the muon mass, and G(x 0 ) is the zero-momentum correlation function In this study we consider N f = 3, the three lighter quarks of QCD with the first two degenerate in mass, so that The light-connected Wick contraction G conn u,d (x 0 ) and the disconnected one G disc u,d,s (x 0 ) are the most problematic contributions with regard to the statistical error. In standard Monte Carlo computations, the relative error of the former at large time distances |x 0 | goes as where M ρ is the lightest asymptotic state in the isotriplet vector channel, and n 0 is the number of independent field configurations. For physical values of the quark masses, the difference (M ρ − M π ) can be as large as 3.2 fm −1 . The computational effort, proportional to n 0 , of reaching a given relative statistical error thus increases exponentially with the distance |x 0 |. For the disconnected contribution G disc u,d,s (x 0 ), the situation is even worse since the variance is constant in time and therefore the coefficient multiplying |x 0 | is larger. At present this exponential increase of the relative error is the barrier which prevents lattice theorists to reach a per-mille statistical precision for the HVP. In order to mitigate this problem, in state-of-the-art calculations the contribution to the integral in Eq. (1) is often computed from Monte Carlo data only up to time-distances of 1.5 − 2 fm or so, while the rest is estimated by modeling G(x 0 ), see Ref. [18] for an up-to-date review.

III. MULTI-LEVEL MONTE CARLO
Thanks to the conceptual, algorithmic and technical progress over the last few years, it is now possible to carry out multi-level Monte Carlo simulations in the presence of fermions [13,14]. The first step in this approach is the decomposition of the lattice in two overlapping domains Ω 0 and Ω 2 , see e.g. Fig. 1, which share a common region Λ 1 . The latter is chosen so that the minimum distance between the points belonging to the inner domains Λ 0 and Λ 2 remains finite and positive in the continuum limit.
Next step consists in rewriting the determinant of the Hermitean massive Wilson-Dirac operator Q = γ 5 D as where Q Λ1 , Q Ω0 , and Q Ω2 indicate the very same operator restricted to the domains specified by the subscript. They are obtained from Q by imposing Dirichlet boundary conditions on the external boundaries of each domain. The matrix w is built out of Q Ω0 , Q Ω2 and the hopping terms of the operator Q across the boundaries in between the inner domains Λ 0 and Λ 2 and the common region Λ 1 [14]. The denominator in Eq. (5) has already a factorized dependence on the gauge field since det Q Λ1 , det Q −1 Ω0 and det Q −1 Ω2 depend only on the gauge field in Λ 1 , Ω 0 and Ω 2 respectively.
In the last step, the numerator in Eq. (5) is rewritten as where u k and u * k are the N roots of a polynomial approximant for (1 − w) −1 , the numerator is the remainder, and C is an irrelevant constant. The denominator in Eq. (6) can be represented by an integral over a set of N/2 multiboson fields [19,20] having an action with a factorized dependence on the gauge field in Λ 0 and Λ 2 [13][14][15] inherited from w. When the polynomial approximation is properly chosen, see below, the remainder in the numerator of Eq. (6) has mild fluctuations in the gauge field, and is included in the observable in the form of a reweighting factor in order to obtain unbiased estimates.
A simple implementation of these ideas is to divide the lattice as shown in Fig. 1, where Λ 0 and Λ 2 have the shape of thick time-slices while Λ 1 includes the remaining parts of the lattice. The short-distance suppression of the quark propagator implies that a thickness of 0.5 fm or so for the thick-time slices forming Λ 1 is good enough, and is not expected to vary significantly with the quark mass. This is the domain decomposition that we use for the numerical computations presented in this letter.
The Monte Carlo simulation is then performed using a two-level scheme. We first generate n 0 level-0 gauge field configurations by updating the field over the entire lattice; then, starting from each level-0 configuration, we keep fixed the gauge field in the overlapping region Λ 1 , and generate n 1 level-1 configurations by updating the field in Λ 0 and in Λ 2 independently thanks to the factorization of the action. The resulting gauge fields are then combined to obtain effectively n 0 · n 2 1 configurations at the cost of generating n 0 · n 1 gauge fields over the entire lattice. In particular, for each level-0 configuration, we compute the statistical estimators by averaging the values of the correlators over the n 2 1 level-1 gauge fields. Previous experience on two-level integration [13,14,21,22]  suggests that, with two independently updated regions, the variance decreases proportionally to 1/n 2 1 until the standard deviation of the estimator is comparable with the signal, i.e. until the level-1 integration has solved the signal-to-noise problem. From Eq. (4) we thus infer that the variance reduction due to level-1 integration is expected to grow exponentially with the time-distance of the currents in Eq. (2). The overhead for simulating the extra multi-boson fields increases the cost by an overall constant factor which is quickly amortized by the improved scaling.

IV. LATTICE COMPUTATION
In order to assess the potential of two-level Monte Carlo integration, we simulate QCD with two dynamical flavours supplemented by a valence strange quark. Gluons are discretized by the Wilson action while quarks by the O(a)-improved Wilson-Dirac operator, see Refs. [15,23] for unexplained definitions. Periodic and antiperiodic boundary conditions are imposed on the gluon and fermion fields in the time direction respectively, while periodic conditions are chosen for all fields in the spatial directions. We simulate a lattice of size 96 × 48 3 with an inverse bare coupling constant β = 6/g 2 0 = 5.3, corresponding to a spacing of a = 0.065 fm [23,24]. The size of the lattice, rather large for a proof of concept, is chosen so to be able to accommodate a light pion and still be in the large volume regime, namely M π = 270 MeV and M π L ≥ 4. The domains Λ 0 and Λ 2 are the union of 40 consecutive time-slices, while each thick time-slice forming the overlapping region Λ 1 is made of 8 time-slices corresponding to a thickness of approximately 0.5 fm. The determinants in the denominator of Eq. (5) are taken into account by standard pseudofermion representations, while the number of multi-bosons is fixed to N = 12. The very same action and set of auxiliary fields are used either at level-0 or level-1. The reweighting factor is estimated stochastically with 2 random sources, which are enough for its contribution to the statistical error to be negligible. Further details on the algorithm and its implementation can be found in Ref. [15].
We generate n 0 = 25 level-0 configurations separated by 48 molecular dynamics units (MDU), so that in practice they can be considered statistically uncorrelated [23,24]. For each level-0 background gauge field, we generate n 1 = 10 configurations in Λ 0 and Λ 2 spaced by 16 MDUs. The connected contraction is calculated by inverting the Wilson-Dirac operator on local sources, while the disconnected one is computed via split-even random-noise estimators [25]. For each level-0 configuration, the statistical estimators are computed by averaging the correlators over the n 2 1 combinations of level-1 fields. The error analysis then proceeds as usual. For the sake of the presentation, we show results in physical units and properly renormalized: the central value of the lattice spacing is taken from Ref. [23], and the one of the vector-current renormalization constant from Ref. [26]. We do not take into account their contributions to the errors since on one side we are interested in investigating the statistical precision of the vector correlator computed via two-level integration only, and on the other side the numerical accuracy of those quantities can be improved independently.
To single out the reduction of the variance due only to two-level averaging, we carry out a dedicated calculation of correlation functions. We compute the light-connected contraction by averaging over 216 local sources put on the time-slice (y 0 /a = 32) of Λ 0 at a distance of 8 lattice spacings from its right boundary and, as usual, by summing over the sink space-position. This large number of sources guarantees that the dependence of the variance on the gauge field in the domains Λ 0 and Λ 2 is on equal footing, since no further significant variance reduction is observed by increasing their number. We determine the disconnected contraction by averaging each singlepropagator trace over a large number of Gaussian random sources, namely 768, so to have a negligible random-noise contribution to the variance [25]. The variance of the light-connected contribution as a function of the distance from the source is shown on the left plot of Fig. 2. For better readability only the time-slices belonging to Ω 2 are shown, i.e. those relevant for studying the effect of two-level integration given the source position. Data are normalized to the variance obtained with the same number of sources on CLS configurations 2 which were generated with a conventional one-level HMC [24,27,28]. The exponential reduction of the variance with the distance from the source is manifest in the data, with the maximum gain reached from 2.5 fm onward for n 1 = 10. The loss of about a factor between 2 and 3 with respect to the best possible scaling, namely n 2 1 , either for n 1 = 3 or 10 (dashed lines) is compatible with the presence of a residual correlation among level-1 configurations. Indeed the variance reduction for n 1 = 3, obtained by skipping 48 MDUs between consecutive level-1 configurations (labeled by n 1 = 3 * ), is compatible with the n 2 1 scaling at large distances within errors. In our particular setup, even for n 1 = 10 the statistical error at large distances scales de-facto with the inverse of the cost of the simulation rather than with its squared root. This is easily seen by comparing the variance reduction shown in the left plot of Fig. 2 with the cost of the simulation for n 1 = 10. The latter is in fact 4 times the one for n 1 = 1 due to the different separation in MDU units between two consecutive configurations at level-0 and level-1.
The power of the two-level integration can be better appreciated from the right plot of Fig. 2, where we show the variance of the light-connected contribution to the integrand in Eq. (1) as a function of the time-distance of the currents. The sharp rising of the variance computed by one-level Monte Carlo (n 1 = 1, red squares) is automatically flattened out by the two-level multiboson domain-decomposed HMC (n 1 = 10, blue triangles) without the need for modeling the long-distance behaviour of G conn u,d (x 0 ).  (8.6). While with the one-level integration the errors on the contributions to the integral from 0 to 2.5 fm and from 2.5 to the maximum value of 3.0 fm are comparable, with the two-level HMC the contribution to the variance from the long distance part becomes negligible. This pattern of variance reduction is expected to set in at shorter distances for lighter quark masses, where the gain due to the two-level integration is expected to be significantly larger due to the sharp increase of the exponential in Eq. (4). Considerations analogous to those made for the connected contribution apply also to the much smaller disconnected one, although even larger values of n 1 are required to render the variance approximately constant.

V. RESULTS AND DISCUSSION
Our best result for the light-connected contribution to the integrand in Eq. (1) is shown on the left plot of Fig. 3 (red squares). It is obtained by a weighted average of the above discussed correlation function computed on 32 point sources per time-slice on 7 time-slices at y 0 /a = {8, 16, 24, 56, 64, 72, 80} and on 216 sources at y 0 /a = 32. We obtain a good statistical signal up to the maximum distance of 3 fm or so. The strange-connected contraction G conn s (x 0 ) is much less noisy, and is determined by averaging on 16 point sources at y 0 /a = 32. Its value, shown on the left plot of Fig. 3 (blue circles), is at most one order of magnitude smaller than the light contribution, and has a negligible statistical error with respect to the light one. The best result for the disconnected contribution has been computed as discussed in the previous section, and it is shown in the left plot of Fig. 3 as well (green triangles). It reaches a negative peak at about 1.5 fm, and a good statistical signal is obtained up to 2.0 fm or so. Its absolute value is more than two orders of magnitude smaller than the light-connected contribution over the entire range explored (notice the multiplication by 10 for a better readability of the plot).
In the right plot of Fig. 3 we show the best values of the light-connected (red squares), strange-connected (blue circles), and disconnected (green triangles) contributions to a HVP µ · 10 10 as a function of the upper extrema of integration x max 0 in Eq. (1). The light-connected part starts to flatten out at x max 0 ∼ 2.5 fm and, at the conservative distance of x max 0 = 3.0 fm, its value is 471.8(6.2). The value of the strange-connected contribution is 52.55(21) at x max 0 = 3.0 fm, and its error is negligible with respect to the light-connected one. The disconnected contribution starts to flatten out at about x max 0 ∼ 2.0 fm, where its value is −1.98(84). For x max 0 = 3.0 fm, its statistical uncertainty is 2.1 which is still 3 times smaller with respect to the light-connected one. Clearly the disconnected contribution must be taken into account to attain a per-mille precision on the HVP, but the combined usage of split-even estimators and two-level integration solves the problem of its computation. By combining the connected contributions at x max 0 = 3.0 fm with the disconnected part at x max 0 = 2.0 fm, the best total value that we obtain is a HVP µ = 522.4(6.2) · 10 −10 .
In this proof of concept study we have achieved a 1% statistical precision with just n 0 · n 1 = 250 configurations on a realistic lattice. This shows that for this light-quark mass a per-mille statistical precision on a HVP µ is reachable with multi-level integration by increasing n 0 and n 1 by a factor of about 4-6 and 2-4 respectively. When the up and the down quarks becomes lighter, the gain due to the multi-level integration is expected to increase exponentially in the quark mass, hence improving even more dramatically the scaling of the simulation cost with respect to a standard one-level Monte Carlo. The change of computational paradigm presented here thus removes the main barrier for making affordable, on the computer installations available today, the goal of a per-mille precision on a HVP µ .
Here we focused on the main bottleneck in the computation of the HVP. It goes without saying that the very same variance-reduction pattern is expected to work out also for the calibration of the lattice spacing, the calculation of the electromagnetic corrections and the HLbL.
It is also interesting to notice that multi-level integration can be well integrated with master-field simulation techniques [29] if very large volumes turn out to be necessary to pin down finite-size effects at the per-mille level. As a final remark, we stress that the very same approach is applicable to many other computations which suffer from signal-to-noise ratio problems, where a similar breakthrough is expected [30].