Light-quark connected intermediate-window contributions to the muon $g-2$ hadronic vacuum polarization from lattice QCD

We present a lattice-QCD calculation of the light-quark connected contribution to window observables associated with the leading-order hadronic-vacuum-polarization contribution to the anomalous magnetic moment of the muon, $a_\mu^{\mathrm{HVP,LO}}$. We employ the MILC Collaboration's isospin-symmetric QCD gauge-field ensembles, which contain four flavors of dynamical highly improved staggered quarks, span four lattice spacings between $a\approx 0.06$ and $0.15$~fm, and have close-to-physical quark masses. We consider several effective-field-theory-based schemes for finite-volume and other lattice corrections and combine the results via Bayesian model averaging to obtain robust estimates of the associated systematic uncertainties. After unblinding, our final results for the intermediate and ``W2'' windows are $a^{ll,{\mathrm W}}_{\mu}(\mathrm{conn.})=206.6(1.0) \times 10^{-10}$ and $a^{ll,\mathrm {W2}}_{\mu}(\mathrm{conn.}) = 100.7(3.2)\times 10^{-10}$, respectively.


I. INTRODUCTION
In April 2021, the Fermilab muon g−2 experiment, E989, released its first result for the muon's anomalous magnetic moment a_µ ≡ (g_µ − 2)/2, based on Run-1 data collected in 2018 [1]. When combined with the previous measurement from Brookhaven National Laboratory experiment E821 [2], the new result increases the disagreement with the Standard Model (SM) theory prediction [3] from 3.7σ to 4.2σ. Because the anomaly arises from loop effects, it is sensitive to the contributions of yet-undiscovered particles that could give rise to small deviations from the theoretical prediction. Increased precision is now essential to establish conclusively whether this substantial difference is due to physics beyond the SM.
The error on the experimental average of the muon's anomalous magnetic moment is now 0.35 parts per million (ppm) and is limited by statistics. Fermilab E989 continues to collect data and improve the experimental apparatus, and ultimately aims to measure a_µ to a precision at or below 0.14 ppm by the end of its lifetime. Additionally, a complementary experiment to measure the muon's anomalous magnetic moment and electric dipole moment is planned for later this decade at J-PARC in Japan [24,25]. The J-PARC E34 experiment will employ a different method to determine a_µ than the "magic momentum" approach of both Fermilab E989 and BNL E821, and aims for a precision of 0.45 ppm from its initial run in 2027 [24,26].
Corresponding theoretical efforts are underway to reduce the uncertainty on the SM prediction for a_µ, which currently stands at 0.37 ppm [3]. At present, over 90% of the SM theory error comes from the leading-order hadronic vacuum polarization (HVP) contribution to the anomaly, a_µ^{HVP,LO}. The HVP contribution is difficult to determine precisely because the bulk of it comes from the low-energy, nonperturbative regime of quantum chromodynamics (QCD). To date, the most precise theoretical results for a_µ^{HVP,LO} are obtained from a data-driven, dispersive approach [27,28] using experimental measurements of the total cross section for e⁺e⁻ → hadrons (the so-called R ratio) as input. These data-driven determinations have achieved around 0.5% precision on a_µ^{HVP,LO}, corresponding to 0.34 ppm uncertainty on a_µ [8-13], and are the basis of the Muon g−2 Theory Initiative's SM prediction for a_µ^{HVP,LO} [3]. Lattice QCD provides an alternative, ab initio approach for calculating the leading-order HVP contribution that is independent of experimentally measured cross sections. The most precise lattice-QCD calculation of a_µ^{HVP,LO} to date [29] is in tension with the data-driven determinations, motivating detailed cross-checks. Euclidean-time window observables, which restrict the integration range of the lattice calculation, facilitate such comparisons; the same windows can be obtained on the data-driven side by inserting the corresponding weight function in the dispersive integral for a_µ^{HVP,LO} [35]. Because Euclidean-time windows allow for more detailed and sensitive comparisons between independent a_µ^{HVP,LO} calculations, they are a valuable tool both for diagnosing sources of disagreement between lattice-QCD results and for quantifying differences (if any) between data-driven and lattice determinations.
Various Euclidean-time windows with differing strengths and drawbacks have been proposed [32-34]. In 2018, the RBC and UKQCD Collaborations separated the Euclidean-time integral into contributions from "short" (t ≲ 0.4 fm), "intermediate" (0.4 fm ≲ t ≲ 1.0 fm), and "long" (t ≳ 1.0 fm) times [32]. The intermediate window observable a^W_µ can be computed in lattice QCD with high statistical precision. Hence, it has been adopted by the muon g−2 theory community as a benchmark quantity. Several independent three- and four-flavor lattice-QCD calculations of a^W_µ are now available [29,32,36,37], but the results are not fully consistent. RBC/UKQCD's initial intermediate-window result [32] is within about 1σ of the determination from R-ratio data [35]. More recent lattice-QCD calculations of a^W_µ by the BMW [29], Mainz/CLS [36], and ETM [37] collaborations, however, are all more than 3σ higher than the data-driven value. Further scrutiny of the intermediate window is therefore needed to clarify the picture.
In this work, we calculate the intermediate-window contribution to a_µ^{HVP,LO} in four-flavor lattice QCD. Using the same methods, we also calculate the "W2" window observable introduced by Aubin et al. [33], which corresponds to the Euclidean-time range t ∈ [1.5, 1.9] fm. As pointed out in that work, although a^{W2}_µ is statistically noisier than a^W_µ, effective-field-theory (EFT) estimates of finite-volume, lattice-discretization, and pion-mass corrections are more reliable at larger times. We focus exclusively on the connected contribution from light (up and down) quarks in the isospin-symmetric limit, which accounts for about 90% of a_µ^{HVP,LO}. Calculations of the heavier quark flavors, isospin-breaking corrections, and quark-disconnected contributions are in progress [39-42].
Our calculation of the intermediate and W2 window observables builds upon our 2019 calculation of a_µ^{HVP,LO} [43]. As before, we employ the MILC Collaboration's dynamical-QCD gauge-field configurations [44] with four flavors of highly improved staggered quarks (HISQ) [45]. Our numerical simulations are again performed at the physical pion mass and with four lattice spacings ranging from about 0.15 to 0.06 fm. Since our earlier work, however, we have increased statistics significantly at our three finest lattice spacings. The new data give better control of the lattice-spacing dependence of the Euclidean-time window observables and enable stringent tests of the EFT-based corrections, which inform our analysis of the associated systematic errors. We estimate the uncertainties on a^W_µ and a^{W2}_µ from making different, reasonable analysis choices, e.g., for the finite-volume corrections and the treatment of discretization effects, via Bayesian model averaging [46,47]. Finally, to avoid confirmation bias, the Euclidean-time window observables were blinded until the analysis and error budgets were finalized. (See Sec. III A for details.)

This paper is organized as follows. First, in Sec. II A we provide analytic expressions for a^W_µ and a^{W2}_µ in terms of the Euclidean-time vector-current correlation function. Next, in Sec. II B we define the isospin-symmetric QCD limit employed here, and we describe our numerical correlator computations in Sec. II C. In Sec. III we present a detailed description of our analysis procedures, starting with how blinding was applied and removed in Sec. III A. In Sec. III B, we use the lattice correlators to calculate the Euclidean-time windows corresponding to our numerical-simulation parameters. We then correct these "raw" window values on each ensemble for the finite lattice volume and slight mistunings of the simulated pion mass, and (optionally) remove taste-breaking discretization effects, in Sec. III C.
Next, we extrapolate the corrected window values to zero lattice spacing in Sec. III D. Sections III E and III F describe our procedure for Bayesian model averaging and the resulting systematic error budget. We conclude in Sec. IV by presenting our final results for a W µ and a W2 µ and comparing them with previous determinations. Appendices A and B provide additional details on obtaining the Euclidean-time windows from staggered correlators and on computing corrections to the windows using the chiral model of pions, photons, and ρ mesons introduced in Ref. [48], respectively. Progress reports on related ongoing work can be found in Refs. [40-42, 49, 50].

A. Definitions of Euclidean-time window observables
The hadronic vacuum polarization function Π(Q²) can be obtained from Euclidean vector-current correlation functions. It is now standard for lattice-QCD HVP calculations, however, to employ the alternative time-momentum representation introduced by Bernecker and Meyer [31]. This formulation is more convenient for an inherently space-time approach such as lattice QCD and allows the construction of Euclidean-time windows. Starting with the spatial-vector-current correlation function C(t), defined as
$$C(t) = \frac{1}{3}\sum_{\vec{x}}\sum_{i=1}^{3} \langle J^i(\vec{x},t)\, J^i(0) \rangle , \quad (2.3)$$
the leading-order HVP contribution is obtained as
$$a_\mu^{\rm HVP,LO} = \int_0^\infty dt\, \tilde{K}(t)\, C(t), \quad (2.4)$$
with the kernel
$$\tilde{K}(t) = \left(\frac{\alpha}{\pi}\right)^2 \int_0^\infty dQ^2\, K_E(Q^2) \left[ t^2 - \frac{4}{Q^2}\sin^2\!\left(\frac{Qt}{2}\right) \right], \quad (2.5)$$
where K_E(Q²) is given in Eq. (B2). The window observables are then easily obtained by introducing the window function W, which limits the Euclidean-time region over which C(t) is integrated [32]:
$$a_\mu^{{\rm win}(t_0,t_1,\Delta)} = \int_0^\infty dt\, \tilde{K}(t)\, C(t)\, W(t,t_0,t_1,\Delta), \quad (2.6)$$
$$W(t,t_0,t_1,\Delta) = \frac{1}{2}\left[ \tanh\!\left(\frac{t-t_0}{\Delta}\right) - \tanh\!\left(\frac{t-t_1}{\Delta}\right) \right]. \quad (2.7)$$
The parameters t₀ and t₁ of W control the location of the window's boundaries, while ∆ controls the sharpness of its edges. The intermediate ("W") window corresponds to (t₀, t₁, ∆) = (0.4, 1.0, 0.15) and the "W2" window to (1.5, 1.9, 0.15), with the parameters in fm. We plot W in Eq. (2.7) for these window regions in the left panel of Fig. 1.

It is convenient in lattice-QCD calculations of a_µ^{HVP,LO} to separately compute and then sum up the contributions from each quark flavor and from connected and disconnected Wick contractions. Here we focus on the light-quark connected contribution to the Euclidean-time windows, a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.), in the isospin-symmetric limit. Therefore, our electromagnetic vector current J^µ(x) includes only the terms for light quarks with m_l = (m_u + m_d)/2, and our correlation function C(t) includes only the connected contractions. Figure 1, left, shows the light-quark connected contribution to the integrand [K̃(t)C(t) in Eq. (2.4)] using the lattice correlation function obtained on our finest ensemble (see Sec. II C). Figure 1, right, shows the corresponding window integrands [C(t)K̃(t)W(t, t₀, t₁, ∆) in Eq. (2.6)] for the W and W2 windows.
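The window function of Eq. (2.7) is straightforward to implement numerically. The following Python sketch (the helper names are ours, not from the collaboration's code) evaluates W for the intermediate and W2 parameter choices:

```python
import math

def smoothed_step(t, t_edge, delta):
    # Smoothed step: [1 + tanh((t - t_edge)/Delta)] / 2, rising around t_edge.
    return 0.5 * (1.0 + math.tanh((t - t_edge) / delta))

def window(t, t0, t1, delta=0.15):
    # W(t, t0, t1, Delta) of Eq. (2.7): ~1 for t0 < t < t1, ~0 outside,
    # with edges smoothed over a width of order Delta.
    return smoothed_step(t, t0, delta) - smoothed_step(t, t1, delta)

# Intermediate ("W") window: (t0, t1, Delta) = (0.4, 1.0, 0.15) fm
# "W2" window:               (t0, t1, Delta) = (1.5, 1.9, 0.15) fm
print(window(0.7, 0.4, 1.0))  # deep inside the W window: close to 1
print(window(0.7, 1.5, 1.9))  # far outside the W2 window: close to 0
```

The choice ∆ = 0.15 fm smooths the edges over roughly one lattice spacing on the coarsest ensembles, which suppresses discretization effects from a hard cutoff.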

B. Prescription for isospin-symmetric QCD
Both the MILC HISQ gauge-field configurations and the light-quark connected correlators employed in this work correspond to the isospin-symmetric limit of QCD, i.e., a pure-QCD world with equal up- and down-quark masses and without electromagnetism. Following the prescription introduced (for three flavors) in Refs. [51,52] and later extended to include the charm quark in Ref. [53], we set the light-quark masses and lattice scale in physical units using the pion mass and decay constant. We then set the strange- and charm-quark masses using the kaon and D_s-meson masses, respectively.

[FIG. 2 caption: The ensembles used in this work and in Ref. [43] (labeled "2019"). Each disk is centered at the corresponding ensemble's squared lattice spacing and pion mass (a² and M_{π5} in Table I), while the disk areas are proportional to the size of each data set (N_conf × N_loose sources in Table I).]
Prior to the tuning procedure, however, electromagnetic effects must be removed from the experimental inputs. Neither M_{π⁰} nor f_{π⁺} is affected significantly by the quarks' electric charges, so their pure-QCD values are defined to be M_π ≡ M_{π⁰} and f_π ≡ f_{π⁺}. In the Fermilab Lattice and MILC Collaborations' most recent analysis of pseudoscalar-meson masses and decay constants [54], the numerical values for these inputs were taken from the 2016 Particle Data Group review: M_{π⁰} = 134.977 MeV and f_{π⁺} = 130.50(1) MeV [55]. The remaining pure-QCD meson masses employed in Ref. [54] are M_{K⁰} = 497.567 MeV, M_{K⁺} = 491.405 MeV, and M_{D_s} = 1967.02 MeV. Details on how these values were obtained can be found in Sec. IV of that work and references therein. For the inputs in isospin-symmetric QCD we use the same values for M_{π⁰} and f_{π⁺} as above, while the kaon mass is defined as the average of M_{K⁰} and M_{K⁺}, giving M_K = 494.486 MeV.

C. Lattice-QCD ensembles and correlation functions
Our calculation employs the MILC Collaboration's four-flavor lattice-QCD configurations with dynamical up, down, strange, and charm quarks. The ensembles use the HISQ action [45] for the sea quarks and, for the gluon fields, a Symanzik-improved gauge action [56-60] that includes the plaquette, the 1 × 2 rectangle, and the so-called bent-chair 6-link term, as well as tadpole improvement [61] based on the plaquette. Details of the configuration generation can be found in Ref. [62].
In this work, we employ a subset of the available MILC ensembles; their parameters are summarized in Table I.

[Table I caption: The w₀/a values [63] relate the gradient-flow scale to the lattice spacing; we take these values from Ref. [43] except for the newer ensemble with a ≈ 0.12 fm. To convert simulation results to physical units, we take w₀ = 0.1715(9) fm from Ref. [64]. The sixth column gives the taste-Goldstone pion masses [44]. The seventh column lists the renormalization factors for the local vector current, taken from Ref. [65]. The second-to-last column lists the number of configurations analyzed. The last column gives the number of loose-residual solves per configuration used in the truncated-solver method [66,67]. Most ensembles were also employed in Ref. [43], where additional details can be found.]

The ensemble at a ≈ 0.12 fm was generated specifically for our muon g−2 project and has better-tuned sea-quark masses compared with the ensemble with the same bare coupling used previously [43]. It contains about 10,000 configurations. We also extended the ensemble with a ≈ 0.09 fm [44] to include over 5,000 configurations; this is a factor of roughly 3.5 beyond what was used in Ref. [43]. The pion mass for this ensemble is less accurately tuned than for the other three ensembles used in our study, which were generated more recently using quark masses obtained from a detailed analysis of pseudoscalar mesons and their decay constants [54]. Finally, we increased the number of configurations in our finest ensemble, with lattice spacing a ≈ 0.06 fm [54], by about a factor of two compared with Ref. [43]. We are continuing to extend this ensemble in anticipation of future needs. Our ensemble set is visualized in Fig. 2 and detailed in Table I. The tuned quark masses listed in Table I are determined from the analysis in Ref. [54], in which pseudoscalar-meson masses and decay constants were computed using 24 gauge ensembles with six lattice spacings ranging from ≈ 0.15 to 0.03 fm.
The pion decay constant is used to set the scale and the meson masses used to determine the up, down, strange, and charm masses are given in Sec. II B. Further details may also be found in Ref. [53] which used fewer configurations for a similar study.
The light-quark propagators from which the correlation functions C(t) are constructed are computed using the HISQ action and the truncated-solver method (TSM) [66,67]. Using random-wall sources, we compute one fine-residual conjugate-gradient solve and the number of loose-residual solves (N_loose) shown in the last column of Table I. Compared with Ref. [43], we have increased the number of loose sources per configuration by factors of 4, 3, and 1.5 for the ensembles at a ≈ 0.12, 0.09, and 0.06 fm, respectively. Exploiting time-reversal invariance, we further increase statistics by averaging the correlator values at times t and N_t − t on each configuration. For the electromagnetic-current operator, we use the same local taste-vector vector current as in Ref. [43]. To match the local vector current to continuum QCD, we use the nonperturbatively computed renormalization factors obtained by the HPQCD Collaboration in Ref. [65]. Specifically, for the a ≈ 0.15, 0.12, and 0.09 fm ensembles, we take the "H-H" Z_{V4} values from Table IV, while for the a ≈ 0.06 fm ensemble, we use the extrapolated value at this lattice spacing given in Appendix B of that work. To test our corrections for pion-mass mistuning (see Sec. III C 2), we generated additional vector- and pseudoscalar-current correlation functions with unphysical valence-quark masses on our two coarsest ensembles. Table II lists the valence-quark masses used in these partially quenched simulations.
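The statistics-boosting steps described above (loose/fine TSM solves and time-reversal averaging) can be sketched as follows; the array shapes and helper names are our own illustration, not the collaboration's production code:

```python
import numpy as np

def fold_time_reversal(corr):
    """Average C(t) with C(Nt - t) on one configuration. The correlator is
    periodic, so t = 0 maps to itself."""
    reversed_corr = np.roll(corr[::-1], 1)  # element t holds C((Nt - t) mod Nt)
    return 0.5 * (corr + reversed_corr)

def tsm_average(loose, fine, fine_src_idx):
    """Truncated-solver-method estimator: the cheap loose-solve average plus an
    unbiased correction from the sources also solved to fine precision.
    loose: array (n_loose_src, Nt); fine: array (n_fine_src, Nt)."""
    bias = np.mean(fine - loose[fine_src_idx], axis=0)
    return np.mean(loose, axis=0) + bias
```

The bias-correction term is what makes the TSM estimator unbiased despite most solves being run with a loose stopping residual.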
In the course of our current analysis, we discovered two mistakes in Ref. [43] pertaining to the a ≈ 0.06 fm ensemble. First, a small subset (≈ 5%) of this ensemble's correlation functions was affected by a software bug in the data-processing script. Second, the renormalization factor employed for this ensemble was taken from the arXiv version of Ref. [65] and differs from the published result by ≈ 0.1%. The latter error was realized after we unblinded our analysis (see Sec. III A). Hence, while keeping the analysis procedure frozen, we now use the published value of Z_{V4} at a ≈ 0.06 fm from Ref. [65] in our determinations of the window observables. We are preparing errata to Refs. [34,43], but do not expect the results for a^{ll}_µ(conn.), a_µ^{HVP,LO}, or the one-sided Euclidean-time windows to change significantly.

III. DATA ANALYSIS
Here we present the analysis procedure used to obtain the window observables a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.) at the physical, isospin-symmetric pion mass and in the continuum and infinite-volume limits. First, as discussed in Sec. III A, we blinded the analyses of all components of a_µ^{HVP,LO} to avoid the unintentional introduction of bias. Second, because the correlation functions C(t) are obtained at discrete Euclidean times, the integral in Eq. (2.6) must be approximated by a discrete integration rule. As described in Sec. III B, we use both the trapezoidal and Simpson's rules to quantify the associated discretization effects.
As is well known, staggered actions include additional, unphysical degrees of freedom (so-called "tastes"), yielding a 16-fold enlarged meson spectrum at finite lattice spacing [44,62,81,82]. The splittings between the tastes are a lattice artifact that vanishes in the continuum limit. At finite lattice spacing, taste splittings of the pion masses are a significant discretization effect in a HVP,LO µ observables (so-called taste-breaking effects). As discussed in Sec. III C, the splittings also affect the pion-mass and finite-volume dependencies, resulting in an interplay between them. In the case of pseudoscalar meson masses and weak matrix elements, discretization effects due to the taste-splittings are well-described by staggered chiral perturbation theory [83,84], providing an additional handle on the continuum extrapolations.
Our continuum-extrapolation analysis in Sec. III D includes a comprehensive study of taste-breaking and other discretization effects. For each of the two window observables, we perform continuum extrapolations with and without first correcting for taste-breaking effects. In addition, we vary the fit function used for the continuum extrapolations, and we also include continuum-limit fits that drop the data at the coarsest lattice spacing (a ≈ 0.15 fm). The fit function used in our continuum extrapolation contains the strong coupling constant α_s. In this work, following Ref. [54], we use α_s = α_V(2/a) and take α_V(n_f = 4, µ = 5.0 GeV) = 0.2530(38) from Ref. [85], evolving the coupling with the four-loop beta function.
Our lattice-QCD calculations of a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.) entail numerous analysis choices. As described in Sec. III E, we incorporate the systematic uncertainties due to these variations using Bayesian model averaging (BMA) [46,47]. The remaining uncertainties in the corrected data sets include the statistical errors from the Monte Carlo integration and parametric errors from w₀, w₀/a, and Z_V, which are propagated through the analysis as Gaussian random variables. Our final results for a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.) and the error budgets from the respective BMA analyses are presented and discussed in Sec. III F.
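A minimal, flat-prior sketch of how a BMA in the spirit of Refs. [46,47] combines analysis variants is given below; the weight definition used in the paper's actual analysis may differ in details, and the function name is ours:

```python
import numpy as np

def bayes_model_average(values, sigmas, chi2s, n_params):
    """Information-criterion-weighted model average: variant i receives weight
    w_i proportional to exp[-(chi2_i + 2*k_i)/2] under a flat model prior.
    The returned error combines per-variant errors with the spread of the
    variant means, so model-choice systematics are folded into the total."""
    values, sigmas = np.asarray(values), np.asarray(sigmas)
    logw = -0.5 * (np.asarray(chi2s) + 2.0 * np.asarray(n_params))
    w = np.exp(logw - logw.max())   # subtract max for numerical stability
    w /= w.sum()
    mean = np.sum(w * values)
    var = np.sum(w * (sigmas**2 + values**2)) - mean**2
    return mean, np.sqrt(var)
```

The variance formula is the standard decomposition into a weighted statistical piece plus the variance of the model means about the average.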

A. Blinding
To avoid confirmation bias, we blinded this analysis until the systematic error budgets were finalized. The analysis was then frozen and used to generate the unblinded results and figures presented here. We employ a software blinding procedure, in which each observable is multiplied by an unknown random factor chosen from a uniform distribution on [0.7, 1.3]. Because the correlation function on the a ≈ 0.15 fm ensemble is unchanged from Ref. [43], we additionally blinded the results from this ensemble by adding an offset equal to the observable's standard deviation times an undisclosed random number drawn uniformly from [−1, 1]. Each observable receives its own unique blinding factor, which is kept the same for all lattice spacings except 0.15 fm. This procedure allows us to unblind specific sub-quantities, such as the intermediate window observables discussed here, without unblinding other quantities for which our analyses are ongoing.
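The two-part blinding described above can be sketched as follows (illustrative only; the seed handling and bookkeeping in the actual analysis are more involved, and all names are ours):

```python
import numpy as np

def draw_blinding_factor(rng):
    # One hidden multiplicative factor per observable, uniform on [0.7, 1.3].
    return rng.uniform(0.7, 1.3)

def blind_observable(value, factor, sigma_offset=None, rng=None):
    # Multiply by the hidden factor; for the reused a ~ 0.15 fm data, also add
    # the standard deviation times a hidden uniform number on [-1, 1].
    blinded = factor * value
    if sigma_offset is not None:
        blinded += sigma_offset * rng.uniform(-1.0, 1.0)
    return blinded

rng = np.random.default_rng(12345)   # in practice the seed is hidden from analysts
factor = draw_blinding_factor(rng)
print(0.7 <= factor <= 1.3)          # True
```

Keeping the factor fixed across lattice spacings preserves the shape of the continuum extrapolation while hiding the absolute normalization.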

B. Extraction of window observables
After the light-quark connected vector-current correlation functions C(t) are obtained on each ensemble as described in Sec. II C, lattice values for a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.) are computed from these correlators via Eq. (2.6), using a chosen numerical integration scheme. Because we employ a single-time-slice vector-current operator in our computations, the spectral representation of our staggered correlation functions consists of a sum of positive contributions from states with the desired parity and, additionally, contributions that oscillate in time as (−1)^{t/a} from opposite-parity states [see Eq. (A1) of Appendix A]. These oscillations are discretization effects and should in principle be removed by the continuum extrapolation in Sec. III D. To quantify any residual uncertainty or bias on a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.) from oscillations in our correlation functions, we also perform our full analysis using the fit-reconstructed correlator without oscillations, C_{no osc.}(t), which is defined in Eq. (A3). Appendix A also presents a number of alternative schemes for removing the unwanted oscillating terms from C(t).
Numerical integration of the lattice correlators introduces additional discretization errors that depend upon the method used. Here we consider two integration schemes: the trapezoidal rule and Simpson's rule, which is formally higher order in the lattice spacing. Given a Euclidean-time correlator C(t), windows of a_µ are obtained with the trapezoidal rule via
$$a^{{\rm win}(t_0,t_1,\Delta)}_{\mu,\,{\rm Trap.}} = a \sum_{t} \tilde{K}(t)\, W(t,t_0,t_1,\Delta)\, C(t),$$
where the integration kernel K̃(t) and window function W(t, t₀, t₁, ∆) are given in Eqs. (2.5) and (2.7), respectively, and the boundary terms are omitted because K̃(t)W(t, t₀, t₁, ∆) = 0 there for these windows. Similarly, with f(t) ≡ K̃(t)W(t, t₀, t₁, ∆)C(t), Euclidean-time windows are obtained with the composite Simpson's rule via
$$a^{{\rm win}(t_0,t_1,\Delta)}_{\mu,\,{\rm Simp.}} = \frac{a}{3} \sum_{n\ {\rm odd}} \left[ f\big((n-1)a\big) + 4 f(na) + f\big((n+1)a\big) \right].$$
Figure 3 compares lattice data for a^{ll,W}_µ(conn.) (left) and a^{ll,W2}_µ(conn.) (right) obtained by integrating C(t) using the trapezoidal rule (blue squares), integrating C_{no osc.}(t) using the trapezoidal rule (orange squares), and integrating C_{no osc.}(t) using Simpson's rule (red circles). To enable meaningful comparisons between a^{ll,W}_µ(conn.) [or a^{ll,W2}_µ(conn.)] at different lattice spacings, all data in these plots include corrections for the finite spatial volumes and pion-mass mistuning using the CM. (See Sec. III C for details.) The impact of temporal oscillations in our staggered lattice correlators on a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.) can be assessed by comparing the data sets obtained from integrating both C(t) and C_{no osc.}(t) using the trapezoidal rule (blue and orange squares in Fig. 3, respectively). For a^{ll,W}_µ(conn.), the trapezoidal-rule data sets are statistically indistinguishable at our two finest lattice spacings (see Table VI in Appendix A, which provides the correlated pairwise differences). Further, the differences between them decrease rapidly with the lattice spacing. For a^{ll,W2}_µ(conn.), which corresponds to a later Euclidean-time range, oscillations in the correlator from heavier opposite-parity states have largely died out (see Fig. 1). Consequently, the trapezoidal-rule data sets are statistically consistent on all ensembles, and there is no clear lattice-spacing dependence in their correlated differences. Because, for both a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.), the continuum extrapolations of the trapezoidal-rule data sets are in excellent agreement, we conclude that temporal oscillations are an insignificant source of discretization error in our calculation.
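The two discrete integration rules can be sketched generically as follows, applied to samples of the windowed integrand f(t) = K̃(t)W(t, t₀, t₁, ∆)C(t) at spacing a (the function names and the sine test integrand are our own illustration):

```python
import numpy as np

def trapezoid_window(f, a_lat):
    # Trapezoidal rule; for the window observables the endpoint values of
    # Ktilde*W*C vanish, so this reduces to a plain sum times the spacing.
    return a_lat * (0.5 * f[0] + f[1:-1].sum() + 0.5 * f[-1])

def simpson_window(f, a_lat):
    # Composite Simpson's rule (len(f) must be odd, i.e., an even number of
    # intervals); formally higher order in the spacing than the trapezoid.
    assert len(f) % 2 == 1
    return a_lat / 3.0 * (f[0] + f[-1]
                          + 4.0 * f[1:-1:2].sum() + 2.0 * f[2:-1:2].sum())

# Sanity check on a smooth test integrand with a known integral:
t = np.linspace(0.0, np.pi, 101)
f = np.sin(t)          # integral over [0, pi] is exactly 2
h = t[1] - t[0]
print(abs(trapezoid_window(f, h) - 2.0) > abs(simpson_window(f, h) - 2.0))  # True
```

On smooth data Simpson's rule converges as h⁴ versus h² for the trapezoid, which mirrors the rapid shrinking of the scheme differences seen toward finer lattice spacings.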
Similarly, discretization errors stemming from the numerical integration can be estimated by comparing the data sets obtained by integrating C_{no osc.}(t) with either the trapezoidal rule or Simpson's rule (orange squares and red circles in Fig. 3, respectively).

[FIG. 3 caption: Lattice data for a^{ll,W}_µ(conn.) (left) and a^{ll,W2}_µ(conn.) (right) from integrating the raw lattice correlator C(t) with the trapezoidal rule (blue squares), the fit reconstruction with oscillating-state contributions removed, C_{no osc.}(t), with the trapezoidal rule (orange squares), and C_{no osc.}(t) with Simpson's rule (red circles). Data at the same lattice spacing are offset horizontally for visibility. As described in Secs. III C 1 and III C 2, each point is corrected for finite-volume effects using the CM and for pion-mass mistuning effects using the data-driven approach. For each integration scheme, we fit the data for the three finest ensembles to a function linear in α_s a². The dashed curves show the fits' error bands, with colors matching the corresponding plot symbols.]

As displayed in the figure and quantified in Table VII, on our coarse ensembles the differences between integration schemes are statistically significant for both a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.). These differences, however, decrease with lattice spacing much faster than a² and are already at the per-mille level at our finest lattice spacing. We therefore conclude that lattice artifacts from the choice of numerical integration scheme are negligible compared to the leading discretization terms in the Symanzik effective Lagrangian, which are of O(α_s a²) (see Sec. III D for details).
Based on these observations, we generate two data sets for each of the two observables, a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.). The first is obtained by integrating the original correlation-function data C(t) with the trapezoidal rule, and the second by integrating the reconstructed correlation function C_{no osc.}(t) with Simpson's rule. Including both data sets in the subsequent analysis accounts for residual systematic effects from both the O(a²) artifacts induced by the trapezoidal rule and the oscillating contributions that remain after the continuum extrapolation. For each observable, the two data sets are taken as inputs to the next step of the analysis, where the corrections are applied to the a_µ data.

C. Lattice corrections
Before taking the continuum limit, we correct our lattice a^{ll,W}_µ(conn.) and a^{ll,W2}_µ(conn.) data in two (or three) separate steps: first for the finite volume (FV), second for pion-mass mistuning (M_π), and (sometimes) third for the effects of taste splittings (TB). The last step is optional, since modifying discretization effects only alters the window observables' lattice-spacing dependence, not their continuum-limit values. Further, the pion-mass corrections to the intermediate and W2 windows are only numerically significant on the a ≈ 0.09 fm ensemble, for which the simulated pion mass is ∼5% below the physical value (see Table II).
Mathematically, our correction scheme is defined via
$$a_\mu(L_\infty, M_\pi^{\rm phys}, a) = a_\mu(L_{\rm latt}, M_\pi^{\rm latt}, a) + \Delta_{\rm FV} + \Delta_{M_\pi} \left[ + \Delta_{\rm TB} \right], \quad (3.3)$$
where "a_µ" is shorthand for either a^{ll,W}_µ(conn.) or a^{ll,W2}_µ(conn.). The first correction on the right-hand side of Eq. (3.3), ∆_FV, takes a_µ from the simulated spatial volume, indicated by L_latt, to the infinite-volume limit, denoted by L_∞. The second correction, ∆_{M_π}, takes the simulated taste-Goldstone pion mass to the physical value, while the final (optional) correction, ∆_TB, removes the effects of the pion taste splittings, a²∆_{ξ_i}. In practice, the lattice and "physical" staggered-pion masses M_{π,ξ_i}^{lat} and M_{π,ξ_i}^{phys} in Eqs. (3.4)-(3.6) are calculated via the leading-order staggered χPT relation
$$M_{\pi,\xi_i}^2 = M_{\pi,\xi_1}^2 + a^2 \Delta_{\xi_i},$$
with M_{π,ξ_1}^{lat} and M_{π,ξ_1}^{phys} fixed to the taste-Goldstone pion mass (column six of Table I) and the experimentally measured π⁰ mass, respectively.
The order in which the finite-volume, pion-mass, and taste-breaking corrections are applied affects the form of the corrections. In our case, we apply the corrections in Eq. (3.3) from left to right. Therefore, the finite-volume and pion-mass-mistuning corrections in Eqs. (3.4) and (3.5) must preserve the taste splittings. The left-hand side of Eq. (3.3) is a_µ at infinite volume, physical pion mass, and finite lattice spacing; these corrected values are the inputs to the continuum extrapolations described in Sec. III D.
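The LO staggered-χPT mass relation used in the corrections is a one-liner; the sketch below uses an illustrative splitting value for demonstration, not a number from Table I:

```python
import math

def taste_pion_mass(m_pi_goldstone, a2_delta_xi):
    """Leading-order staggered chiral perturbation theory:
    M^2_{pi,xi_i} = M^2_{pi,xi_1} + a^2 * Delta_{xi_i},
    with both inputs in consistent units (e.g., MeV and MeV^2)."""
    return math.sqrt(m_pi_goldstone**2 + a2_delta_xi)

# The taste-Goldstone pion (Delta_{xi_1} = 0) is unshifted:
print(taste_pion_mass(135.0, 0.0))     # 135.0
# A non-Goldstone taste with an illustrative splitting a^2*Delta = 4.0e4 MeV^2:
print(taste_pion_mass(135.0, 4.0e4))   # heavier than 135 MeV
```

Because the splittings scale as a², the non-Goldstone tastes collapse onto the Goldstone mass in the continuum limit, which is why the ∆_TB step only reshapes the approach to that limit.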
We employ four different effective-field-theory-based schemes for the finite-volume and taste-breaking corrections:
• Chiral perturbation theory (χPT) at next-to-leading order (NLO) and next-to-next-to-leading order (NNLO) [29,69,71]. The staggered NNLO χPT expressions of Refs. [29,33,71] are derived for a taste-singlet vector current, which couples to taste-diagonal pion pairs. We adapt the expressions of Ref. [33] to the taste-vector vector current employed here, which couples to taste-nondiagonal two-pion states. We replace the pion energies in Eq. (3.3) of Ref. [33] with averages of the energies of the two pions in the two-pion states that contribute to the taste-vector vector current. We test this approximation for the case of NLO χPT (and for the CM and MLLGS approaches discussed below), where we have exact formulas for the taste-vector current. These tests reveal at most sub-percent differences between the corrections computed with the exact approach and with the approximation.
• The Chiral Model (CM) is an extension of χPT, where the ρ meson is included explicitly through a massive spin-1 vector field. This model was introduced by Jegerlehner and Szafron to study ρ − γ mixing in e + e − → ππ scattering [86]. It was first applied to Euclidean-space lattice-QCD calculations of the muon g − 2 HVP (with modifications to incorporate the staggered-pion mass spectrum) by the HPQCD Collaboration [48]. 5 The equations for the CM corrections computed in this analysis differ from Refs. [43,48] slightly in that the effects of taste-breaking are included as specified in Eqs. (B8)-(B10).
• The Meyer-Lellouch-Lüscher-Gounaris-Sakurai (MLLGS) approach combines the Gounaris-Sakurai parameterization of the pion form factor with the Meyer-Lellouch-Lüscher mapping [72-79] between the infinite-volume scattering amplitude and the finite-volume energies and amplitudes of the two-pion states. We account for taste-breaking effects in the same fashion as Ref. [29], by including contributions from two-pion states constructed with all 16 tastes of pions. Here, we modify the expressions of Ref. [29] to the case of the taste-vector vector current, which couples to taste-nondiagonal two-pion states. As in Ref. [29], we fix the number of finite-volume states to n = 8.
• The relativistic-pion effective-field-theory approach by Hansen and Patella (HP) for finite-volume effects [80]. We obtain the correction defined in Eq. (3.4) using the same replacement as described in the χPT description above.
We describe the above schemes as "effective-field-theory-based" because, in some parts of our analysis, they may be employed outside the schemes' ranges of validity. Except for the CM, these EFTs and phenomenological models include only the contributions to a_µ^{HVP,LO} observables from two-pion intermediate states. Because contributions to the Euclidean-time correlation function fall off as exp(−Et) (see Appendix A), those from low-lying ππ states are most important at large Euclidean times. Consequently, the correction schemes listed above should best describe the volume and pion-mass dependence of C(t) and, hence, of a_µ^{win(t₀,t₁,∆)} for later time ranges. Indeed, we and other collaborations find that, for t₀ ≳ 1.5 fm (which includes the "W2" window), all of the higher-order correction schemes enumerated above (i.e., excluding NLO χPT) yield similar predictions for the finite-volume, pion-mass, and taste-breaking corrections to a_µ^{win(t₀,t₁,∆)}. Further, the estimates from these schemes for the sum of finite-volume, pion-mass, and taste-breaking corrections reasonably describe the observed differences between lattice data in this region [29,33,79,87]. Therefore, they can be reliably used to calculate lattice corrections to a^{ll,W2}_µ(conn.). In the intermediate-window region, the predicted corrections from the EFT-based schemes display a wider variation. The sizes of the finite-volume and pion-mass corrections to a^{ll,W}_µ(conn.), however, are numerically small (∆^W_{FV,M_π} ≲ 0.5%), and we incorporate the spread in a^{ll,W}_µ(conn.) results obtained with different correction schemes in our systematic error estimate in Sec. III E.

Figure 4 compares the finite-volume corrections obtained from the schemes described above. For a^{ll,W}_µ(conn.) (top panel), the finite-volume corrections ∆^W_FV are always less than 0.5%. There is, however, a significant spread between the different schemes. In particular, the finite-volume corrections obtained from the CM (green circles) are close to zero on all ensembles.
This is because in the CM, the renormalized vacuum polarization function, Eq. (B3), comprises two terms: the first is identical to NLO χPT, while the second accounts for ρ-π-π interactions. For a ll,W µ (conn.), the latter contribution produces a correction opposite in sign to the former. In contrast, in χPT, the NLO (open blue triangles) and NNLO contributions to ∆ FV have the same sign, making the total NNLO corrections (purple downward triangles) larger. The spread between the finite-volume corrections in the top panel of Fig. 4 reflects the limitations of the correction schemes in the intermediate-window region, as discussed earlier.
By design [33], χPT (and the other EFTs) should work better in the W2 region, for which contributions from low-lying ππ states are more important. Hence, we expect better consistency between the finite-volume corrections to a ll,W2 µ (conn.) from the different approaches. Indeed, these expectations are borne out in the bottom panel of Fig. 4, where the corrections from the higher-order schemes have a much-reduced (relative) spread compared to a ll,W µ (conn.). Additionally, the finite-volume corrections to a ll,W2 µ (conn.) are larger than for a ll,W µ (conn.) (∆ W2 FV ∼ 3% at the finest lattice spacing) due to the increased sensitivity to long-distance contributions at later Euclidean times.
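The window observables discussed here weight the Euclidean-time correlator with a smoothed step function. As an illustration (not the paper's code), a minimal sketch of the conventional tanh-smoothed window introduced in Ref. [32], with the intermediate-window boundaries t0 = 0.4 fm and t1 = 1.0 fm and an assumed smearing width Δ = 0.15 fm:

```python
import math

def window_weight(t, t0=0.4, t1=1.0, delta=0.15):
    """Smoothed step function selecting Euclidean times t0 < t < t1 (all in fm):

        W(t; t0, t1, Delta) = [tanh((t - t0)/Delta) - tanh((t - t1)/Delta)] / 2

    The weight is ~1 deep inside the window and ~0 outside it, so short- and
    long-distance contributions to the correlator are suppressed.
    """
    return 0.5 * (math.tanh((t - t0) / delta) - math.tanh((t - t1) / delta))

assert window_weight(0.7) > 0.95   # center of the intermediate window
assert window_weight(0.0) < 0.05   # short distances suppressed
assert window_weight(2.0) < 0.05   # long distances suppressed
```

The "W2" window corresponds to shifting the boundaries (t0, t1) to larger Euclidean times, where the two-pion-dominated EFT descriptions are expected to be more reliable.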
For lattice spacings at or below a ≈ 0.12 fm, the finite-volume corrections to a ll,W µ (conn.) and a ll,W2 µ (conn.) decrease with increasing lattice spacing. This is because the pion taste splittings are larger on coarser lattices, and finite-volume corrections in systems with heavier masses are smaller. The finite-volume corrections at a ≈ 0.15 fm are generally larger than at a ≈ 0.12 fm, however, because the spatial volume of our coarsest ensemble is substantially smaller than the others (see Table VII).
In the absence of guidance from a direct finite-volume study, we take the range of finite-volume corrections for the schemes we consider here as an estimate of the associated systematic uncertainty. Some or all of the EFT-based models considered are of questionable reliability in the intermediate-window region. Motivated by this, we generate a second set of corrections to a ll,W µ (conn.) obtained from restricting the window to higher t, namely [0.7, 1.0] fm. The spread of these restricted corrections is ∼ 20% smaller than in the full W window case. Therefore, in total we include ten sets of finite-volume-corrected data for each input a ll,W µ (conn.) data set in our analysis: two each for NLO χPT, NNLO χPT, CM, MLLGS, and HP. For the W2 region, the EFTs are on more solid theoretical footing and the higher-order schemes (NNLO χPT, CM, MLLGS, and HP) yield consistent results. Therefore, in our analysis we include four sets of finite-volume-corrected data for a ll,W2 µ (conn.), omitting NLO χPT because NNLO χPT should be more accurate in this region. These finite-volume-corrected data sets are inputs into the next step, and eventually feed into the BMA analysis of Sec. III E.

Pion-mass adjustment
We next consider the effects of pion-mass mistuning on a ll,W µ (conn.) and a ll,W2 µ (conn.) and estimate the pion-mass adjustments to these quantities, ∆ Mπ in Eq. (3.5), using a data-driven approach. As stated in Sec. II C, on our ensembles with a ≈ 0.15 and 0.12 fm, in addition to the unitary correlation functions listed in Table I, we have partially quenched correlators (and hence lattice data for a ll,W µ (conn.) and a ll,W2 µ (conn.)) with valence-quark masses bracketing the physical light quark (see Table II). 7 (The a ≈ 0.15 fm correlators were also employed in Ref. [39] to study strong-isospin-breaking effects in a ll µ (conn.).) Together with our unitary data at a ≈ 0.09 and 0.06 fm, this allows us to predict the size of pion-mass adjustments to a ll,W µ (conn.) and a ll,W2 µ (conn.) on all of our ensembles as follows. First, we correct our entire dataset for finite-volume effects as described in Sec. III C 1. We then fit the corrected a ll,W µ (conn.) and a ll,W2 µ (conn.) data to an interpolating function of the form given in Eq. (3.7), where Λ = 500 MeV (following Ref. [43]) and M π is the taste-Goldstone valence-sea pion mass, which is what enters the leading-order pion loops in χPT. The parametric dependence on M π in Eq. (3.7) is motivated by χPT, with an additional 1/M 2 π term accounting for the expected infrared-divergent behavior of a ll µ in the M π → 0 limit [48,88]. For each value of i in Eq. (3.7), we consider several values for n i ≥ 0, requiring only that the {n i } are the same for both a ll,W µ (conn.) and a ll,W2 µ (conn.), and that each fit has at least one degree of freedom (d.o.f.). (Note that if n i = 0, then j = 0 and c i (a) is independent of aΛ.) Following Ref. [89], we account for correlations between the independent variables (a and M π ), as well as between the independent and dependent variables (a ll,W µ (conn.) and a ll,W2 µ (conn.)), using Bayesian priors, fitting both windows simultaneously and monitoring the χ 2 of each fit.
Once the coefficients c ij are determined for a given set of {n i }, we can use Eq. (3.7) to predict a ll,W µ (conn.) and a ll,W2 µ (conn.) at the target physical pion mass, M π,phys = M π 0 (see Sec. III B), for each ensemble. Figure 5 shows our central fits for a ll,W µ (conn.) (upper panel) and a ll,W2 µ (conn.) (lower panel). At each lattice spacing, we take the difference in a ll,win µ between the fit prediction at the physical pion mass (blue dashed curve) and the unitary lattice data (filled squares) as our data-driven estimate of the pion-mass adjustment, ∆ Mπ,DD . As seen in Fig. 5, our data-driven analysis finds that a correction to a ll,W µ (conn.) and a ll,W2 µ (conn.) of about 1σ is needed on the a ≈ 0.09 fm ensemble, for which the simulation pion mass is about 5% below the physical value.
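The data-driven adjustment described above can be sketched with a toy linear fit. The numbers and the two-parameter form below are illustrative assumptions, not the paper's data or its full interpolating function, Eq. (3.7), which has lattice-spacing-dependent coefficients and several powers of M π; only the 1/M π² term mimicking the infrared-divergent chiral behavior is retained:

```python
import numpy as np

# Synthetic illustration (NOT the paper's data): window results at one lattice
# spacing for several valence pion masses; masses in GeV, a_mu in units of 1e-10.
mpi = np.array([0.130, 0.135, 0.140])
amu = np.array([208.0, 206.5, 205.1])

# Toy interpolating form: a_mu = c0 + c1 / M_pi^2 (linear in the coefficients,
# so an ordinary least-squares solve suffices).
A = np.column_stack([np.ones_like(mpi), 1.0 / mpi**2])
(c0, c1), *_ = np.linalg.lstsq(A, amu, rcond=None)

def interp(m):
    """Fit prediction at pion mass m (GeV)."""
    return c0 + c1 / m**2

# Data-driven adjustment: fit prediction at the physical pion mass minus the
# value at the simulated (mistuned) mass.
m_phys, m_sim = 0.13498, 0.130
delta_mpi = interp(m_phys) - interp(m_sim)
```

With a simulated pion mass below the physical value, the adjustment comes out negative here, mirroring the sign logic of the correction applied on the a ≈ 0.09 fm ensemble.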
The a ll,W µ (conn.) and a ll,W2 µ (conn.) data entering our central fits are corrected for finite-volume effects using the CM. Repeating this analysis using other finite-volume correction schemes yields almost identical predictions for the pion-mass adjustments. Replacing the 1/M 2 π term in Eq. (3.7) with log M 2 π also leads to negligible changes in the predicted values for ∆ Mπ . Figure 6 compares the pion-mass adjustments to a ll,W µ (conn.) (upper panel) and a ll,W2 µ (conn.) (lower panel) obtained in our data-driven analysis (filled green circles with error bars) with those estimated within three of the EFT-based correction schemes introduced in Sec. III C: the CM (empty green circles), NNLO χPT (empty purple upside-down triangles), and MLLGS (empty orange squares). On the three ensembles for which the pion mass is well tuned (a ≈ 0.06, 0.12, and 0.15 fm), the pion-mass adjustment ∆ Mπ,DD is negligible in all correction schemes. At a ≈ 0.09 fm, however, the picture is less clear. For a ll,W µ (conn.), the spread in model estimates is significantly larger than the error bar on the data-driven evaluation. For a ll,W2 µ (conn.), the models agree with each other, but differ from the data-driven prediction by ≈ 1.5σ.
In light of the differences between predicted corrections at a ≈ 0.09 fm, we adopt the following conservative procedure to obtain our final estimates for ∆ Mπ (black filled circles with error bars in Fig. 6). For the central value, we use the average of the data-driven and chiral-model predictions, i.e., ∆ Mπ ≈ (∆ Mπ,DD + ∆ Mπ,CM )/2. For the error at a ≈ 0.09 fm, we add (linearly) to the uncertainty on the data-driven prediction, σ Mπ,DD , an additional systematic uncertainty given by half the absolute difference between the data-driven and chiral-model predictions, i.e., σ Mπ = σ Mπ,DD + |∆ Mπ,DD − ∆ Mπ,CM |/2. At a ≈ 0.06, 0.12, and 0.15 fm, we take the uncertainty to be just that of the data-driven prediction. As shown in Fig. 6, our final estimates for the pion-mass adjustment at a ≈ 0.06, 0.12, and 0.15 fm are essentially those from our data-driven analysis. At a ≈ 0.09 fm, our final estimate for the pion-mass adjustment covers most (all) of the model spread for a ll,W µ (conn.) (a ll,W2 µ (conn.)).

Taste-breaking corrections
The final lattice correction, ∆ TB in Eq. (3.6), accounts for the mass differences at finite lattice spacing between staggered pions with different taste quantum numbers. For the HISQ action, these taste splittings arise from discretization effects of O(α 2 s a 2 ) and higher. It is well known, however, that the HISQ taste splittings do not scale linearly with α 2 s a 2 [44,62]. 8 As shown in Ref. [44], the HISQ pion-taste splittings decrease faster than naive expectations at lattice spacings below around 0.09 fm, while increasing more slowly at very coarse lattice spacings above roughly 0.12 fm. The former observation is likely due to the HISQ smearing [45] suppressing the leading α 2 s a 2 taste-breaking discretization contributions, making higher-order, e.g., O(α 3 s a 2 , a 4 ), effects more prominent, while the latter indicates the presence of additional higher-order terms. Figure 7 shows the taste-breaking corrections to a ll,W µ (conn.) (left panel) and a ll,W2 µ (conn.) (right panel) obtained within the χPT, CM, and MLLGS correction schemes introduced at the beginning of Sec. III C. Qualitatively, they display the same behavior as the taste splittings, with ∆ TB decreasing more rapidly at finer lattice spacings. Quantitatively, the estimated corrections span a wide range of values, 0 ≲ ∆ TB ≲ 30 × 10 −10 . This corresponds to corrections to a ll,W µ (conn.) and a ll,W2 µ (conn.) on our coarsest ensemble of up to ∼ 15% and ∼ 30%, respectively. Differences between correction schemes and sizes of the corrections vanish at zero lattice spacing by construction.
For a ll,W µ (conn.), the NLO χPT, CM, and MLLGS results are in broad agreement, while the NNLO χPT prediction is about 2-4 times larger for a ≲ 0.09 fm. In contrast, for a ll,W2 µ (conn.) the three higher-order schemes (NNLO χPT, CM, and MLLGS) predict similar corrections, while NLO χPT is the outlier. As with the finite-volume corrections (see Sec. III C 1), the spread of predicted taste-breaking corrections for a ll,W µ (conn.) is likely due to the correction schemes becoming less reliable at short distances. Indeed, this expectation is borne out in Figs. 8 and 9, which show the continuum extrapolations of a ll,W µ (conn.) and a ll,W2 µ (conn.), respectively, with and without taste-breaking corrections. For a ll,W µ (conn.), applying taste-breaking corrections increases the lattice-spacing dependence. In contrast, for a ll,W2 µ (conn.) the taste-breaking corrections computed in the three higher-order correction schemes all substantially reduce the lattice-spacing dependence, indicating that they capture the dominant discretization effects in this window.
Although the inclusion of taste-breaking corrections (and choice of scheme) will alter the lattice-spacing dependence of a ll,W µ (conn.) and a ll,W2 µ (conn.), it should not change the continuum-limit values. Consequently, varying the treatment of taste breaking in the continuum extrapolation provides an additional measure of the continuum-extrapolation error. For the analysis of the intermediate window, we generate taste-breaking-corrected data sets with NLO χPT, NNLO χPT, CM, and MLLGS corrections for each input a ll,W µ (conn.) data set from the previous sections. We follow the reasoning of Sec. III C 1 and compute the corrections in two regions, the full intermediate-window interval [0.4, 1] fm and the smaller interval [0.7, 1] fm, resulting in a total of eight taste-breaking-corrected data sets for each input set. For each input a ll,W2 µ (conn.) data set we generate three sets of taste-breaking-corrected data, one each for NNLO χPT, CM, and MLLGS, dropping NLO χPT, as in Sec. III C 1. In both cases, we also keep the data sets uncorrected for taste-breaking effects. The corrected and uncorrected a ll,W µ (conn.) and a ll,W2 µ (conn.) data sets are then taken as inputs into the continuum-limit extrapolations, and feed ultimately into the Bayesian model averaging analysis of Sec. III E.

D. Continuum extrapolation
To perform our continuum extrapolation we consider fit functions of the form given in Eq. (3.8), with the discretization term
F disc.(a) = C_{a^2,n} (aΛ)^2 α_s^n + C_{a^4} (aΛ)^4 + C_{a^6} (aΛ)^6 . (3.9)
The function F disc.(a) describes discretization effects and F m ({δm f }) accounts for quark-mass differences in the sea, where δm f is the difference between the physical and the simulation quark masses (see Secs. II B and II C). In F disc.(a) we include variations where the coefficient C_{a^6} is set to zero and where the power of α s in the a^2 term varies as n = 1, 2. For both variations n = 1, 2, we label fit functions with C_{a^6} = 0 as "quadratic" and fit functions where all terms listed in F disc.(a) are included as "cubic". Following Ref. [43], we take Λ = 500 MeV and impose the Gaussian prior constraint C sea = 0.0(3). Here, the C sea term accounts for residual light sea-quark-mass mistuning effects remaining after the correction of Sec. III C 2, as well as for strange sea-quark mistuning effects. As in Ref. [43], the sea-quark masses in the ensembles employed here are so close to their physical values that our fits are insensitive to the F m ({δm f }) term and return posteriors for C sea with central values close to zero and uncertainties close to the initial prior width. This also means that higher-order terms involving δm f can be safely neglected. Additionally, to regulate the degrees of freedom in fits with an a^6 term, we constrain its coefficient with the Gaussian prior C_{a^6} = 0(2). (3.11) This prior width conservatively accommodates instances among the over two thousand continuum fits in which the posterior central values are close to or slightly larger than unity. We also include fits to three ensembles, dropping the coarsest; in this case we impose an additional prior constraint on the quadratic term, C_{a^4} = 0(2), with the same reasoning as above for C_{a^6}.
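As a minimal sketch of the discretization term, Eq. (3.9), the function below evaluates F disc.(a) with Λ = 500 MeV as in the text; the per-ensemble value of α s is supplied by the caller (an assumption here, since the paper's α s values are not reproduced in this section):

```python
# hbar*c in MeV*fm, used to convert Lambda = 500 MeV to inverse fm.
HBARC_MEV_FM = 197.327
LAMBDA = 500.0 / HBARC_MEV_FM  # Lambda in fm^-1

def f_disc(a_fm, alpha_s, C_a2, C_a4, C_a6=0.0, n=1):
    """Discretization term of Eq. (3.9):

        F_disc(a) = C_{a^2,n} (a*Lambda)^2 alpha_s^n
                  + C_{a^4}   (a*Lambda)^4
                  + C_{a^6}   (a*Lambda)^6

    A 'quadratic' fit sets C_a6 = 0; a 'cubic' fit keeps all three terms,
    and n = 1 or 2 selects the power of alpha_s in the a^2 term.
    """
    aL = a_fm * LAMBDA
    return C_a2 * aL**2 * alpha_s**n + C_a4 * aL**4 + C_a6 * aL**6

# Discretization effects vanish in the continuum limit by construction:
assert f_disc(0.0, 0.3, 1.0, 1.0, 1.0) == 0.0
```

The "quadratic" versus "cubic" labels in the text thus correspond to calling `f_disc` with `C_a6=0.0` or with a nonzero prior-constrained `C_a6`, respectively.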
We include continuum-extrapolation fit variations, both with and without including the C sea term in our Bayesian averaging process. As illustration, in Fig. 8 we show results for quadratic continuum extrapolations of the a ll,W µ (conn.) data with n = 1 in Eq. (3.8). The four panels show finite-volume-corrected data computed from the CM (top left), NLO χPT (top right), NNLO χPT (bottom left) and MLLGS (bottom right). For each scheme, we compare continuum extrapolations of data with and without taste-breaking corrections, where we include fits to all four ensembles as well as fits to ensembles at only the three finest lattice spacings. For the data sets corrected with NLO χPT or the CM, we find very good agreement between the four continuum extrapolated results, whereas the NNLO χPT-and MLLGS-corrected data sets show larger spreads. Taking into account this variance, we find that the continuum results obtained with all four correction schemes are consistent with each other. The corresponding continuum extrapolations for a ll,W2 µ (conn.) are shown in Fig. 9. In this case we find good agreement between the continuum extrapolated results, both within each scheme as well as across the different schemes, albeit with larger uncertainties. We observe, however, that the NLO χPT taste-breaking corrections do a poor job at removing lattice spacing dependence compared to the higher-order schemes.
In summary, for each input data set, we perform twelve different continuum extrapolations, the results of which become inputs to the Bayesian model averaging analysis of Sec. III E: in Eq. (3.8), we take n = 1, 2 in the α_s^n a^2 term and either include or omit the C sea term. 9 For each of these four variations, we perform fits to the data at all four lattice spacings both with and without the cubic (a^6) term in Eq. (3.8), as well as quadratic fits to the data at the finest three lattice spacings.
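The bookkeeping of these twelve variations can be made explicit with a short enumeration (labels are ours, chosen to mirror the text):

```python
import itertools

# The three fit shapes described in the text: quadratic and cubic fits to all
# four lattice spacings, plus a quadratic fit to the three finest only.
shapes = [
    ("quadratic", 4),  # C_a6 = 0, four ensembles
    ("cubic", 4),      # a^6 term included, four ensembles
    ("quadratic", 3),  # C_a6 = 0, three finest ensembles
]

# Crossed with n = 1, 2 in the alpha_s^n a^2 term and with including or
# omitting the C_sea term.
variants = [
    {"n": n, "include_C_sea": sea, "shape": shape, "n_ensembles": n_ens}
    for n, sea, (shape, n_ens) in itertools.product((1, 2), (True, False), shapes)
]

assert len(variants) == 12  # 2 (n) x 2 (C_sea) x 3 (shapes)
```

Each entry of `variants` corresponds to one continuum-extrapolation result fed into the Bayesian model average.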
Separately, as an independent cross-check of our continuum extrapolations and associated error estimate, we allow for higher-order terms in a ll,W µ (conn.) and a ll,W2 µ (conn.) using the empirical Bayes (or maximum marginal likelihood) approach described in Sec. 5.2 of Ref. [90]. For this analysis, the discretization term F disc.(a) in Eq. (3.8) takes the more general form of Eq. (3.12), containing terms proportional to a^{2m} α_s^n (including, in particular, the c 11 a^2 α s and c 20 a^4 terms discussed below). The coefficients in Eq. (3.12) are constrained by Gaussian priors whose widths are set by the effective scale Q eff , determined by maximizing the Bayes factor (see Table III).

E. Bayesian model averaging
In order to quantify the systematic uncertainty due to the analysis choices described in the previous sections, we employ Bayesian model averaging (BMA) [46,47]. Summarizing these choices, we include variations of:
• Observable extraction - Two methods are used to extract the uncorrected values of a ll,W µ (conn.) and a ll,W2 µ (conn.) from the correlation-function data, as described in Sec. III B, the first being integration of the raw correlation-function data, C(t), with the trapezoidal rule.
• Taste-breaking correction -We include χPT, CM, and MLLGS as well as a µ data sets which are not corrected for taste-breaking effects prior to continuum extrapolation.
• Correction region -For a ll,W µ (conn.), we include a variation on the corrections where they are computed from the range [0.7, 1] fm instead of over the full W window interval.
• Continuum fit -We perform continuum extrapolations using all 12 fit function variations described in Sec. III D including fits to the three finest ensembles.
In the context of BMA, a "model" M is defined as the set of analysis choices that yield a given result for the desired continuum, infinite-volume, physical observable from a single data set D. In our case, M is given by a set of choices from the options listed above, while D consists of the unmodified correlation-function data. 10 In order to carry out the averaging, each M is assigned a probability weight
pr(M|D) ∝ pr(M) exp[−(χ²_data(a*) + 2k + 2N_cut)/2]. (3.13)
This is the "Bayesian Akaike information criterion" (BAIC) as defined in [47]. Here, χ²_data is the standard chi-squared function, not including the contribution of the priors, and a* is the posterior mode (i.e., the best-fit point for the vector of fit parameters a when optimized against the augmented chi-squared function [90]). N_cut is the number of data points cut from a data set (in this case, the number of ensembles omitted from a given extrapolation), and k is the number of independent parameters in a given fit function. The factor pr(M) is the prior probability of a given M; we adopt a flat prior, so that this factor is constant over all analysis variations and drops out of the model-averaging results. The BMA mean and variance are then obtained from the following formulas:
⟨a⟩ = Σ_M ⟨a⟩_M pr(M|D), (3.14)
σ² = Σ_M σ²_M pr(M|D) + Σ_M ⟨a⟩²_M pr(M|D) − ⟨a⟩². (3.15)
The first term on the right-hand side of Eq. (3.15) is a weighted average over the variances of the individual results. The second and third terms reflect the spread in results obtained with different analysis choices (in our case, correction schemes and fit functions). Because they encapsulate the systematic uncertainty due to analysis choices, we refer to their sum as the "model variance." In Fig. 10, we show the results of the Bayesian model average for a ll,W µ (conn.). The top-right panel illustrates the continuum extrapolations on two data sets, the first corrected with NNLO χPT and the second corrected with the CM, in both cases computed from the full W window interval. The dashed lines indicate the continuum extrapolations for each data set.
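The BAIC weighting and the mean/variance decomposition of Eqs. (3.13)-(3.15) can be sketched as follows (a minimal illustration with toy numbers, assuming a flat model prior; not the paper's analysis code):

```python
import numpy as np

def bma(means, sigmas, chi2_data, k, n_cut):
    """Bayesian model average with BAIC weights, Eqs. (3.13)-(3.15).

    Weight:    pr(M|D) ~ exp(-(chi2_data + 2k + 2*N_cut)/2)   (flat prior)
    Mean:      <a> = sum_M <a>_M pr(M|D)
    Variance:  sum_M sigma_M^2 pr(M|D)                (statistical piece)
             + sum_M <a>_M^2 pr(M|D) - <a>^2          ("model variance")
    """
    means, sigmas = np.asarray(means, float), np.asarray(sigmas, float)
    baic = np.asarray(chi2_data, float) + 2 * np.asarray(k) + 2 * np.asarray(n_cut)
    w = np.exp(-0.5 * (baic - baic.min()))  # subtract min for numerical stability
    w /= w.sum()
    mean = np.sum(w * means)
    var = np.sum(w * sigmas**2) + np.sum(w * means**2) - mean**2
    return mean, np.sqrt(var), w

# Toy example: three analysis variations of comparable quality, the last one
# dropping an ensemble (N_cut = 1) and so incurring a BAIC penalty.
mean, err, w = bma(means=[206.4, 206.9, 206.5],
                   sigmas=[0.9, 0.9, 1.0],
                   chi2_data=[3.1, 2.8, 3.5],
                   k=[3, 3, 4],
                   n_cut=[0, 0, 1])
```

Note that the variance decomposition is just the law of total variance: the weighted mean of the per-model variances plus the variance of the per-model means.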
In total, we include over two thousand separate fit results in the model average. The resulting distribution is shown in the top-left panel of Fig. 10, where it is overlaid on the BMA result (red line and error band) obtained using Eqs. (3.14) and (3.15). The middle panel shows the results from the 24 best individual fits for each correction choice, ordered by the BAIC, in comparison to the BMA result, while the bottom panel gives the associated Q values [91] computed from χ²_aug. In both of these panels, the correction schemes employed for ∆ FV and ∆ TB are indicated by the symbols' top and bottom colors, respectively. We find that our best fits, as determined by the Q value, tend to have the smallest BAIC and hence largest model probability. We also note that the continuum results for data sets corrected for taste breaking using NNLO χPT tend to be smaller than those from the other variations and also return some of the largest model probabilities (points in the middle panel with purple lower halves). Figure 11 shows the analogous BMA result for a ll,W2 µ (conn.). Here we include 384 fit results, fewer than for a ll,W µ (conn.); this stems from the absence of NLO χPT corrections and from employing only a single correction region. The general features of this figure are the same as for Fig. 10. In the top-left panel, we note that the BMA uncertainty for a ll,W2 µ (conn.) is larger than the spread of the histogram. This is because the bulk of the uncertainty in this case comes from the first term in Eq. (3.15), with relatively large statistical and scale-setting uncertainty contributions.
In order to better understand and test the model-averaging results, we also perform Bayesian model averages on specific subsets of the variations. That is, we fix one of the analysis choices but vary the rest as usual. The results of these subset averages for a ll,W µ (conn.) are shown in Fig. 12 (left). The top data point is our BMA result from Fig. 10. The two data points below it show the BMA results for the two observable-extraction choices described in Sec. III B. They are in excellent agreement with each other and with the full BMA result, signifying, as expected, that residual effects of oscillating contributions and of O(a 2 ) errors of the trapezoidal rule are negligibly small. The next five data points are the BMA results obtained from subsets with specific taste-breaking correction schemes. While these results are statistically consistent with the overall average, the differences in the central values contribute significantly to the systematic uncertainty through the latter two terms of Eq. (3.15) (outer uncertainty of the BMA result). In particular, as shown in Fig. 10, the fit results obtained from NNLO χPT-corrected data tend to lie below the average. The following three data points are BMA results obtained from subsets of specific continuum-extrapolation fit functions, which agree well with each other and with the full BMA result. The last block of data points (below the dashed line) are BMA results from subsets that use the same schemes for finite-volume and taste-breaking corrections, where the top data point (BMA w/o mix) averages all four schemes (NLO χPT, NNLO χPT, CM, MLLGS), followed by results from the subsets corresponding to each single scheme, all of which are consistent with the full BMA result with small variations in central values.
The probability weights defined in Eq. (3.13) can be used to assess the relative weight of specific analysis choices in the BMA. Comparing these weights can identify whether one particular choice of observable-extraction method, correction scheme, or fit-function variation is preferred by the averaging procedure. More specifically, let S denote a subset of the full space of models {M}. We define the "subset probability" of S by the relative weight
pr(S) = Σ_{M_i ∈ S} pr(M_i |D) / Σ_M pr(M|D),
where, for example, "M_i ∈ NNLO" denotes the subset of models (i.e., analysis choices) in which NNLO χPT is used for both corrections. Using this definition, we show the relative probabilities of the subsets considered above as pie charts in Fig. 12 (right). From the top pie chart, for the two methods of observable extraction, we find roughly equal contributions to the overall BMA result, indicating no preference by the BMA procedure. The second pie chart from the top shows the subset probabilities for specific taste-breaking corrections. The probability of the subset in which the data are not corrected prior to continuum extrapolation is smaller by slightly more than a factor of two compared with the other subsets. This is because the taste-breaking corrections are computed in two window regions, [0.4, 1] and [0.7, 1] fm, and because the continuum fits to data without taste-breaking corrections have larger χ² values, indicating a preference for data corrected for taste breaking. The third pie chart shows that quadratic continuum fits to the full set of four ensembles are preferred over cubic fits or fits to just three ensembles. In the case of the fits to three ensembles, the smaller subset probability can be traced back to the N_cut penalty in Eq. (3.13) incurred by dropping a data point. For subsets in which the same correction scheme is used for finite-volume and taste splittings (bottom pie chart), we find a slight preference for NNLO χPT and a slight disinclination toward NLO χPT. Figure 13 shows the BMA subsets for a ll,W2 µ (conn.), with results similar to those for a ll,W µ (conn.). As expected, there is greater consistency among the subset averages from specific correction schemes compared to the a ll,W µ (conn.) case, with the largest variation in central value coming from continuum extrapolations of data not corrected for taste-breaking effects. The pie charts in the right panel of Fig. 13 reveal roughly equal subset probabilities in each case, except for the third (from the top) pie chart, which illustrates that here too quadratic continuum fits to all four ensembles are preferred, for the same reasons as above.

Table III compares a ll,W µ (conn.) and a ll,W2 µ (conn.) obtained from the BMA analysis with the empirical Bayes approach; both analyses use data sets corrected in the CM scheme. The third and fifth columns of Table III show a µ results obtained in the empirical Bayes approach from fits to data sets without and with first correcting for taste splittings, respectively, while the fourth and sixth columns list the effective scales Q eff obtained by maximizing the Bayes factor, log(GBF).
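The subset probabilities behind these pie charts are simple sums of normalized model weights; a minimal sketch with illustrative weights (assumed values, not the paper's):

```python
# Each model is tagged by its taste-breaking scheme; its BMA weight pr(M|D)
# is taken as already computed from Eq. (3.13). Illustrative numbers only.
models = [
    ("NNLO",  0.18), ("NNLO",  0.12),
    ("CM",    0.15), ("CM",    0.10),
    ("MLLGS", 0.14), ("MLLGS", 0.11),
    ("none",  0.11), ("none",  0.09),
]

def subset_probability(models, label):
    """pr(S): total weight of models in subset S, relative to all models."""
    total = sum(w for _, w in models)
    return sum(w for tag, w in models if tag == label) / total

shares = {lab: subset_probability(models, lab)
          for lab in ("NNLO", "CM", "MLLGS", "none")}
```

By construction the shares sum to one, so they can be read directly as pie-chart fractions.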
The AIC criterion used in Ref. [29] differs from Eq. (3.13) in that the weight assigned to cutting data points is N_cut instead of 2N_cut. In order to test the robustness of the model-averaging procedure, we repeat the analysis with 2N_cut replaced by N_cut in Eq. (3.13). We find that this yields central values and uncertainties on the final results that are essentially the same as before, with at most minor changes to the weights in the third pie chart from the top in Figs. 12 and 13. This result is not unexpected because, in our case, N_cut ≤ 1, and only a small fraction of the total variations in our averages have N_cut = 1.
In order to cross-check our main continuum-limit extrapolations and the subsequent BMA analysis, we use an empirical Bayes approach to perform independent continuum-limit extrapolations (see Eq. (3.12)). In the comparison of the two approaches, we use the a µ data sets obtained from integrating the correlation functions with the trapezoidal rule and corrected using the CM scheme, with and without first correcting for taste splittings. When performing continuum extrapolations using all the terms in Eq. (3.12), we observe that most of the posterior coefficients are small and consistent with zero; only the linear ∼ c 11 a 2 α s and quadratic ∼ c 20 a 4 terms in Eq. (3.12) are needed to describe the data. This observation is consistent with our main continuum-extrapolation analysis, described by Eqs. (3.8) and (3.9). Table III shows the comparison of the BMA analysis (restricted to the same CM-corrected data sets) with the empirical Bayes fits for both a ll,W µ (conn.) and a ll,W2 µ (conn.). We find good agreement in central values and error bars, after considering the spread between the empirical Bayes results from data sets with and without first correcting for taste splittings.

F. Results and error budgets
Our results for the light-quark-connected contributions to a W µ and a W2 µ are
a ll,W µ (conn.) = 206.6(1.0) × 10⁻¹⁰, (3.18)
a ll,W2 µ (conn.) = 100.7(3.2) × 10⁻¹⁰, (3.19)
where the errors are those obtained from the BMA procedure described in the previous section and include both statistical and systematic uncertainties. Although Bayesian model averaging provides a robust estimate of the total uncertainties in our results, the construction of detailed error budgets from the BMA is not straightforward. We start from the expression for the BMA variance in Eq. (3.15). The first term on the right-hand side is linear in the variances, and hence can be trivially separated into individual contributions from Monte Carlo statistics and each of the parametric inputs w 0 , ∆ Mπ , and Z V . For example, the statistical uncertainty is given by
σ²_stat = Σ_M σ²_stat,M pr(M|D),
where we average over all analysis variations using the probability weights of Eq. (3.13). Repeating this procedure for all the above-mentioned contributions yields the error estimates in Table IV in the rows marked "Monte Carlo statistics", "Scale setting", "Pion-mass adjustment", and "Current renormalization". The second and third terms in Eq. (3.15) depend solely and non-linearly on the central value of each variation, with the latter term including pairwise differences between all possible model pairs in the full BMA result. This makes it impossible to strictly disentangle the contribution from only a subset of model variations (e.g., finite-volume corrections or treatment of discretization effects). We can obtain an approximate error budget, however, as follows. First, to estimate the systematic uncertainty associated with the finite-volume correction, we perform subset model averages separately for each finite-volume correction scheme. These results are shown in Fig. 14. Taking the variance of the central values of these results yields the "finite-volume correction" error in Table IV. Next, we subtract (in quadrature) the so-estimated finite-volume error from the total model variance.
The remaining uncertainty is associated with variations in the treatment of oscillating states in C(t), the taste-breaking corrections, and the continuum-extrapolation fit function. Combining this uncertainty (in quadrature) with those on the fit-function coefficient posteriors yields the "continuum extrapolation" error in Table IV. Table IV presents the approximate error budgets for a ll,W µ (conn.) and a ll,W2 µ (conn.) obtained from the above approach. For a ll,W µ (conn.), the largest error is from the continuum extrapolation, and is driven by the spread in results using different taste-breaking correction schemes. Here we note that the consistency between quadratic and cubic continuum extrapolations (as illustrated in Figs. 12 and 13), as well as between our main results and those from the empirical Bayes approach (see Table III), indicates that systematic errors due to residual higher-order discretization effects are well encompassed by our uncertainties. Next is the parametric uncertainty from the gradient-flow scale, which is about 30% smaller. Errors from Monte Carlo statistics, finite-volume corrections, and current renormalization are also non-negligible, and are roughly commensurate. For a ll,W2 µ (conn.), Monte Carlo statistics are by far the largest source of uncertainty. Following that are the contributions from scale setting, the continuum extrapolation, and the pion-mass adjustment, which are ∼ 50-60% smaller. Although finite-volume and current-renormalization errors are negligible compared with these other uncertainties, they will be important for calculations of a HVP,LO µ aiming for 0.5% precision.
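The separation of the first term of Eq. (3.15) into per-source rows of the error budget amounts to a weighted average of per-variation variances; a minimal sketch with illustrative numbers (not the paper's values):

```python
import numpy as np

# Normalized BMA probability weights for three analysis variations, and the
# per-variation statistical errors (illustrative numbers only).
weights    = np.array([0.40, 0.35, 0.25])
sigma_stat = np.array([0.38, 0.40, 0.42])

# sigma_stat^2 (BMA) = sum_M pr(M|D) * sigma_stat,M^2
sigma_stat_bma = np.sqrt(np.sum(weights * sigma_stat**2))

# Repeating the same weighted average with the scale-setting, pion-mass-
# adjustment, and Z_V error components in place of sigma_stat would yield
# the other parametric rows of the error budget.
```

Because the first term of Eq. (3.15) is linear in the variances, this decomposition is exact for the statistical and parametric rows; only the "model variance" terms require the approximate subset treatment described in the text.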

IV. SUMMARY AND OUTLOOK
In Fig. 15, we compare our intermediate-window result, Eq. (3.18), with other lattice-QCD calculations of this quantity [29, 32, 33, 36-38, 71, 92, 93], which were obtained using different lattice actions and analysis methods. Of the results to date, ours has the smallest statistical uncertainty, 0.19%. Ours is also the first result for a ll,W µ (conn.) obtained from a blind analysis. (Figure 15 shows, alongside our result, those of RBC/UKQCD 23 [38], ETMC 22 [37], Mainz/CLS 22 [36], Aubin et al. 22 [33], χQCD 22 [92], BMW 21 [29], and Lehner & Meyer 20 [93]; results by Aubin et al. 19 [71] and RBC/UKQCD 18 [32], shown in grey, are superseded by Aubin et al. 22 and RBC/UKQCD 23, respectively. The inner error bar shown for our result is from Monte Carlo statistics.) While some form of EFT-inspired correction scheme was employed in every calculation, our analysis is the first to include all of them. Because we incorporate uncertainties due to analysis choices via Bayesian model averaging [46,47], our systematic error estimate is robust without being overly conservative.
In Fig. 16, we compare our result for the "W2" window observable, Eq. (3.19), with the only other available lattice-QCD result for this quantity [33]. Although the results appear consistent, they are not wholly independent, because the analysis in Ref. [33] is based on some of the same ensembles employed in this work. Statistical and systematic correlations due to the shared configurations must be taken into account to make a quantitative comparison. Other independent lattice-QCD calculations of $a^{ll,\mathrm{W2}}_{\mu}(\mathrm{conn.})$ would provide welcome consistency checks.
Before our results for $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$ and $a^{ll,\mathrm{W2}}_{\mu}(\mathrm{conn.})$ can be directly compared with data-driven determinations, the contributions from heavier flavors must be added, as well as those from quark-line disconnected contractions and isospin-breaking corrections (QED and $m_u \neq m_d$). The s-, c-, and b-quark-connected contributions to $a_\mu^{\mathrm{HVP,LO}}$ have already been computed on the HISQ ensembles with high precision [94-96]; windowing these results will be straightforward. The remaining contributions are being computed in ongoing projects; see Refs. [40-42, 50].
Looking at the big picture, the observed consistency between so many different, largely independent results for the light-quark connected contribution to the intermediate-window observable (see Fig. 15) indicates that the systematic errors in lattice-QCD calculations of this quantity are under reasonable control. It is therefore unlikely that the differences between the lattice-QCD calculations reported in Refs. [29, 36-38] and the data-driven result of Ref. [35] will be resolved by further improvements in lattice-QCD calculations of $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$. Lattice-QCD calculations of the quark-connected contributions from heavier flavors are also unlikely causes of the difference, since their uncertainties are smaller by an order of magnitude [29, 95, 96]. The quark-disconnected and isospin-breaking contributions to $a_\mu^{\mathrm{HVP,LO}}$, however, have been computed by only a few collaborations [29, 32, 38, 39, 87, 97]. Although these contributions are too small to change $a^{\mathrm{W}}_{\mu}$ substantially, additional independent lattice-QCD calculations are needed to solidify the central value and uncertainty in order to better quantify the significance of the difference.
In Ref. [34], we pointed out that other windowed observables can provide more stringent comparisons between lattice-QCD and data-driven results right now. Because intermediate-window observables cut out the low-$t$ contributions to $a_\mu^{\mathrm{HVP,LO}}$, where lattice-QCD statistical errors are smallest, "one-sided windows" without a lower bound on the Euclidean time can capture a larger fraction of the total $a_\mu^{\mathrm{HVP,LO}}$ while retaining controlled uncertainties. We are currently repeating the analysis of Ref. [34] using the larger data set employed in this work.
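For orientation, such windows are conventionally built from the smoothed step function $\Theta(t,t',\Delta)=\tfrac{1}{2}\big[1+\tanh\big((t-t')/\Delta\big)\big]$. The following is a sketch of the standard construction (the precise conventions used in this work are those of Eq. (2.7) and Ref. [34]):

$$
W(t;t_0,t_1,\Delta)=\Theta(t,t_0,\Delta)-\Theta(t,t_1,\Delta) \qquad \text{(two-sided window)},
$$
$$
W^{\mathrm{1s}}(t;t_1,\Delta)=1-\Theta(t,t_1,\Delta) \qquad \text{(one-sided window, no lower bound)}.
$$

Dropping the lower edge $t_0$ retains the statistically precise small-$t$ region while still suppressing the noisy large-$t$ tail.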
The light-quark connected contribution to the intermediate-window observable represents only around a third of the total leading-order HVP contribution to the muon's anomalous magnetic moment. Thus, the work presented in this paper is only one part of a multi-year project to compute $a_\mu^{\mathrm{HVP,LO}}$ with 0.5% precision. Several of our ongoing efforts aim to reduce the dominant sources of uncertainty in our published result for the total light-quark connected contribution to $a_\mu^{\mathrm{HVP,LO}}$ [43]; these will also improve our determinations of the intermediate-window observables presented here. For example, we recently introduced a "low-mode-improved" method into our analysis that substantially reduces statistical errors at large Euclidean times [98]. The uncertainty on the scale-setting quantity $w_0$ is an important source of uncertainty not only for all $a_\mu^{\mathrm{HVP,LO}}$ observables, but also for many other analyses based on the MILC HISQ ensembles. We are therefore working to compute precisely the Ω-baryon mass on these ensembles [99], as well as the relative scale $w_0/a$, and plan to use the results to determine the scale in physical units with reduced uncertainty.
With these ongoing efforts, we expect to obtain $a_\mu^{\mathrm{HVP,LO}}$ with sub-percent precision in the near future. To further improve the precision to match that of the Fermilab [1] and J-PARC [24, 25] experiments, however, considerable exascale computing resources will likely be needed. In particular, the inclusion of MILC's physical-mass HISQ ensemble with $a \approx 0.042$ fm would enable more robust continuum extrapolations of all $a_\mu^{\mathrm{HVP,LO}}$ observables and provide better control over this important source of systematic error. A direct finite-volume study is needed to better quantify the finite-volume corrections and reduce the corresponding uncertainty; this would require generating and analyzing new ensembles with different spatial volumes and all other parameters held fixed. Finally, further control over long-distance effects and statistical noise could be achieved by computing the two-pion contributions to the vector-current correlation functions directly [49, 100, 101].

ACKNOWLEDGMENTS
We thank Claude Bernard, Urs Heller, Jack Laiho, Bob Sugar, and Doug Toussaint for their scientific leadership and collaboration. In particular, we are grateful to Bob for his tireless efforts to obtain computational resources, to Claude for guidance on chiral perturbation theory, to Doug for his invaluable expertise in creating so many of our gauge-field ensembles, and to Jack and Urs for essential contributions to previous projects that formed the basis for this work. We thank Anthony Grebe for his contributions to the chiral perturbation theory codes used in this work. We thank Maarten Golterman for useful comments and suggestions. Computations for this work were carried out in part with resources provided by the USQCD Collaboration and the National Energy Research Scientific Computing Center.

[Fig. 18 caption: (Left) Integrands obtained with raw correlation-function data $C(t)$ (blue circles), the reconstruction from the fit including oscillating states $C^{\mathrm{fit}}(t)$ (purple), without oscillating states $C^{\mathrm{no\,osc.}}(t)$ (orange), the improved parity-averaged correlator $C^{\mathrm{IPA}}(t)$ (Eq. (A4)) (green), and the interpolated correlator $C^{\mathrm{interp}}(t)$ (Eq. (A5)) (red). (Right) Lattice-spacing dependence of $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$ (top right) and $a^{ll,\mathrm{W2}}_{\mu}(\mathrm{conn.})$ (bottom right) data obtained from the correlation functions modified with the oscillation-removal techniques discussed. All data sets are corrected for finite-volume effects using the Chiral Model and for pion-mass mistuning using the data-driven approach, described in Secs. III C 1 and III C 2. The data points are slightly displaced horizontally for clarity. A linear fit function (see Sec. III D) is used to fit the $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$ and $a^{ll,\mathrm{W2}}_{\mu}(\mathrm{conn.})$ data at the three finest lattice spacings.]

Appendix A: Cross-checks of window determinations from staggered correlation functions
In this appendix, we detail the methods used to obtain the windowed $a_\mu^{\mathrm{HVP,LO}}$ contributions from the staggered correlation function $C(t)$. First, we compare three different approaches for treating the oscillating contribution to the correlator. The first method, which we use in our main analysis, is to fit $C(t)$ over the region of interest and reconstruct it from the fit posteriors excluding the oscillatory contribution. The correlation function has a spectral representation in which the sum runs over all possible contributing states. We use this expression to craft a model fit function that separates the oscillating and nonoscillating contributions, truncating the sum to a finite number of states. For simplicity, we keep the same number $N_{\mathrm{states}}$ of regular and oscillating states. We restrict the fit range $[t_{\mathrm{min}}, t_{\mathrm{max}}]$ to cover the window region of interest. The use of $t_{\mathrm{min}}$ justifies the truncation to a finite $N_{\mathrm{states}}$ by suppressing contributions from states with large energy.
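The truncated fit model described above takes the standard form for a staggered two-point function. The following is a sketch, assuming a lattice of temporal extent $T$ with (anti)periodic boundary conditions; the normalization and prior conventions of the actual analysis follow Ref. [48]:

$$
C(t) \approx \sum_{n=0}^{N_{\mathrm{states}}-1}\Big[A_n\big(e^{-E_n t}+e^{-E_n (T-t)}\big) - (-1)^{t/a}\,A^{\mathrm{osc}}_n\big(e^{-E^{\mathrm{osc}}_n t}+e^{-E^{\mathrm{osc}}_n (T-t)}\big)\Big].
$$

The reconstruction without oscillating states is then obtained from the fit posteriors by keeping only the first (nonoscillating) sum.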
On the other hand, the use of a fixed $t_{\mathrm{max}}$ carries a risk that the lowest-lying energies and amplitudes may not be accurately resolved with finite statistical precision. However, since we simply use the expression as a convenient model for removing the unwanted oscillations, it is not critical that our estimates of all energy levels are asymptotically correct for $t_{\mathrm{max}} \to \infty$. For the energies and amplitudes of the light-quark-connected correlator, we take the Gaussian priors associated with the local (unsmeared) data in Eqs. (A3) and (A4) of Ref. [48]. We then have the corresponding fit reconstruction of the correlation function without the oscillating contribution, $C^{\mathrm{no\,osc.}}(t)$. Results for $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$ computed from the fit reconstruction on the 0.09 fm ensemble are shown in Fig. 17 (top). Here, for the fit range, we fix $t_{\mathrm{max}} = 1.3$ fm ($t_1 + 2\Delta$) and vary $t_{\mathrm{min}}$. We fit up to six states, with good stability obtained at four, which we take as our value of $N_{\mathrm{states}}$ on all ensembles. The second panel of Fig. 17 shows the ground-state energies obtained from these fits, together with the ground-state energy obtained from a fit to the full correlation function (blue band). We see a significant difference in these energies, perhaps because the full fit picks up some admixture of the hard-to-determine two-pion states in the large-time region. Nonetheless, we observe in Fig. 18 (as described below) that $C^{\mathrm{fit}}(t)$ accurately reconstructs the correlation-function data in the window region of interest. For $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$ and $a^{ll,\mathrm{W2}}_{\mu}(\mathrm{conn.})$, we take $t_{\mathrm{min}}$ and $t_{\mathrm{max}}$ to be $2\Delta$ beyond the $t_0$ and $t_1$ boundaries of the corresponding window definition. For the coarsest two ensembles, this would correspond to $t_{\mathrm{min}}/a = 0$ and 1; to avoid possible staggered-operator complications at small $t/a$, we instead take $t_{\mathrm{min}}/a = 2$ for those two ensembles. As a test of the fidelity of this method, Table V shows the correlated differences of $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$ and $a^{ll,\mathrm{W2}}_{\mu}(\mathrm{conn.})$ computed from the fit reconstruction including the oscillating states, $C^{\mathrm{fit}}(t)$, and from the original correlation function. One can see tiny differences on the coarsest ensembles for $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$, likely due to the restriction of not using the first two time slices; however, these differences are well within the uncertainties of the results. The second method we examined is improved parity averaging (IPA), as employed in Ref. [93] for computing $a_\mu^{\mathrm{HVP,LO}}$; it is a modification of the method developed in Ref. [102]. Here, the correlation function is replaced by the combination given in Eq. (A4). The exponent used is the PDG value of the ρ-meson mass [103], chosen to give the best cancellation in the ρ-resonance peak, which dominates in the W and W2 regions. This approach introduces additional discretization effects; however, one expects a consistent continuum limit, since the oscillations become small at finer lattice spacings. The final approach, originally used in Ref. [104], interpolates the even- and odd-site correlation functions separately and then averages the two interpolations to obtain a new correlation function from which the oscillating contribution has been removed:
$$
C^{\mathrm{interp}}(t) = \frac{1}{2}\left[C^{\mathrm{interp}}_{\mathrm{even}}(t) + C^{\mathrm{interp}}_{\mathrm{odd}}(t)\right]. \qquad \mathrm{(A5)}
$$

We use a cubic-spline interpolation, with the Steffen algorithm implemented in the gvar Python package [105], to interpolate the correlation functions. In Fig. 18 (top left), we compare the $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$ integrand on the 0.09 fm ensemble obtained from the raw correlator data (blue circles), $C^{\mathrm{fit}}(t)$ (purple line), $C^{\mathrm{no\,osc.}}(t)$ (orange line), $C^{\mathrm{IPA}}(t)$ (green line), and $C^{\mathrm{interp}}(t)$ (red line). We find that the $C^{\mathrm{fit}}(t)$ integrand is in excellent agreement with the raw data in the region of interest, suggesting that $C^{\mathrm{no\,osc.}}(t)$ is an accurate representation of the correlation function without the oscillating contribution. However, we see some differences between the $C^{\mathrm{IPA}}(t)$ and $C^{\mathrm{interp}}(t)$ integrands and the $C^{\mathrm{no\,osc.}}(t)$ integrand, especially at shorter times, where a large number of excited states contribute. Figure 18 (top right) examines the lattice-spacing dependence of the $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$ data obtained with all of the oscillation-removal techniques. For the data obtained from $C^{\mathrm{no\,osc.}}(t)$, we see only small deviations from the raw-data results, which become more significant at coarser lattice spacings. As a result, the continuum extrapolations of the two data sets (which use a simple linear fit in $a^2 \alpha_s(2/a)$ to the three finest ensembles, leaving out the 0.15 fm point) are in excellent agreement. While the IPA method does yield a consistent result in the continuum limit, it exhibits much larger discretization effects. The interpolation method modifies the lattice-spacing dependence so significantly that a linear fit is insufficient to describe the observed behavior, likely because the interpolation scheme does not adequately capture the high-energy-state contributions. In Fig. 18 (bottom), we compare these methods applied to $a^{ll,\mathrm{W2}}_{\mu}(\mathrm{conn.})$; here the different methods give nearly identical results, because the oscillations are less pronounced and fewer excited states contribute significantly. To quantify the effects of the oscillations on $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$ and $a^{ll,\mathrm{W2}}_{\mu}(\mathrm{conn.})$, we use the fit approach, our preferred method of removing them: Table VI compares results and correlated differences, obtained using the trapezoidal rule (see Sec. III B), for the raw correlation-function data versus $C^{\mathrm{no\,osc.}}(t)$. For $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$, we find the differences to be small but statistically significant on the coarsest two ensembles and statistically zero on the finer ones. For $a^{ll,\mathrm{W2}}_{\mu}(\mathrm{conn.})$, we find the differences to be zero on all ensembles, as expected, because the oscillating contributions come from heavier states that contribute much less in the long-time region.
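To illustrate why weighting neighboring time slices with a mass exponent removes the staggered oscillation, consider the toy sketch below. It uses a schematic, symmetric form of the parity averaging; the precise Eq. (A4) definition used in the analysis may differ in its weights and boundary treatment, and the amplitudes and energies here are arbitrary illustrative values.

```python
import numpy as np

# Toy staggered correlator: a smooth state (energy E) plus an
# oscillating state (energy E_osc) carrying a (-1)^t prefactor.
A, E = 1.0, 0.40        # hypothetical smooth amplitude/energy (lattice units)
B, E_osc = 0.3, 0.77    # hypothetical oscillating-state parameters
t = np.arange(0, 32)
C = A * np.exp(-E * t) + (-1.0) ** t * B * np.exp(-E_osc * t)

def ipa(C, m):
    """Schematic improved parity average on interior time slices.

    The weights e^{+m} and e^{-m} on the neighboring slices cancel the
    oscillating exponential exactly when m matches its energy."""
    return (0.5 * C[1:-1]
            + 0.25 * np.exp(+m) * C[2:]
            + 0.25 * np.exp(-m) * C[:-2])

C_ipa = ipa(C, E_osc)

# The oscillation cancels exactly; what remains is the smooth part
# rescaled by a constant, t-independent factor.
factor = 0.5 + 0.5 * np.cosh(E_osc - E)
smooth = A * np.exp(-E * t[1:-1])
assert np.allclose(C_ipa, factor * smooth)
```

In practice the exact oscillating-state energy is unknown, so a physically motivated proxy (here, the ρ-meson mass) is used, giving approximate rather than exact cancellation, and the residual smooth-part distortion appears as an additional discretization effect that vanishes in the continuum limit.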
Finally, we examine the truncation effects associated with the trapezoidal rule by comparing $a_\mu$ observables computed with it to results obtained with Simpson's rule. Simpson's rule cannot be applied to the raw correlation-function data because of the oscillatory contributions; hence, the comparisons in Table VII employ the $C^{\mathrm{no\,osc.}}(t)$ correlation functions. Here, the differences are within errors for both $a^{ll,\mathrm{W}}_{\mu}(\mathrm{conn.})$ and $a^{ll,\mathrm{W2}}_{\mu}(\mathrm{conn.})$ and decrease much faster than $a^2$.
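As a self-contained illustration of the size of such truncation effects, the sketch below integrates a smooth exponential (a stand-in for a correlator-type integrand; the decay rate is arbitrary) with both rules and compares to the exact result:

```python
import numpy as np

def trapezoid(f, h):
    """Composite trapezoidal rule on equally spaced samples f, spacing h."""
    return h * (0.5 * f[0] + f[1:-1].sum() + 0.5 * f[-1])

def simpson(f, h):
    """Composite Simpson's rule; requires an even number of intervals."""
    assert (len(f) - 1) % 2 == 0
    return (h / 3) * (f[0] + 4 * f[1:-1:2].sum() + 2 * f[2:-1:2].sum() + f[-1])

E = 0.5                    # arbitrary decay rate, lattice units
t = np.arange(0, 11)       # unit spacing, 10 intervals
f = np.exp(-E * t)
exact = (1 - np.exp(-E * t[-1])) / E

err_trap = abs(trapezoid(f, 1.0) - exact)
err_simp = abs(simpson(f, 1.0) - exact)
assert err_simp < err_trap
```

Since the trapezoidal error scales as $h^2$ while Simpson's scales as $h^4$, the difference between the two rules shrinks much faster than $a^2$ toward the continuum, consistent with the behavior seen in Table VII.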
In summary, truncation effects from the numerical integration and discretization effects due to the oscillatory contributions are clearly well controlled and small compared to other systematic effects. To ensure that any systematic error from the integration and from the removal of oscillatory states is included, we incorporate variations of both into our main analysis, as described in Sec. III E.
Finally, the windowed HVP can be computed in the chiral model via the replacement [29] $\hat{\Pi}(Q^2) \to \hat{\Pi}^{\mathrm{win}}(Q^2)$, the convolution of $\hat{\Pi}$ with $\tilde{W}$, where $\hat{\Pi}$ is given by Eq. (B3) and $\tilde{W}$ is the Fourier transform of the window function defined in Eq. (2.7).