Bayesian Monte Carlo extraction of sea asymmetry with SeaQuest and STAR data

We perform a global QCD analysis of unpolarized parton distributions within a Bayesian Monte Carlo framework, including the new $W$-lepton production data from the STAR Collaboration at RHIC and Drell-Yan di-muon data from the SeaQuest experiment at Fermilab. We assess the impact of these two new measurements on the light antiquark sea in the proton, and the $\bar{d}-\bar{u}$ asymmetry in particular. The SeaQuest data are found to significantly reduce the uncertainty on the $\bar{d}/\bar{u}$ ratio at large parton momentum fractions $x$, strongly favoring an enhanced $\bar{d}$ sea up to $x \approx 0.4$, in general agreement with nonperturbative calculations based on chiral symmetry breaking in QCD.


I. INTRODUCTION
Over the past five decades, there has been great interest in the structure of the nucleon sea, which encompasses quarks, antiquarks, and gluons that exist beyond the three valence quarks originally proposed as the building blocks of the nucleon [1][2][3]. In particular, high-energy scattering experiments and global QCD analyses of the data have now conclusively demonstrated a large difference in the momentum distributions ofū andd antiquarks in the proton [4]. The result cannot be explained perturbatively through the splitting of gluons into quark-antiquark pairs [5], and requires nonperturbative mechanisms, such as dynamical chiral symmetry breaking and the pion cloud of the nucleon [6][7][8][9][10][11][12][13][14][15][16], or dynamics related to the Fermi-Dirac statistics of quarks and the Pauli exclusion principle [17][18][19][20][21][22]. More recently, exploratory studies have been made in extracting information on the isovector sea quark distributions directly from lattice QCD calculations [23][24][25][26][27].
The first experimental indications of a light-quark sea asymmetry came from the CFS group at Fermilab in 1981 [28]. Measuring the Drell-Yan process, where a quark and antiquark from colliding hadrons annihilate into a virtual photon that subsequently decays into a lepton-antilepton pair, they found that thed distribution, integrated over parton momentum fraction x, was larger than the integratedū distribution. The first highprecision experimental evidence for a sea asymmetry came a decade later from the New Muon Collaboration (NMC) at CERN [29,30], which used measurements of inclusive deep-inelastic scattering (DIS) on hydrogen and deuterium to test the Gottfried sum rule [31] and determine that the integral 1 0 dx d (x) −ū(x) must be positive. Further evidence was provided by the NA51 Collaboration at CERN [32] using the Drell-Yan process, and indications for a nonzero asymmetry were also found by the HERMES Collaboration [33] in semi-inclusive DIS of charged pions.
The most conclusive evidence for an excess ofd overū was provided in 1998 by the Fermilab E866 (NuSea) experiment [34][35][36], which measured Drell-Yan leptonpair production cross sections in proton-proton (pp) and proton-deuteron (pD) scattering. At kinematics where the parton momentum fraction of the beam, x 1 , is much greater than that of the target, x 2 , one finds that the ratio of pD to pp cross section, at leading order in the strong coupling, provides direct sensitivity to thed/ū ratio [37], .
An alternative method for extracting thed−ū asymmetry involves W -lepton production in hadronic collisions, whereby a quark and antiquark annihilate into a W boson that decays into a detected lepton and a neutrino. This has been measured in pp scattering at the Tevatron and in pp collisions at the Large Hadon Collider (LHC) and the Relativistic Heavy Ion Collider (RHIC). Taking the ratio of the W + and W − cross sections for pp reactions, at leading order in perturbative QCD one has The most recent high-precision data from the STAR Collaboration at RHIC [39]  these data are sensitive to parton distributions at higher values of x, which can potentially provide information on the distributions of light sea quarks in this difficult to measure region.
In this paper, we present the results of a global QCD analysis using the JAM Bayesian Monte Carlo framework, including the new measurements from SeaQuest [38] and STAR [39]. By combining these data with other hadron collider data from Fermilab and the LHC, and with DIS data from fixed-target and collider experiments, we are able to place strong constraints on the light-quark PDFs and assess the impact of the new data on the antiquark asymmetryd −ū.
In Sec. II we begin by reviewing the theoretical framework used in this analysis, including a summary of the most relevant factorization formulas. In Sec. III we review the methodology employed for the analysis, as well as the choice of parametrization for the parton distribution functions (PDFs). The quality of the fit is described in Sec. IV, where we compare the fitted cross sections with the new Drell-Yan and W -lepton data. The results for the PDFs are presented in Sec. V, where we in particular assess the impact of the various datasets on the shape of thed −ū asymmetry and its uncertainties. We also compare our extracted asymmetry with nonperturbative calculations based on chiral symmetry breaking in QCD. Finally, in Sec. VI we summarize our findings and discuss their implications for our understanding of the sea content of the nucleon.

II. THEORETICAL FRAMEWORK
Our theoretical framework is based on fixed order collinear factorization for various high-energy scattering processes involving spin-averaged PDFs, such as inclusive DIS, Drell-Yan lepton-pair production, and weak boson and jet production. Since the new datasets that provide sensitivity to the antiquark distributions in this analysis involve Drell-Yan and weak boson production, we will focus on these observables in this section.
The double differential Drell-Yan lepton-pair production cross section can be written in terms of convolutions of the PDFs in the colliding hadrons with short-distance partonic cross sectionsσ DY ab [40], where α is the electromagnetic coupling, S is the invariant mass squared of the reaction, and µ R and µ F are the renormalization and factorization scales, respectively. We write the cross section as differential in the invariant mass of the lepton pair, M 2 , and the rapidity of the lepton pair in the center-of-mass frame, Y . The sum over the quark flavors a, b runs over all partonic channels that can contribute to the Drell-Yan process, for which the scale is set to µ R = µ F = M . The partonic cross sectionsσ DY ab are computed at next-to-leading order (NLO) in the strong coupling α s (µ R ), with the NLO expressions taken from Ref. [40]. For pp scattering, the PDFs f a,b are those in the proton. For proton-deuteron scattering, the x of the parton in the deuteron target is small enough that it is reasonable to approximate the pD cross section by a simple sum of proton and neutron cross sections [41], σ pD ≈ σ pp + σ pn , with the PDFs in the neutron related to those in the proton through isospin symmetry.
For W -lepton production, the double differential cross section is given by [42] which is differential in the outgoing lepton's pseudorapidity, η , and its transverse momentum, p T . Labeling the momenta of the incoming hadrons by P 1 and P 2 , we define the Mandelstam invariants T ≡ (P 1 − p ) 2 and U ≡ (P 2 − p ) 2 , which then allows us to introduce the variables V ≡ 1 + T /S and W ≡ −U/(S + T ). Defining v and w as the partonic analogues of V and W , we can write for the parton momentum fractions . The sum over the quark flavors a, b runs over all partonic channels that can contribute to W -lepton production, for which the renormalization and factorization scales are chosen to be the mass of the W boson, µ R = µ F = M W . The partonic cross sectionŝ σ W ab are again computed at NLO in the strong coupling α s (µ R ), with the NLO expressions used here taken from Ref. [42].
In practice, the W -lepton production cross sections are usually integrated over a specified range of p T (for example, p T > 25 GeV for the STAR data), and the cross section is given as differential in the pseudorapidity η . Note that the theoretical formalism that leads to Eq. (4) above applies to final states that are fully inclusive in the transverse momentum of the undetected neutrino, p ν T . This formalism therefore cannot be used directly to calculate the W -lepton asymmetries from the D0, CDF, and AT-LAS collaborations, which place cuts on p ν T and would require extending the current theoretical framework to implement such phase space cuts [43].
To avoid this ambiguity, we use the fully inclusive reconstructed W -boson data from D0 and CDF instead of the W -lepton data. Our analysis does not include data from ATLAS, but includes the fully inclusive data from the CMS collaboration, which cover the same kinematic region with similar uncertainties. We can thus retain most of the information provided by these observables, despite the exclusion of the non-fully inclusive W -lepton data from CDF, D0, and ATLAS.
For the DIS theory, we include corrections that are known to be necessary to accurately describe the highx region. These include target mass corrections, as described by Moffat et al. [44], parameterized higher twist corrections, and nuclear corrections for deuterium data. Further details regarding these corrections will be discussed in a future publication [45].
The scale dependence of the PDFs is determined according to the DGLAP evolution equations [46][47][48], with the PDFs and α s evolved according to their renormalization group equation (RGE) at next-to-leading logarithmic accuracy with the boundary condition α s (M Z ) = 0.118. For light and for heavy quarks, the PDFs are evolved using the zero-mass variable-flavor-number scheme. The values of the heavy quark mass thresholds for the evolution of the PDFs and α S are taken from the PDG values m c = 1.28 GeV and m b = 4.18 GeV in the MS scheme [49].

III. BAYESIAN INFERENCE
Our PDF extraction procedure is based on Bayesian inferences using Monte Carlo techniques developed in previous JAM analyses [50][51][52][53][54]. We parametrize the PDFs at the input scale µ 2 0 = m 2 c using a generic template function of the form where a = {N, α, β, γ, η} is the set of parameters to be inferred, and M = B[α + 2, β + 1] + γB[α + 5 2 , β + 1] + ηB[α + 3, β + 1] normalizes the function to the second moment to maximally decorrelate the normalization and shape parameters. To characterize the nucleon valence region and discriminate it from the sea components, we parametrize the light-quark and strange PDFs according to where the dependence on x and the scale µ 2 0 has been suppressed for convenience. The input quark distributions u v , d v ,ū 0 ,d 0 , s 0 , ands 0 , as well as the gluon distribution g, are parametrized individually as in Eq. (5). For the sea quark PDFs, the additional functions S 1 and S 2 are also parametrized via Eq. (5), and are designed to allow a more singular small-x behavior compared to the valence distributions by restricting the corresponding α parameter to more negative values. Letting S 1 = S 2 allows for different small-x behaviors for the light sea quarks and the strange quarks.
The normalization parameter N for the gluon distribution is fixed by the momentum sum rule, while the corresponding normalization parameters for u v , d v , and s 0 are fixed by the valence number sum rules. For the u v , d v , g,ū 0 , andd 0 distributions, the parameters γ and η in Eq. (5) are included to allow sufficient flexibility. We have verified that including these parameters for S 1 , S 2 , s 0 , ors 0 does not lead to significant changes to the final results, so for these distributions the γ and η parameters are set to zero.
Our Bayesian analysis consists of sampling the posterior distribution given by with a likelihood function of Gaussian form, and a flat prior function π(a) that vanishes in regions where the parameters a give unphysical PDFs. The χ 2 function in (8) is defined as where d i,e is the experimental data point i from dataset e, and T i,e is the corresponding theoretical value. All uncorrelated uncertainties are added in quadrature and labeled by α i,e , while β k i,e represents the k-th source of point-topoint correlated systematic uncertainties for the i-th data point weighted by r k e . The latter are optimized per values of the parameters a via ∂χ 2 /∂r k e = 0. We include normalization parameters N e for each dataset e as part of the posterior distribution per data set, with a Gaussian penalty controlled by the experimentally quoted normalization uncertainties δN e .
The posterior distribution is sampled via data resampling, whereby multiple maximum likelihood optimizations are carried out by adding Gaussian noise with width α i,e to each data point across all data sets. The resulting ensemble of parameter samples {a k ; k = 1, . . . , n} is then used to obtain statistical estimators for a given observable (function of PDFs) O(a), such as the mean and variance, The agreement between data and theory is assessed by defining the "reduced" χ 2 for each dataset e as with N e dat the total number of data points for each experiment, and E[...] represents the mean theory as defined in Eq. (10).

IV. QUALITY OF FIT
The datasets used in this analysis comprise inclusive DIS from hydrogen and deuterium, Drell-Yan lepton-pair production in pp and pD scattering, along with weak vector boson and jet production in pp and pp collisions. The various datasets are summarized in Table I, and their Born-level kinematics are displayed in Fig. 1. For DIS, we include F 2 structure function data from fixed target experiments from BCDMS [55], NMC [56,57], SLAC [58], and Jefferson Lab [59,60], as well as the reduced neutral and charged current proton cross sections from the combined H1/ZEUS analysis from HERA [61]. The cuts on the four-momentum transfer squared Q 2 and the hadronic final state masses W are Q 2 > m 2 c and W 2 > 3 GeV 2 for all DIS data.
The quality of the global fit is summarized in Table I, which shows a global average χ 2 red = 1.12 for a total of N dat = 4421 data points. The Drell-Yan data on the ratio σ DY pD /2σ DY pp from both the NuSea and SeaQuest experiments are shown in Fig. 2. Note that the data from the two experiments need not overlap, as they were taken at different kinematics with different values of M , and for the ranges 0.36 < x 1 < 0.56 for NuSea [36] and 0.48 < x 1 < 0.69 for SeaQuest [38]. As suggested in Ref. [38] and seen in Fig. 2 (in combination with Eq. (1)), there is some tension between the new SeaQuest data Drell-Yan cross section ratio σ DY pD /2σ DY pp from SeaQuest [38] (red circles) and NuSea [36] (blue triangles) compared with the JAM results at their respective kinematics (red and blue 1σ uncertainty bands), as a function of the target momentum fraction x2, with the corresponding x1 ranges indicated. The ratio of data to the average theory is illustrated in the lower panel with 1σ theoretical uncertainties at the SeaQuest and NuSea kinematics. and the earlier NuSea data, particularly above x = 0.25, where NuSea suggestsd/ū < 1 while SeaQuest indicates the opposite. The amount of tension can be illustrated by removing the SeaQuest data, which then leads to an improvement in the χ 2 red for NuSea from 1.30 to 0.57. While the tension exists, we are nonetheless able to attain an acceptable description (within 2σ theoretical uncertainty) of both datasets, as seen in Table I and Fig. 2.
For the new W -lepton data from STAR, shown in Fig. 3, the description suffers slightly at high and at low pseudorapidities, leading to a χ 2 red of 2.02 for these data. From Ref. [39], it is known that this is a common feature of most PDF extractions. This discrepancy is also partially due to some tension with the NuSea data, and the χ 2 red improves to 1.54 upon its removal.

V. EXTRACTED PARTON DENSITIES
Our analysis is based on more than 900 Monte Carlo samples, which we use to ensure the statistical convergence of the PDFs, from which the means and expectation values are then computed using Eqs. (10). The resulting parton densities are displayed in Fig. 4, where we show the valence (u v and d v ), gluon (g), light anti- In the valence sector, our results for both u v and d v are slightly larger than those from the other groups near the peaks of the distributions [75][76][77][78]. At lower x, the u v distribution agrees with the other groups, while the d v PDF agrees best with the CT18 [78] and NNPDF3.1 [75] parametrizations. For the gluon PDF, our results are largely in agreement with other extractions, although at low x there are some differences with the ABMP16 [76] fit. The same is true for the sum of light sea quark distributions,d +ū, but the disagreements are with CJ15 [77]. For the strange distribution s +s, our result is somewhat suppressed at low x relative to the other extractions. This is consistent with previous JAM analyses [53,54] that included semi-inclusive DIS cross section data, although these data are not included in the present analysis. Note that this analysis does not include Z boson production data from ATLAS, which were found [79,80] to enhance the strange quark distribution at low x. Furthermore, we do not include a parametrization for the nonperturbative charm, but generate it entirely from radiation, which may also play a role in the shape of the strange quark PDF [75].
The impact on the light-quark sea asymmetry from both the new STAR measurement of W -lepton cross sections and the SeaQuest measurement of Drell-Yan dimuon production is shown in Fig. 5. From the baseline analysis, which excludes these new datasets, the STAR cross section ratios are added first in order to assess their impact. While the STAR data do not lead to significant shifts in the central values ofd/ū, they do reduce the uncertainties somewhat, by up to ≈ 20% at x ≈ 0.2. This modest impact can be understood from the fact that the NuSea Drell-Yan measurements are more directly sensitive tod/ū (cf. Eqs. (1) and (2)), and already provide the bulk of the constraints on the ratio even when compared to high precision W -lepton and reconstructed W data from the Tevatron and LHC. Since the STAR data overlap kinematically with the NuSea measurement, it is therefore difficult to improve on the extraction ofd/ū using W -lepton production alone.
After adding the STAR data to the baseline, the SeaQuest Drell-Yan cross section ratios are then included. In this case, the SeaQuest data greatly reduce the uncertainties on thed/ū ratio, by up to ≈ 50% at high x, x 0.3. As well as reducing the uncertainty, the addition of the SeaQuest data also increases thed/ū ratio for x ≈ 0.2, where it remains above unity up to values of x ≈ 0.4. This is a direct consequence of the extended x range of the data compared with the earlier NuSea results, from x ≈ 0.3 up to x ≈ 0.4, with higher precision at the largest x values. This feature is also reflected in thed −ū difference remaining positive across the entire range of x probed.
With the new data from STAR and SeaQuest included, the finald/ū ratio and x(d −ū) difference are shown in Fig. 6, compared with the corresponding distributions from several other groups [75][76][77][78] at Q 2 = 10 GeV 2 (see also Refs. [81,82]). Our results are in agreement, within errors, with those from ABMP16 [76] and CT18 [78] (except with ABMP16 at low x 0.04), whose ratios also remain positive. Although there are differences in the shapes of the ratios, our results also largely agree within errors with those from NNPDF3.1 [75] and CJ15 [77]. The disagreement with CJ15 at high x results from their choice of parametrization that forcesd/ū → 1 as x → 1, and, more importantly, the fact that this fit predates the SeaQuest data, which pull the ratio upwards at large x.
It is instructive to examine the impact of individual datasets on the light-quark sea asymmetry, x(d −ū), which we illustrate in the left panel of Fig. 7. Starting with inclusive DIS data only, and excluding data from the NMC experiment, the asymmetry is consistent with zero within large uncertainties. Upon the inclusion of the NMC data [56,57], the errors are significantly reduced, and the asymmetry gives an indication of deviation from zero in the range 0.01 < x < 0.2. When W -lepton, reconstructed W and Z boson, and jet production data from RHIC, Tevatron, and the LHC are included (but not the new STAR data [39]), the asymmetry becomes significantly larger, and more distinguishable from zero below x = 0.3. The new constraints come primarily from the high precision W and W -lepton asym- metry measurements from the Tevatron and LHC, which are sensitive toū andd (see Eq. (2)). The further addition of the NuSea [36] Drell-Yan data greatly decreases the uncertainty, showing that these data still provide a strong constraint on the asymmetry even when compared to the Tevatron and LHC W and W -lepton asymmetries. Finally, the inclusion of the new SeaQuest [38] and STAR [39] data reduces the uncertainty on the asymmetry even further, while increasing the magnitude at x 0.2, as already seen in Fig. 5 except now displayed on a logarithmic scale.
The impact of the various datasets on the antiquark asymmetry can also be represented in the form of the truncated moment, 1 0.01 dx(d −ū), illustrated in Fig. 7 in the form of the normalized yield of the Monte Carlo replicas. We choose x = 0.01 for the lower limit as this is approximately the extent to which existing data provide information on the asymmetry (see Fig. 1). Note that because of large uncertainties associated with the small-x extrapolation, estimates of the total moment are FIG. 6. Comparison of the JAMd/ū and x(d −ū) PDFs (red bands) with the NLO parametrizations from NNPDF3.1 [75] (gold), ABMP16 [76] (blue), CJ15 [77] (gray), and CT18 [78] (green) at the scale Q 2 = 10 GeV 2 . All bands represent 1σ uncertainty.
not as meaningful without additional constraints on the x → 0 behavior. For the same combinations of datasets as described above, one observes that prior to the addition of the NMC data the truncated moments of the replicas can vary widely between −0.2 and +0.15. Once the NMC data are added, the moments become almost entirely positive, and the yield continues to contract as more data are added. Once all of the data are included, the truncated moments are tightly gathered around 0.1.
The shape and magnitude of thed −ū asymmetry has long been an intriguing puzzle for our understanding of the nonperturbative structure of the nucleon. The most common interpretation of the excess ofd overū in the proton sea has been that associated with chiral symmetry breaking, and the consequent prevalence of the virtual p → nπ + dissociation [6]. Scattering from the π + component of the proton wave function then naturally enhances thed distribution, even though some of this will be cancelled by the subdominant p → ∆ ++ π − dissociation, which favorsū overd. As an illustration, in Fig. 8 we compare the inferred JAM asymmetry with d −ū at x > 0 calculated from a convolution of the p → baryon + π splitting functions and the valence PDF of  [55,[58][59][60][61] excluding NMC (gold band). Then data are added successively, with NMC [56,57] (gray); W -lepton and W asymmetries/Z boson production/jet production from RHIC [74], Tevatron [62-65, 72, 73], and the LHC [66][67][68][69][70][71] (green); NuSea [34,36] (blue); and finally the SeaQuest Drell-Yan [38] and STAR W -lepton ratio [39] (red). For the same combinations of data, the normalized yield of the Monte Carlo replicas for the truncated moment the pion [84][85][86], (12) where the convolution integral is defined as f ⊗ q = 1 0 dy 1 0 dzf (y) q(z) δ(x−yz), and y is the light-cone fraction of the proton's momentum carried by the pion.
For the calculation shown in Fig. 8, the pion PDF is taken from the recent NLO analysis of pion-induced Drell-Yan and deep-inelastic leading neutron electroproduction data by Barry et al. [83]. The splitting functions are computed at one-pion loop order from chiral effective theory [84][85][86], using several different models for the ultraviolet regulators [87,88]. For the regulator mass parameters, we use the values from the global analysis in Ref. [87], for which the integrated splitting function was found to be n πN = 0.22. Normalizing the various models of the regulator function, or hadronic form factor, to this value, the resulting band in Fig. 8 can be taken as a representation of the uncertainty on the calculated asymmetry. The uncertainty also includes the errors from the Monte Carlo analysis, although these are small compared to the variations in the models.

VI. SUMMARY AND OUTLOOK
Our global QCD analysis confirms the importance of the recent SeaQuest measurement of pp and pD Drell-FIG. 8. Comparison of the extracted JAM x(d −ū) distribution (red band) at Q 2 = 10 GeV 2 with results from nonperturbative calculations based on chiral symmetry breaking and the pion cloud [84,85]. The JAM band represents 1σ uncertainty, while the pion cloud band includes model dependence associated with the shape of the ultraviolet regulator function [83,87,88].
Yan cross sections [38]. The inclusion of the SeaQuest data reduces the uncertainty on thed/ū ratio by up to 50% at high values of x, and strongly suggests that thē d−ū asymmetry remains positive up to x ≈ 0.4. The impact from the latest W + /W − STAR data [39] on thed/ū ratio, while still important, is less dramatic due to the lower sensitivity of W -lepton production data compared with the Drell-Yan data.
Although the new SeaQuest data indicate some tension with the earlier NuSea Drell-Yan measurement [35,36] at large x, a good simultaneous description of both datasets is still possible due to the larger relative uncertainties of the NuSea data at high x. The shape and magnitude of d −ū from the global analysis is consistent with expectations from nonperturbative models, such as those based on chiral symmetry breaking in QCD, all of which predict a positive asymmetry up to x ∼ 0.4 − 0.5. In the future, combining this analysis with semi-inclusive DIS data [54] could provide further constraints on the light-quark sea asymmetry in the proton.