1 Introduction

1.1 Light-parton structure of the nucleon in electroweak precision measurements

Fig. 1
figure 1

Hessian correlations [1,2,3] for the values of \(\sin \theta _W\) extracted from Z boson production at the LHC 8 TeV [4]. Left: correlations with valence PDFs and PDF ratios at \(Q=81.45\) GeV, plotted as a function of x for CT14 NNLO PDFs. Right: the same, for correlations with PDFs of individual flavors

Electroweak precision tests of the Standard Model (SM) at hadron colliders are nontrivially sensitive to the parton flavor composition of initial-state hadrons. For spin-independent inclusive observables at the CERN Large Hadron Collider (LHC) – or, indeed, at any high-enough energy facility such as the Relativistic Heavy Ion Collider, the Jefferson Lab CEBAF accelerator, or the future Electron-Ion Collider – this flavor composition is typically specified by helicity-averaged parton distribution functions (PDFs) of the proton. The PDFs, f(xQ), have long been of strong interest from the perspective of both fundamental Quantum Chromo-Dynamics (QCD) as well as particle phenomenology, given that they quantify the probability of resolving a quark or gluon constituent of flavor f carrying a fraction x of the proton’s longitudinal momentum in a scattering process with a squared energy scale \(Q^2 > rsim 1~\text {GeV}^2\). For this reason, the PDFs play a central role in predicting cross sections for pp collisions at the LHC, and, in particular, their accuracy influences the ability of LHC measurements or other high-energy data to constrain the SM parameters, including in the electroweak sector.

Due to the challenge of reducing their uncertainties and empirically distinguishing among their parton flavors, PDFs have historically been determined most robustly through “global QCD fits” [5,6,7,8], now increasingly performed at next-to-next-to-leading order (NNLO) accuracy in \(\alpha _s\), and drawing upon large collections of experimental measurements sensitive to QCD and different underlying PDF combinations. In spite of the growing number of LHC measurements, deeply-inelastic scattering (DIS) experiments involving fixed hadronic or nuclear targets at BCDMS, NMC, SLAC, and JLab continue to provide key information to disentangle the PDFs in recent global QCD analyses such as CJ15 [9], ABMP16 [10], CT18 [11], NNPDF3.1 [12], and MSHT20 [13]. The fixed-target experiments complement analogous DIS collisions at the HERA ep collider by extending the momentum fraction coverage to larger x values and adding measurements on deuterium targets. In fact, such experiments provide the leading constraints on the (anti)quark PDFs at low scales Q and large momentum fractions \(x > rsim 0.05\), as well as on the gluon PDF by observing scaling violations over the same kinematic region [14,15,16].

Fig. 2
figure 2

Lagrange Multiplier scans on \(d_{val}(x=0.03,Q=85\, \text {GeV})\) (left) and \(u_{val}(x=0.03,Q=85\, \text {GeV})\) (right), showing the changes in the \(\chi ^2\) for all data sets and most sensitive experimental data sets in the CT18Z NNLO global QCD analysis [11]

In precision tests of the electroweak sector, the substantial PDF dependence of the involved theoretical calculations affects experimental determinations of SM parameters, such as the weak-mixing angle \(\theta _W\) extracted from the \(A_{{\mathrm {FB}}}\) forward-backward asymmetry measured in the production of Z bosons during Runs 1 and 2 of the LHC. Figure 1 illustrates typical Hessian correlations [1,2,3] of PDF combinations (left) and PDFs (right) with the \(\sin \theta _W\) values extracted from 8 TeV \(A_{{\mathrm {FB}}}\) measurements at the LHC. Here, the correlations are computed using the preliminary (unpublished) ATLAS Run-1 data [4] on \(\sin \theta _W\) extracted with individual PDF eigenvector sets of the CT14 NNLO ensemble [17].

In the left subfigure, we see that the values of the extracted \(\sin \theta _W\) are strongly correlated with the valence combinations of light-quark PDFs at \(Q=81.45\) GeV, \(d_{{ val }}(x,Q)\equiv d(x,Q)-\bar{d}(x,Q)\) at \(x\approx 0.008-0.05\) and \(u_{{ val }}(x,Q) \equiv u(x,Q)-\bar{u}(x,Q)\) at \(x\approx 0.008-0.1\). In addition, significant correlations with the extracted \(\sin \theta _W\) exist at higher \(x > rsim 0.3\) as well, especially for the PDF ratio d/u, and again for \(d_{{ val }}\). Remarkably, the correlations are weaker with the PDFs of individual parton flavors (shown at right) than with the valence combinations. An anti-correlation with the d-quark (green dashed line) at \(x\sim 0.3\), affecting \(A_{{\mathrm {FB}}}\) at smaller x via the valence-quark sum rule, is evident in this case, though not exceptionally strong.

The sizable correlations between fitted PDFs and \(\sin \theta _W\) in Fig. 1 are consistent with the significant PDF uncertainties on these and similar BSM-sensitive quantities, including the W boson mass, \(M_W\), and Higgs cross section, \(\sigma _H\). For this reason, the realization of next-generation precision in the determination of these electroweak parameters is critically dependent on the reduction of their associated PDF uncertainties, including the high-x uncertainty of the d-quark and gluon (g) PDFs, as well as of \(d_{{ val }}\) and d/u.

We might ask where the experimental constraints on the relevant PDF combinations for LHC electroweak precision tests arise from. While direct measurements at the LHC will supply increasing information on the PDFs affecting \(A_{{\mathrm {FB}}}\) [18] and other observables [19], recent CTEQ studies [9, 11, 15, 16, 20] find that deep-inelastic scattering experiments on nuclear targets will continue to provide strong constraints on the down-quark PDFs in the nucleon in the near future. In a global QCD analysis, experimental measurements made solely on proton targets are at present insufficient for full separation of parton distributions for d, s, g, and anti-quark flavors. Assuming parton-level charge symmetry, \(d^p(x,Q)\approx u^n(x,Q)\), between the PDFs in the proton p and the neutron n, and correcting for low-energy nuclear effects [14], one can then use scattering processes on the deuteron or heavier nuclei to constrain the down-type PDFs in the proton.

We illustrate the importance of fixed-target data in the determination of the weak mixing angle with the help of Lagrange Multiple (LM) scans [21] in Fig. 2, in which we examine the dependence of the figure-of-merit function \(\chi ^2\) in the CT18Z NNLO analysis [11] on the values of the valence \(u_{val}\) (left) and \(d_{val}\) (right) quarks at \(x=0.03\) and \(Q=85 \) GeV. We plot the change in \(\chi ^2\), as compared to the value in the best fit, for all data sets (labeled as “Total” in the figure) and for individual experiments with the highest sensitivity to this PDF combination. The curves for the experimental data sets are labeled according to the convention in Table 1. The LM scans show that a small group of DIS experiments – NMC ratio of d and p DIS cross sections [22], inclusive HERA I+II DIS [23], BCDMS p and d reduced cross sections [24,25,26,27], CCFR \(F_3\) structure function [28] – contribute the largest variations in \(\varDelta \chi ^2\) when the valence PDFs are varied, together with several lepton pair production experiments by ATLAS, LHCb, E605, and E866. In the case of \(d_{val}\) in the right Fig. 2, the BCDMS measurements on p and d show somewhat different preferences, with the deuteron data clearly preferring a higher \(d_{val}(0.03,85~\text {GeV})\). In these and other cases, the deuteron DIS measurements, with their large numbers of precise data points, have large statistical significance in the global fit. As this LM scan also illustrates, the proton and deuteron data sets in the CT18 fit sometimes prefer somewhat different values of \(d_{val}(x,Q)\), which in turn may hamper the efforts for reducing the PDF uncertainty in the relevant EW precision measurements.

Table 1 A comprehensive listing of experiments included within the CT and CJ frameworks for this study, grouped according to the experimental process they represent. For each experiment, we give the number of experimental data points, \(N_{ pt }\), included into the CT and CJ fits presented in this study, with blank entries indicating that the given experiment is omitted from the respective global fit

In this article, we will employ a statistical technique called the \(L_2\) sensitivity [16] that can be viewed as a fast approximation to the LM scans that are usually very computing-intensive. With this technique, we will survey agreement between the constraints on the PDFs from the deuteron and other data sets in a wide range of x and for various treatments of deuteron corrections. While we have investigated sensitivities for various PDF flavors, our presentation will focus on the sensitivities to u, d, and g PDFs that are most affected.

Apart from its significance for the LHC and SM phenomenology, the physics of large-x quark PDFs is interesting in its own right. Nonperturbative QCD approaches [5, 29, 30] and lattice QCD [31, 32] provide increasingly complete predictions for the flavor composition of unpolarized protons at \(x\rightarrow 1\), which in turn inform one on the role of color confinement in the binding of the valence quarks. These predictions can then be confronted with phenomenological determinations of Mellin moments and PDFs at large x. For example, Ref. [33] observes that the proton (BCDMS \(F_2^p\), E866 pp DY, HERAI+II ep DIS) and deuteron (BCDMS \(F_2^d\), NMC p/d ratio) experiments in the CT18 NNLO analysis have somewhat different preferences for the effective exponents \(\beta \) controlling the \((1-x)^\beta \) falloff of the valence up and down quark PDFs at \(x\rightarrow 1\). In turn, these differences impact comparisons of phenomenological PDFs against large-x predictions from quark counting rules [34,35,36] and other nonperturbative approaches [29, 30]. The analysis methods utilized in this paper can shed light on these issues.

1.2 The role of nuclear-medium effects

Extracting parton-level information from nuclear data sets involving the deuteron or heavier targets requires an understanding of the effects of the nuclear environment [8, 14]. The trivial dependence on the nuclear atomic number A and charge Z is normally implemented by constructing a nuclear PDF as a linear combination of the PDFs on free protons and neutrons, as reviewed, e.g., in Sec. 5. A of Ref. [7]. On top of this trivial (AZ) dependence, low-energy interactions in the nuclear medium may modify the quark and gluon distributions relatively to those in free nucleons. The nuclear corrections that account for these deviations can be computed with increasing sophistication and connection to the formal theory describing low-energy dynamics. One can, for example, utilize phenomenological, data-driven ratios to convert nuclear-target cross sections to free-nucleon ones [11, 37]; parametrize and fit the nuclear deformation of the PDFs either inside the deuteron in a nucleon PDF fit  [13, 38] or in a heavy nucleus in a nuclear PDF fit [39,40,41,42,43,44]; or, finally, adopt a dynamical model of the low-energy nucleon-nucleon interactions and calculate the hard cross-section as a double convolution of parton distributions inside the nucleons and nucleon wave functions inside the nuclear targets [9, 45, 46]. The resulting extractions of the nucleon PDFs from nuclear data then have a dependence on the assumed nuclear corrections.

The specific methods used typically differ when analyzing light nuclear targets (e.g., the deuteron) or heavy nuclei (e.g., Fe). Here, we concentrate on and summarize the techniques utilized in the CT18 and CJ15 fits which form the starting point for the study presented in this paper, and then briefly mention other approaches:

  1. 1.

    Deuteron corrections. A dynamic deuteron correction in the CJ15 next-to-leading order (NLO) PDF fit [9] was applied to any process involving interaction with a deuterium target, including both DIS and Drell–Yan experiments, as detailed at greater length in Sect. 2.1. This correction allows the CJ15 NLO fit to include the fixed-target DIS data from SLAC and JLab at the largest x not admitted by other groups. The correction can be understood as arising from several dynamical effects, such as the relativistic Fermi motion of bound nucleons, binding corrections, and nucleon off-shellness effects. In practice these mechanisms are taken into account via convolutions of free-hadron cross sections with nuclear smearing functions calculated starting from bound nucleon wave-functions. Nuclear correction mostly affects the intermediate and large regions of x.

    The CJ15 analysis also applies a phenomenological parametrization for the off-shell deformation of the scattered nucleon’s structure function (in short, “off-shell corrections”) with parameters fitted to data to increase the model flexibility. Care is taken that the valence quark number inside the nucleon is not modified; since the off-shell function is flavor independent and has no significant dependence on \(Q^2\), it must then change the sign in the interval of [0, 1], meaning that it is essentially given by a polynomial with one or more roots in this range.

    An analysis similar in spirit to the CJ15 fit is available from the AKP group [45], and differences in the extracted off-shell functions are currently being investigated by the two groups jointly. Alternatively, the deuteron correction can be fitted using a purely phenomenological parametrization as in the MSHT20 analysis [13], or the additional uncertainty associated with the deuteron effects can be learned from the global analysis data themselves, as done in the NNPDF study [37]. Other groups do not include deuteron corrections by selecting the fitted data in a \(\{x,Q\}\) region where the deuteron corrections are small compared to the precision of data, as is done, e.g., in the CT18 analysis [11].

  2. 2.

    Heavy-nucleus effects. Nucleon PDF fits may include DIS experiments performed on heavy nuclear targets, such as CCFR [47] and NuTeV [48], involving \(^{56}\)Fe, and CHORUS [49], with \(^{82}\)Pb (which is not presently fitted in either CT or CJ packages studied here). It has been known empirically for some time that the structure functions of these heavier nuclear targets exhibit x- and A-dependent deviations from the structure function of the physical deuteron, owing to a variety of physical processes characterizing the nuclear medium [50,51,52,53], including the heavy-nucleus analogue of the EMC and Fermi-motion effects discussed for the deuteron at high x, and nuclear (anti)shadowing phenomena at lower x.

    To address these effects, the CT group corrects DIS cross sections on iron (CCFR [47], CDHSW [54], and NuTeV [48]) and proton-copper Drell–Yan (E605 [55]) to the corresponding cross sections on deuterium using a phenomenological parametrization of the nuclear-to-deuteron cross section ratios based on results in [51] (see also [56], Fig. 2a), which depends on x but not on \(Q^2\) in the fitted region. To include the heavy-nuclear data in the MSHT20 [13] and earlier MMHT fits, a nuclear correction factor, \(R_f\), [38] having the form \(f^A(x,Q^2) = R_f(x,Q^2,A)f(x,Q^2)\), where \(f^A(x,Q^2)\) is defined to be the PDF of a proton bound in a nucleus of mass number A, was determined. This was assessed using the de Florian et al. nuclear PDF (nPDF) of Ref. [41], then weighted by a 3-parameter modification factor as in Ref. [38], which is actively refitted along with the PDF-associated parameters. NNPDF [57] determines the uncertainty due to heavy-nuclear effects using a similar statistical procedure as for the deuteron.

As can be seen from this summary, global analyses vary in their treatments of the nuclear effects, but with the frequent conclusion that the resulting differences are marginal in comparison to contemporary PDF uncertainties. Indeed, the experiments in question are fitted reasonably well, while the higher-order QCD and parametrization uncertainties on the PDFs are comparable for the most part to the nuclear ones at NLO or even NNLO. In the present study, however, we identify several areas where the deuteron corrections will play a prominent role in the near future, as the field advances toward higher accuracy in the determination of nucleon PDFs. We compare, in particular, the effects of deuteron corrections in two independent PDF global fits by the CTEQ-JLab (CJ) and CTEQ-TEA (CT) groups which differ in their phenomenological focus, data selection and, crucially, the treatment of scattering process in nuclear targets. We find that the deuteron effects will have pronounced consequences for both the fitted PDFs in the large-x region and the correlations among the PDFs and quantities derived from them in an extended x range.

More specifically, this paper will elucidate constraints on the d-quark, gluon, and other PDFs arising in the CT18 and CJ15 global fits. We will accomplish this by analyzing the \(L_2\) sensitivity to various PDFs [16], a simple informative figure of merit that allows us to look inside the CJ and CT fits and understand the constraints from the fitted experiments on various parton flavors in an expansive region of x and Q.

1.3 Paper organization

After this introduction, the remainder of the article is as follows. In Sect. 2, we briefly present the deuteron-structure corrections (Sect. 2.1) with which this investigation is primarily concerned, as well as power-suppressed QCD effects, also relevant to fits involving nuclear data and/or at lower Q and W (in Sect. 2.2). In Sect. 3, we summarize the relevant features of the CT and CJ fitting frameworks and the special modifications we made in the two analyses to allow their direct comparisons. In Sect. 3.3, we review the \(L_2\) sensitivity method. In Sect. 4, we apply this method to analyze the impact of deuteron-structure corrections on the fit results, and examine the patterns of PDF pulls obtained in the several iterations of CT/CJ under different assumed treatments of the deuteron corrections. As representative cases, we will concentrate on the d/u ratio in Sects. 4.2 and 4.5, and on the gluon in Sects. 4.4 and 4.6. In Sect. 4.3 we also explore the implications of our analysis to precision measurements of the weak mixing angle. The conclusions in Sect. 5 are followed by a technical appendix describing a numerical procedure to reconcile the error analyses in the CJ and CT approaches.

2 Low-energy QCD effects

2.1 Deuteron-structure effects

The critical low-energy effect considered in this study, which arises due to MeV-scale dynamics characterizing nuclear bound states, is the modification of the parton-level substructure of nucleons embedded in the nuclear medium – in particular, those effects arising in the most weakly bound nuclear system, the two-body deuteron (Fig. 3).

Fig. 3
figure 3

We plot the nuclear correction ratio, \(F^N_2/F^d_2\), calculated using the central CJ15 fit results for several selections of the \(Q^2\) scale. Each of the four panels above highlights a given value of \(Q^2\), while graying out the curves for other scales in order to retain visual information on the scale dependence of the correction factor at large x. In the upper two panels, which focus on lower scales, \(Q^2=5, 10\) GeV\(^2\), the dotted lines indicate the range of x that is only accessible to CJ (\(W^2>3\) GeV\(^2\)) but not CT (\(W^2>12.25\) GeV\(^2\)), due to the more conservative cut of the latter

In the CJ framework, these corrections are treated as nuclear wave-function effects, and the deuteron parton distributions \(f^d\) are calculated as a convolution of the bound nucleon’s parton distributions, \(\widetilde{f}^N\), with a suitable nucleonic “smearing function,” \({\mathcal {S}}^{N/d}\):

$$\begin{aligned} f^d(x,Q^2)= \int \frac{dz}{z}\, \int dp^2_N {\mathcal {S}}^{N/d}(z,p^2_N) \, \widetilde{f}^{N}(x/z,p^2_N,Q^2). \end{aligned}$$
(1)

Here, z represents the momentum fraction of the (isoscalar) nucleon within the deuteron, defined as \(z \equiv (M_d/M_N) (p_N \cdot q / p_d \cdot q)\); \(p_{d,N}\) are the deuteron and nucleon four-momenta; and \(M_{d,N}\) are their respective on-shell masses. This representation is founded on the so-called Weak Binding Approximation (WBA) to the calculation of nuclear structure functions [46, 58], where the \({\mathcal {S}}^{N/d}\) smearing function is calculable based on an assumed nuclear potential; as in Ref. [9], we assume the AV18 potential. Since \(p_N\) is generically off-shell for a bound nucleon, but typically by only a small amount, one can further expand the bound-nucleon PDF, \(\widetilde{f}^N\), in powers of its off-shellness, \(\omega = (p_N^2-M_N^2)/M_N^2\), as

$$\begin{aligned} \widetilde{f}^{q/N}(y,p^2_N,Q^2)&= f^N(y,Q^2) + \frac{p_N^2-M_N^2}{M_N^2} \, \delta f^N(y,Q^2) \nonumber \\&\quad + O(\omega ^2). \end{aligned}$$
(2)

The first term, corresponding to \(p_N^2=M_N^2\), gives the PDF of the free, on-shell nucleon. In the second term, the \({\mathcal {O}}(\omega )\) coefficient (also known as “off-shell function”) can be phenomenologically parametrized and determined in a global fit from the interplay of data involving deuterium targets and information involving free-nucleon-based observables like W boson production at the Tevatron, the Relativistic Heavy Ion Collider (RHIC) or the LHC. Like in Ref. [9], we assume the flavor-independent 3-parameter shape function

$$\begin{aligned} \delta f^N (x) = C (x-x_0)(x-x_1)(1+x_0-x)\ , \end{aligned}$$
(3)

with \(x_1\) fixed by requiring the off-shell PDFs to satisfy the quark-number sum rule. Further technical details and a discussion of the fit results can be found in Ref. [9].

Section 4 considers three main scenarios for implementing the deuteron corrections (d.c.) discussed above:

  1. (i)

    an uncorrected scenario for which no nuclear effects are included for the deuteron;

  2. (ii)

    a fixed scenario in which the nuclear wave-function effects (on- and off-shell) are frozen to the AV18-informed choice of Ref. [9], and the off-shellness correction, \(\delta f^N(x)\), is set to the CJ15 central fit; and

  3. (iii)

    a free scenario particular to CJ, in which the parameters in Eq. (3) for the off-shell nucleon are allowed to vary.

The dynamical deuteron corrections are natively implemented in the CJ framework, and the off-shell parameters can be simultaneously fitted with the PDFs. So far, however, the CT code only supports deuteron corrections given in the form of analytic interpolations, such as the one obtained from the correction in [59]. To implement the fixed CJ15 deuteron correction in the CT framework and render it more directly comparable to CJ with respect to its treatment of deuteron target data, we instead multiply the experimental DIS deuteron structure function by the \(F_2^N/F_2^D\) nucleon-to-deuteron ratio plotted in Fig. 3:

$$\begin{aligned} F_{2}^N \equiv F_{2,\text {exp}}^d \left( \frac{F_2^N}{F_2^d} \right) _\text {CJ15}, \end{aligned}$$
(4)

with \(F_2^N = F_2^p+F_2^n\). The effective isoscalar combination of proton and neutron structure functions thus defined can then be directly compared to uncorrected theoretical calculations of the isoscalar deuteron DIS structure function. On this logic, the CT and CJ fits with a fixed CJ15 correction are placed on similar theoretical footing regarding the implementation of the deuteron effects, with the main difference being whether the correction is imposed within the theoretical structure function calculation or in the \(F^d_2\) experimental data – a fact which is immaterial for the sake of evaluating the \(\chi ^2\)-function and allows us to compare the impact of the same fixed correction on the CJ and CT frameworks. While a full analysis of the nuclear correction uncertainties is outside the scope of this article, the effect of letting the nuclear off-shell parameters free to vary in the present analysis can be appreciated by comparing the CJ fits in the fixed and free nuclear corrections scenarios.

Fig. 4
figure 4

Kinematics of the DIS data included in the fits discussed in this paper. The HERA DIS collider data were taken on proton targets; the fixed-target SLAC, JLab, BCDMS and NMC experiments include both proton and deuterium target data at approximately the same kinematics. The \(W^2=12.25\) GeV\(^2\) and \(W^2=3\) GeV\(^2\) cuts adopted, respectively, by the CT and CJ fits are shown by dashed and dot-dashed lines, respectively. The figure is taken from Ref. [60]

The size and x dependence of the deuteron corrections, as quantified by the isoscalar nucleon-to-deuteron structure function ratio \(F^N_2/F^d_2\), are shown in Fig. 3 for several representative choices of \(Q^2\). One immediately notices that deuteron corrections depend on the DIS scale and, at large x, increase with \(Q^2\) toward a fixed point in the \(Q^2\! \rightarrow \! \infty \) Bjorken limit; as such, deuteron corrections become effectively scale independent for \(Q^2 > rsim 50\) GeV\(^2\). For each plotted value of \(Q^2\), the figure also indicates the maximum x values below which data are accepted in the CJ and CT fits according to their \(W^2> 3 \text{ and } W^2>12.25\) GeV\(^2\) kinematic cuts, respectively. For CJ, which extends the analyzed DIS data set to the low-\(Q^2\) and large-x values as shown in Fig. 4, it is imperative to correctly account for the \(Q^2\) dependence of the deuteron correction in order to avoid conflicts with the leading-twist logarithmic \(Q^2\) evolution that constrains the fitted gluon distribution in DIS experiments. For CT, with its larger \(W^2\) cut, the deuteron corrections are small and nearly scale independent, as seen in Fig. 4, except for the less precise BCDMS deuteron points with \(x > rsim 0.6\) (see the kinematical map in Fig. 3), where some influence from the deuteron correction is expected and indeed quantified in Sect. 4.

2.2 Power-suppressed effects

Due to their less conservative kinematical restrictions on \(Q^2\) and \(W^2\), the CJ global fits extend into a region for which power-suppressed corrections are non-negligible, as depicted in Fig. 4. On the one hand, dynamical higher-twist corrections of \({\mathcal {O}}(\varLambda ^2/Q^2)\) emerge because of the presence of multi-parton correlations within the soft portion of the factorized DIS process, for which the first subleading contribution to the twist expansion for unpolarized scattering are matrix elements of twist-4 operators [61, 62]. As in CJ, these are often determined phenomenologically using forms like \(F_2(x,Q^2) = F_2^\text {LT}(x,Q^2) \left[ 1 + C(x)\big /Q^2 \right] \), where \(F_2^\text {LT}\) represents the leading-twist structure function, and a fitted coefficient, \(C(x)=\alpha x^\beta (1+\gamma x)\), parametrizes the power-suppressed twist-4 corrections. On the other hand, target-mass corrections of \({\mathcal {O}}(M^2_N/Q^2)\) are due to the non-negligible mass, \(M_N\), of the struck nucleon, and are implemented via the operator product expansion of Georgi and Politzer [63, 64] or related prescriptions, as extensively reviewed in [65, 66]. Both corrections are natively implemented in the CJ framework.

In contrast, CT imposes more restrictive kinematical cuts in \(W^2\), such that the standard CT data sets lie beyond the region for which the finite \(\sim \! 1/Q^2\) corrections are significant. In the past CT studies it mattered little whether the deuteron correction was included according to a specific model or not included at all (the default choice). While we expect some interplay between the deuteron and power-suppressed effects, we do not systematically isolate the latter and leave their investigation to future studies.

3 Methods

3.1 Selections of experimental data sets

The CT18 and CJ15 global data sets consist of both high-energy measurements as well as data down to the few-GeV region. Despite their somewhat differing phenomenological emphases – with CT generally aimed toward high-energy processes, and CJ toward the low-Q and large-x region probed at facilities like JLab and SLAC – there is a substantial overlap with respect to the key experiments they include. In Table 1, we provide a complete listing of the experiments included in each global analysis. Of particular importance for interpreting our results in Sect. 4, the leftmost column of Table 1 designates a process-based category label for each experiment, placing, e.g., the BCDMS \(F^d_2\) inclusive structure function data in Group 4: DIS Deuterium. We will investigate the agreement between these groups of experiments with the help of \(L_2\) sensitivities. In contrast, previous studies [15, 16, 33] employing the same technique focused primarily on the individual experiments.

While the CJ and CT analyses share 10 experiments, Table 1 shows that the CJ fits include additional DIS data at fixed-target energies from SLAC, HERMES, JLab, and NMC. They also include Tevatron measurements of charge asymmetries reconstructed to the level of W bosons and cross sections for photon plus leading jet production. The CT PDF fits more extensively cover collider observables. They include HERA heavy-quark production, an assortment of cross sections and cross section asymmetries in electroweak boson production at the LHC, as well as LHC cross sections for inclusive jet and \(t\bar{t}\) production. The CT fits include CCFR and NuTeV cross sections for both inclusive (Group #8) and semi-inclusive (Group #10, for opposite-sign dimuon production) charge-current DIS on iron. The CT fits, however, implement only the most direct measurements of Tevatron and LHC charge asymmetries in \(W \rightarrow \ell \) lepton decay, presented as a function of the rapidity and transverse momentum of the charged lepton. They do not include the CDF and DØ boson-level charge asymmetries fitted by CJ, which directly probe the large-x PDF ratios, while they also involve additional recursive unfolding of the data that utilizes a PDF-dependent calculation to reconstruct the weak boson’s rapidity.

3.2 Modifications in the fitting methodologies

For the study presented in this article, we modified some default settings of the CJ15 and CT18 fits, fully described respectively in Refs. [9, 11], to place the two fitting frameworks on a common footing and isolate the impact of various assumed treatments of the deuteron structure.

  1. 1.

    We match perturbative orders between the two fits at NLO in \(\alpha _s\). In practice, this means that in the CT fits, performed by default at NNLO, we instead compute the hard cross sections, perturbative PDF evolution, and running of \(\alpha _s\) at \({\mathcal {O}}(\alpha _s)\) accuracy to agree with the default NLO settings used in CJ.

  2. 2.

    We perform supplementary fits by excluding some data sets that appear in one fit only. While both CJ and CT fits include Tevatron lepton charge asymmetry measurements presented as a function of the charged lepton’s rapidity, the CJ fit also includes the fixed-target low \(W^2\) and \(Q^2\) DIS data from SLAC [74] and JLab [73], as well as the CDF [85] and DØ [86] W boson charge asymmetry with reconstructed weak boson kinematics. On the other hand, CT makes use of neutrino-initiated DIS data sets on heavy nuclear targets (both inclusive and semi-inclusive DIS [SIDIS] di-muon production in \(\nu \)-A scattering). In CT, data on heavy-nuclear targets are fitted at the isoscalar level after being corrected in the fit using a phenomenological parametrization of the \(F^A(x,Q^2)/F^d(x,Q^2)\) ratio from Ref. [56]. To isolate the impact of these extra experiments, we performed CJ fits without the W asymmetry and SLAC DIS data sets, and CT fits without the inclusive \(\nu \)-A DIS data.

  3. 3.

    As in the original CJ and CT publications, we estimate the final PDF uncertainties using the Hessian method [1], but in this paper we fix the tolerance to be \(T^2\! =\! 10\) for both global analyses, in between the nominal \(T^2 = 2.71\) in the CJ15 fit and the \(T^2=37\) value (at the 68% probability level) used in the CT18 fits. Furthermore, we do not include the additional “Tier-2” tolerance contribution [11, 105] that is applied in the CT18 fits to prevent the error PDFs from running into strong disagreements with individual experiments, but content ourselves with the “Tier-1” tolerance as defined in [106].

Regarding the lattermost point, in the CJ analysis it is additionally necessary to implement a numerical prescription at the level of individual eigenvector directions of the diagonalized Hessian matrix to guarantee \(\varDelta \chi ^2 = T^2\) to the needed accuracy and to ensure the validity of the analysis methods utilized in this article. These technical details are reviewed in the appendix.

3.3 The \(L_2\) sensitivity technique

In the next two sections, we will investigate the impact of the DIS deuteron data on the PDFs with the help of the “\(L_2\) sensitivity” [16]. The method is easy to use and has already been applied to clarify the sensitivity of the global data sets to the CT18 NNLO PDFs [11], LHC parton luminosities [107] (Sec. II.2), and effective exponents of the high-x PDF falloff [33]. Here we give a quick summary of the \(L_2\) technique and refer the interested reader to Ref. [16] for additional details.

Fig. 5
figure 5

A comparison of the PDF pulls of the deuterium DIS data computed according to the \(L_2\) method at 2 GeV for the CT fixed d.c. (left) and CJ fixed d.c. (right) fits; both cases are for the scenario with fixed deuteron corrections. Here and elsewhere, we place on the vertical axis a self-explanatory label of \(\varDelta \chi ^2\) as we plot the \(L_2\) sensitivity which approximates this quantity

The \(L_2\) sensitivity provides a fast approximation to the information contained in the LM scans typified by Fig. 2. It does so by quantifying the extent to which variations in the fitted PDFs drive the shifts in the log-likelihood \(\chi ^2_E\) for each experiment E. For example, the \(L_2\) plots discussed below supply the change in an experiment’s value of \(\chi ^2\) given a defined increase in the value of a PDF of interest, as a function of x for fixed Q. That being the case, if two experiments are in relative tension with respect to the behavior of a given PDF – say, one favors a larger PDF value in a given x range, the other favors a smaller value – an increase in the PDF would result in a negative change, \(\varDelta \chi ^2_{E_1}\! <\! 0\), in the \(\chi ^2\) for the first experiment, and a positive shift, \(\varDelta \chi ^2_{E_2}\! >\! 0\), for the second experiment. As shall be seen below, these competing pulls are visualized by the \(L_2\) method as opposing deviations from zero in the negative/positive directions. In this fashion, the \(L_2\) sensitivity can encapsulate, on a flavor-by-flavor basis, the patterns of pulls exerted on the PDFs (often called “PDF pulls” below for short) by various experiments or groups of experiments as a function of x and Q. In fact, the method is not limited to the influence of data sets on the PDFs, but can be extended to any observable that can be calculated starting from those.

More formally, the \(L_2\) .sensitivity yields an approximation of the shift \(\varDelta \chi ^2_E\) in the value of the log-likelihood for experiment E caused by an upward 1\(\sigma \) variation of the chosen PDF or PDF-dependent theoretical prediction. We evaluate the \(L_2\) sensitivity in the Hessian approximation as

$$\begin{aligned} S_{f,L2}(E)\,&\equiv \, {\vec {\nabla }} \chi ^2_E \cdot \frac{\vec {\nabla } f}{|\vec {\nabla } f|}\nonumber \\&= \varDelta \chi ^2_E \, \cos {\varphi (f,\chi ^2_E)}\ , \end{aligned}$$
(5)

where \(\cos {\varphi (f,\chi ^2_E)}\) represents the cosine of the correlation angle between a PDF of flavor f (or, indeed, any PDF derived quantity) and the experimental \(\chi ^2_E\), evaluated over the 2N Hessian eigenvector sets. In the Hessian formalism adopted both by the CT and CJ frameworks to estimate PDF uncertainties, this correlation cosine can be computed as indicated in Ref. [15]:

$$\begin{aligned} \cos {\varphi (f,\chi ^2_E)} =\frac{1}{4\varDelta f \varDelta \chi ^2_E}\sum _{j=1}^{N}(f_{j}^{+}-f_{j}^{-})([\chi ^2_E]_{j}^{+}-[\chi ^2_E]_{j}^{-}), \end{aligned}$$
(6)

where

$$\begin{aligned} \varDelta f=\left| {\vec {\nabla }}f\right| =\frac{1}{2}\sqrt{\sum _{j=1}^{N}\left( f_{j}^{+}-f_{j}^{-}\right) ^{2}}\ , \end{aligned}$$
(7)

and “±” denote the PDF variations along the positive and negative direction of the j-th Hessian eigenvector.

When \(L_2\) sensitivities are summed over all experiments, the resulting sum should be close to zero by construction, assuming deviations from a symmetric Gaussian probability distribution are negligible (see the appendix for further discussion). As an example, Fig. 5 shows the combined \(L_2\) sensitivities, \(S_{f,L2}(E)\), of the experiments in the DIS deuteron group (#4 in Table 1) to each parton flavor separately, calculated according to Eq. (5) for the CT and CJ fixed d.c. fits, where the deuteron corrections were fixed to the central value determined in the CJ15 analysis. These figures can be interpreted as the statistical pulls at fixed \(Q=2\) GeV from this group of experiments on each PDF flavor, f(xQ), at the x values specified on the horizontal axis. One can observe quite large deviations from zero, with \(S_{f,L2}\) values nearly reaching \(\pm 10\) units in some regions of x. This non-negligible pull by the deuteron DIS experiments is ultimately offset by contributions from other groups of experiments to obtain a zero result (within about one unit of \(\chi ^2\)) when summing over all of these. It is therefore interesting to investigate which experimental groups pull significantly against the DIS deuteron data sets, as we do below in Sects. 4.5 and 4.6. Here and elsewhere, we compute \(L_2\) sensitivities at a default scale of \(Q=2\) GeV, as this is close to the initial scale, 1.3 GeV, for DGLAP evolution, as well as to the Q values accessed in DIS experiments and typical scales used to present predictions for PDFs in nonperturbative QCD models and lattice QCD [16]. We typically do not observe pronounced differences in the \(L_2\) sensitivity plots for distinct scales, \(Q_{1,2}\), provided \(|Q_2 - Q_1| \lesssim {\mathcal {O}}(100\,\text {GeV})\). A set of companion plots generated at a higher scale, \(Q=100\) GeV, may be viewed at Ref. [108].

Figure 5 contains a substantial amount of information. For example, looking at the left panel for CT fixed d.c., the negative \(S_{f,L2}\) for the d quark at \(x \approx 0.25\) indicates that the deuteron DIS data prefer a higher d quark at \(x\approx 0.25\) than the nominal d-quark PDF in the full fit. Similarly, the positive \(S_{f,L2}\) for the u quark at the same x indicates a preference for a relatively lower u-quark PDF. In totality, the deuteron DIS data prefer a higher d/u at \(x=0.1\!-\!0.4\) than that obtained in the full CT fit. (This preference for an enhanced d/u in this x interval is further confirmed in Fig. 9 of Sect. 4.5.)

From the right panel of Fig. 5, we also read that the deuteron DIS data in the CJ fixed d.c. fit prefer an enhanced d/u over a slightly higher interval \(x=0.3\!-\!0.7\). Regarding other flavors and x ranges, in the left panel of CT fixed d.c. we observe a significant preference of the deuteron DIS group for lower u- and d-quark PDFs at \(x=0.01\!-\!0.1\), in the region relevant for LHC W-boson production. One also notices a preference for a larger gluon PDF in the interval, \(x=0.02\!-\!0.1\), relevant for Higgs-boson production at the LHC, and, at slightly lower x, for a larger perturbative charm-quark PDF, which is radiatively generated from the gluon.

Finally, we remark that the Hessian sensitivity is most effective in identifying the top 5-10 experiments or groups of experiments sensitive to variations in the chosen PDF, as has been verified by comparing the rankings obtained from Hessian sensitivities and LM scans [15, 16]. However, especially for subleading experiments, detailed rankings depend on the chosen definition of the sensitivity indicator and deviations from the simple linear approximation that we assumed when using symmetric finite derivatives in \(S_{f,L2}(E)\) as in Eqs. (6) and (7).

Table 2 For each fit, we report the total \(\chi ^2\) per point, \(\chi ^2/N_{ pt }\), as well as its breakdown according to the experiment categories listed in Table 1. The data sets removed in the CJ no-W_slac fit are singled out and emboldened in their respective categories. The \(\nu \)-A inclusive data excluded in the CT no nu-A fit are also emboldened. We note that the \(\chi ^2/N_{ pt }\) values for CJ free d.c. are the same as those reported above for CJ fixed d.c. given that the central fit of CJ fixed d.c. corresponds to the best fit obtained when freely varying all parameters in CJ free d.c.

4 Comparison of deuteron data impact in the CJ15 and CT18 fits

In accordance with the preceding discussion in Sects. 2 and 3, our analysis will be based on a series of fits named according to the following convention:

  1. 1.

    Fits without deuteron corrections: CT no d.c., CJ no d.c.;

  2. 2.

    Fits with the fixed CJ15 correction: CT fixed d.c., CJ fixed d.c.;

  3. 3.

    A CJ fit in which the off-shellness correction is freely varied: CJ free d.c.;

  4. 4.

    Fits with the fixed CJ15 deuteron correction and variations in the fitted data sets: CT no nu-A (removing inclusive \(\nu \)-A DIS experiments from CT), and CJ no-W_slac (removing the CDF [131] and DØ [132] W boson asymmetry data and SLAC DIS [proton and deuteron] sets from CJ).

In each enumerated case, the CT and CJ fits are methodologically comparable to each other. The differences between the fits in the first two categories will highlight the nontrivial effects of deuteron corrections. We consider several variations of the fixed d.c. fits. Firstly, the comparison of the CJ sensitivity plots in categories 2 and 3 demonstrates that freeing the d.c. parameters in the CJ fit tangibly improves the agreement among all categories of experiments. Secondly, we try to make the CT and CJ fits comparable not only methodologically but also (partially) with respect to data selections. We do this in category 4 by removing the indicated data sets from each fit.

4.1 Overall agreement of theory and data

We first review the overall quality of these fits by referring to Table 2, which lists values for the total \(\chi ^2\) per point (\(\chi ^2/N_{ pt }\)) according to the experimental groups listed in Table 1. The \(\chi ^2\) values for SLAC DIS, Tevatron W-boson asymmetry, and \(\nu \)-A DIS data sets are shown in Table 2 in separate lines.

The \(\chi ^2\) values in Table 2 indicate that the deuteron data agree globally with the published fits of both groups. Deuteron corrections are essential to the CJ analysis, which includes deuteron DIS data at the largest values of x from SLAC as well as the very sensitive DØ data on the reconstructed W-charge asymmetry [9]. Even if the total \(\chi ^2/N_{ pt }\) may seem only marginally improved, the highlighted data sets require deuteron corrections to reconcile the SLAC DIS deuteron data with the deuteron-independent DØ W-asymmetry data.

In CT, the inclusion of deuteron corrections also improves the description of the DIS deuteron data (especially the BCDMS \(F^d_2\) measurements), producing a 14-unit reduction in \(\chi ^2\) for the \(N_{ pt }\!=\!373\) points fitted in Group #4, with an additional 6-unit reduction once the inclusive \(\nu \)-A data are removed. Mainly due to the absence of the SLAC DIS data in CT, this is a smaller relative reduction than that observed for CJ in the left columns of Table 2.

It is also interesting to note far more substantial shifts in \(\chi ^2\) within Table 2 among the other CT experimental groups: the introduction of the fixed deuteron correction in CT improves the \(\chi ^2\) of the DY data (#7) by more than 100 units, while at the same time, increasing the \(\chi ^2\) of the LHC weak-boson production (#6) by an opposing \(\varDelta \chi ^2\) change of 80 units. The \(\chi ^2\) for group 5, “WZ Tevatron”, also increases by 16 units. The inclusive \(\nu \)-A DIS data set (#8) is fitted well globally, with \(\chi ^2/N_{pt}=0.80\ (0.83)\) in the CT no d.c. (CT fixed d.c.) fit. In sum, the fixed deuteron correction improves the CT total \(\chi ^2\) by 18 units for 3670 data points. Since the \(\nu \)-A DIS data are well-described, removing them from the CT fit actually increases the total \(\chi ^2/N_{pt}\), but also seems to release some tension with the WZ LHC data, whose \(\chi ^2\) value decreases by 32 units. While these look like modest changes, overall, we will see next that they do influence some PDF flavors and the local compatibility among select groups of experiments.

It is instructive to evaluate the shifts in \(\chi ^2\) discussed above in the context of corresponding variations associated with NNLO effects. For this purpose, we compare in Table 3 values of \(\chi ^2\) obtained with CT for the default NLO fits explored in this paper, CT no d.c. and CT fixed d.c., against two NNLO fits. These latter fits are the NNLO counterpart of CT no d.c., which we call no d.c. NNLO, and an alternative NNLO fit adopting the modified DIS scale choice of the CT18X NNLO fit discussed in Ref. [11], which we denote no d.c. NNLO-X. These additional fits allow us to quantify the impact on \(\chi ^2\) of the NNLO correction itself, as well as the effect of perturbative scale variations at NNLO, respectively. In addition, in the trailing columns of Table 3 we also show \(\chi ^2\) differences of each of these fits with respect to our base CT no d.c. fit at NLO. For example,

$$\begin{aligned} \delta ^{\mathrm {NNLO}} \equiv \chi ^2({\mathrm {no\,d.c.\,NNLO}}) - \chi ^2({\mathrm {no\,d.c.\,NLO}}). \end{aligned}$$
(8)

These comparisons illustrate that inclusion of the fixed deuteron correction at NLO may impact the theoretical description of select experimental groups at a level comparable to NNLO corrections and scale variations. This is especially evident for the DIS deuteron data, for which the 14-unit reduction in \(\chi ^2\) discussed above for CT fixed d.c. can be compared to small 5 and 4-unit increases with no d.c. NNLO and no d.c. NNLO-X, respectively. The above-mentioned \(\chi ^2\) shift for the Drell–Yan data with CT fixed d.c. similarly surpasses the corresponding shifts obtained with the NNLO fits. In contrast, the \(t\bar{t}\) data, for instance, are largely insensitive to the deuteron correction at NLO, but are significantly better-fitted with NNLO corrections. Ultimately, as future generations of QCD fits pursue higher accuracy at NNLO and beyond, it may therefore also be appropriate to consider deuteron corrections.

Table 3 Complementary to Table 2, we compare the values of \(\chi ^2\) obtained by CT in the no d.c. fit at NLO as well as at NNLO with both default CT18 settings and those of the alternative, CT18X global fit (first three columns, respectively) with CT fixed d.c. at NLO. The corresponding shifts away from the \(\chi ^2\) obtained under CT no d.c. NLO for each experimental group, \(\delta ^{\mathrm {NNLO}}\), \(\delta ^{\mathrm {NNLO-X}}\), and \(\delta ^{\mathrm {fixed\, d.c.}}\), are given in the lattermost columns
Fig. 6
figure 6

Upper row: The PDF ratios d/u and their asymmetric error bands for \(T^2=10\) at scale \(Q=2\) GeV within the CT (left) and CJ (right) fitting frameworks. We normalize all d/u error bands to the ratio from the central no d.c. fit (without any assumed deuteron correction). The left panel shows the CT no d.c., CT fixed d.c., and CT no nu-A error bands. The right panel shows the analogous CJ no d.c., CJ fixed d.c., CJ no-W_slac, and the CJ free d.c. fits. The abscissas are scaled to highlight the impact of the deuteron corrections at large x, where the impact is most pronounced, as well as the modest enhancement in the d/u ratio for \(x \lesssim 0.01\) in CT at left. Lower row: now showing the absolute d/u ratios on a linear x-axis scale to highlight the behavior at high x

Fig. 7
figure 7

The valence d-quark PDFs and their uncertainties, normalized to the central values obtained in the no d.c. fits. The fits and conventions are the same as for Fig. 6

Fig. 8
figure 8

Same as Fig. 7, but now for the gluon PDF

4.2 Impact of deuteron corrections on the d/u PDF ratio

As discussed in Sect. 1.1, experimental information involving deuterium targets, especially measurements of the DIS structure function of the deuteron (Group #4, “DIS Deuterium,” in Table 1), has been pivotal for separating the d-quark content of the proton from other parton flavors. The leading impact of deuteron corrections thus primarily influences the extracted d-PDF at high x, where the deuteron most prominently differs from a superposition of a free proton and a free neutron. In contrast, the effect of the deuteron corrections on the u-type PDFs is comparatively mild, as these are most directly constrained by measurements of the proton’s structure function (Group #3, “DIS Proton”).

To gauge the leading impact of deuteron corrections, in this subsection we now examine the x dependence of the d/u ratio within the CT/CJ frameworks, before proceeding in Sect. 4.3 to an examination of the indirect effects on the lower-x \(d_{{ val }}\) PDF relevant for \(\sin ^2 \theta _W\) and on the gluon PDF in Sect. 4.4 (the PDF pulls will be considered in Sects. 4.5 and 4.6).

Figure 6 illustrates the impact of the \(F^d_2\) corrections, as introduced in Sect. 2.1, on d/u. The upper row shows the d/u ratio and its error band obtained in the discussed series of fits, normalized to the central value obtained in the fits with no deuteron corrections. The lower row shows the unnormalized d/u ratios themselves, using a linear horizontal scale to better visualize the \(x>0.1\) interval, and in particular the \(x \rightarrow 1\) behavior of this quantity. In both rows, the left and right panels give results for CT and CJ fits, respectively. We see that the deuteron corrections have a qualitatively similar impact on d/u in both CT and CJ, especially at \(x\! > rsim \! 0.1\), with evidence of a mild, few-percent enhancement of the fitted d/u ratio over the no d.c. fits for \(0.1\! \lesssim x\! \lesssim 0.5\) once the fixed deuteron correction is included. This enhancement turns into a suppression at still higher \(x\! > rsim \! 0.5\), beyond which d/u is strongly affected by the 2-body nucleon-nucleon corrections included in the \(F^d_2\) calculation. For CJ, this suppression is larger than in the CT case, but compatible with the latter within the respective uncertainties of each fit.

The qualitative x dependence of the deuteron-corrected fit of d/u in the top rows of Fig. 6 closely follows the \(F^N_2/F^d_2\) ratio plotted in Fig. 3, in which \(F^N_2\) and \(F^d_2\) represent the deuteron structure function computed using the isoscalar and full nuclear predictions, respectively. Indeed, in the no d.c. fits, \(F^d_2\) is effectively fitted as an isoscalar target, but in the CT fixed d.c., for example, the \(F^d_2\) data for the physical deuteron are corrected to \(F^N_2\), which leads to a relative suppression of the fitted d/u PDF ratio for \(x\! > rsim \! 0.5\).

In the CJ and CT fixed d.c. fits (red curves), the d.c. parameters are held constant at their values obtained from the central CJ15 fit. If, on the other hand, the d.c. parameters in Eq. (3) are actively fitted with the PDF shape parameters, we obtain the same central PDFs but a narrower uncertainty band on d/u, as shown by the CJ free d.c. error band in the right panels of Fig. 6. This reduction in the PDF uncertainty of d/u, which at first sight is paradoxical because we have increased the number of fit parameters, is actually a consequence of the correlation between the treatment of \(F^d_2\) structure function data and extracted d-PDF. More specifically, when the nucleon off-shellness parameters are freed, the variations in the d-quark parameters that were necessary to encompass the \(F^d_2\) data in the CJ fixed d.c. fit are partly absorbed by the parameters of Eq. (3). In other words, releasing the off-shell parameters reduces tensions in parameter space, and ultimately also diminishes the overall experiment-by-experiment \(\chi ^2_E\) variations in the PDF analysis, as we shall note again below in Sects. 4.5 and 4.6. We have verified that, over the same x range, the relative uncertainty in the determination of other flavors, such as, e.g., the gluon, does correspondingly increase. This is an instance of the fact that, typically, the constraining power of a fit can be enhanced by a greater number of free parameters only in a limited sector of parameter space.

The different choices of data sets in the CT and CJ fits also affect the fitted d/u PDF ratio. For example, unlike CJ, the CT fit includes DIS data from inclusive \(\nu \)-A collisions, multiplied by a phenomenological parametrization of the heavy-nuclear structure function relative to the physical deuteron. The green bands in Fig. 6 (left) are for the CT no nu-A variant of the CT fit. The removal of these data augments the shifts in the CT d/u ratio induced by including the fixed deuteron correction, which now is comparable to the CJ result. The reason for this may simply lie in the lack of further deuteron-to-isoscalar proton plus neutron corrections, or may also be related to possible discrepancies between neutral- and charge-current DIS data or interactions [52, 109, 110].

Conversely, the CT fits do not include the low-W SLAC data and the reconstructed W-boson asymmetry from the DØ experiment that are influential upon the large-x d-quark fit in CJ [9]. When these are removed from the CJ fit as well – see the green CJ no-W_slac bands in the right column of Fig. 6 – we obtain an enlarged uncertainty that includes both the deuteron corrected and uncorrected bands. The uncertainty on d/u at \(x\rightarrow 1\) in the CJ no-W_slac fit is wider than in the CT no nu-A fit in part in reflection of different parametrization forms and selection of experiments between the CJ and CT analyses.

4.3 Impact on the valence PDFs in the LHC EW precision region

The effects of including deuteron structure corrections to \(F^d_2\), while most pronounced at \(x>0.1\), have some consequences in the low-x region as well. In the CT result shown in Fig. 6 (left), modest enhancements in d/u at about \(x\!\sim \!10^{-3}\) can be seen for the CT fixed d.c. fit. While the sea-quark PDFs are relatively unaffected in this kinematic region, the valence component of the d-quark and, to a lesser extent, the u-quark PDFs in this small-x region are sensitive to the inclusion and theoretical treatment of both neutrino-nucleus and deuterium data at large x. This sensitivity is mainly a consequence of corresponding valence sum rules.

Figure 7 illustrates this feature. In the left panel, the fixed d.c. CT best-fit valence PDF (red dotted line), shown in this case at \(Q=81\) GeV \(\approx M_W\), prefers a slightly lower \(d_{val}\) at \(x\approx 0.008-0.13\) than in the no d.c. fit, and a slightly higher one at \(x\approx 0.13-0.45\). This deviation becomes still more pronounced if the neutrino-nucleus DIS data are removed (leading to the green dot-dashed line). In the right panel, we observe for CJ a qualitatively similar trend and associated x dependence over slightly shifted x regions of \(0.01-0.1\) and \(0.1-0.53\).

In Sect. 1.1, we illustrated in Figs. 1 and 2 that the weak-mixing angle measurements at the LHC using the forward-backward asymmetry, \(A_{{\mathrm {FB}}}\), are sensitive to the valence d- and u-PDFs in an x-region about \(\sim \!0.03\), i.e., where Fig. 7 indicates a dependence of these PDFs upon the treatment of the deuteron/heavy-nucleus data at higher-x values.

4.4 Impact of deuteron corrections on the gluon PDF

The deuteron data sets can also impact the gluon density through \(Q^2\)-scaling violations; this is particularly true when these measurements cover a large range of the four-momentum squared, \(Q^2\), of the exchanged boson. Similarly to the case for \(d_{{ val }}\) in Sect. 1.1, the Lagrange Multiplier scans [11] and PDF sensitivity techniques [15, 16] in the CT18 analysis collectively demonstrate that the gluon at \(x>0.1\) receives significant constraints from the DIS deuterium data. Constraints from the extensive fixed-target DIS data are in fact competitive with the HERA DIS data, which probe the lower-x region, and also with LHC and Tevatron inclusive jet production, which cover a wide x range but involve complex arrays of systematic effects which remain under active study.

To address this point, in Fig. 8 we plot the error bands for the gluon PDF at \(Q\!=\!2\) GeV as determined in the series of fits discussed at the beginning of this section, again normalized to the central value obtained in the no d.c. fits. As seen in the left panel of Fig. 8, the gluon PDF in the CT fits exhibits a modest sensitivity to the chosen deuteron correction treatment, with a dependence that is somewhat moderated by the adopted \(W^2 > 12.25\) GeV\(^2\) cut. Still, as with d/u, there is a qualitative tendency for the fixed deuteron correction to reduce the high-x gluon PDF, with this modification being enhanced by the exclusion of the inclusive \(\nu \)-A DIS data. Like CT, the CJ fits seen in Fig. 8 (right), which include the SLAC DIS data, similarly display a relative suppression of the gluon for \(x\! > rsim \! 0.3\) once deuteron corrections are taken into account. While this effect is of moderate size, it is nonetheless statistically significant in the context of the \(T^2=10\) tolerance used to determine the uncertainty bands. For CJ, it will be interesting to confirm this effect by fitting the full JLab 6 GeV inclusive data set [111], and, even more so, the JLab 12 GeV data which will augment the precision of the available DIS measurements over a wide \(Q^2\) range at large x, once available.

4.5 Valence-sector PDF pulls: the d/u ratio

In Fig. 9, we plot the \(L_2\) sensitivities of the groups of experiments to the d/u PDF ratio for varied implementations of the deuteron corrections. \(L_2\) sensitivities for fits exploring data-set variations are shown in Fig. 10, which we also discuss below.

Fig. 9
figure 9

The \(L_2\) sensitivities computed according to Eq. (5) for \(Q=2\) GeV, giving the pulls on the d/u PDF ratio of the process-dependent data sets fitted by CT (left) and CJ (right). Upper, middle, lower rows: results for the no d.c., fixed d.c., and free d.c. fits discussed at the beginning of Sect. 4

Starting with the no d.c. fits that either do not include (CT) or remove (CJ) deuteron corrections, we notice that, in both cases, the landscape of PDF pulls tends to be dominated by a few competing groups of experiments, which differ between the two fitting frameworks. For CT in the left panel, these are the LHC W/Z production (Group #6) and inclusive nuclear DIS (Group #8), which possess the sharpest opposing pulls at \(x\!\sim \! 0.2\), for example, in the direction of either favoring or disfavoring a larger d/u ratio, respectively. At slightly higher \(x\! >\! 0.3\) values, which are of particular interest from the perspective of QCD-informed models of the d/u behavior as \(x \rightarrow 1\), these are joined by the DIS-deuterium (#4) and Drell–Yan (#7) groups of experiments. Turning to the CJ case, displayed in the top-right panel, it is the DIS deuterium (#4) and gamma-jet (#1) groups that dominate the landscape of PDF pulls – and quite strikingly at large x – with lesser but also significant pulls from the Tevatron W/Z production data (#5). The large-x pulls are expected, since the SLAC data are quite sensitive to nuclear dynamics in the deuteron target, as already noted.

Once fixed deuteron-structure corrections are introduced into the respective fixed d.c. fits, the relative patterns of PDF pulls experience an intriguing series of shifts, which we display in the middle row of Fig. 9. For CJ, the deuteron corrections largely resolve the huge tensions between the photon+jet and DIS deuteron data, because dynamical nuclear effects in the latter are now included in the theoretical calculation of the deuteron DIS cross section without forcing the d-quark to deform to compensate for the missing nuclear effect. A residual tension between the DIS deuteron (#4) and W/Z Tevatron data (#5) data is still visible at \(x\!\approx \! 0.5\). While the large-x tensions are reduced, the small-x pulls visibly change in shape for several flavors.

For CT, the introduction of the fixed deuteron correction detailed in Sect. 2.1 instead preserves the qualitative x dependence of the \(L_2\) pulls (i.e., the shapes) but softens their magnitude in a few cases, for example, for the Drell–Yan (#7) and the small-x LHC W/Z production (#6) data. Notably, the DIS-deuterium sensitivity shifts to closely resemble that of the LHC W/Z data (#6) in favoring a larger d/u for \(x \sim 0.2\). At the same time, the size of the competing pulls at high \(x\! > rsim \! 0.3\) between the DIS-deuterium and inclusive \(\nu \)-A data (#8) is enhanced, while the opposing pulls of other experiments are modestly reduced over the same range in x. This is especially clear for the CT Drell–Yan data (#7), which in the CT no d.c. fit had preferred a softer d/u ratio at low \(x\! <\! 0.01\) and an enlarged value of d/u over \(0.01\! \lesssim \! x\! \lesssim \! 0.2\). In CT fixed d.c., these preferences mostly vanish. At the same time, outside this interval of very high x, the opposing pulls of the inclusive \(\nu \)-A data (Group #8) and both the DIS deuterium (#4) and LHC W/Z (#6) experiments increase and sharpen for \(x > rsim 0.01\). In fact, this is the same collection of experiments for which in Table 2 we observed increases (in the case of Groups #6 and #8) in their respective values of \(\chi ^2/N_{ pt }\) upon introducing deuteron corrections for \(F^d_2\). Both the \(\chi ^2\) values in Table 2 and the \(L_2\) analysis for CT therefore indicate a noticeable rearrangement of the pulls of the inclusive neutrino-nucleus DIS data and select other experiments introduced by the correction to the deuteron DIS data. This rearrangement, being presently of similar order with respect to other contributing effects, will require attention in the future.

Fig. 10
figure 10

As in Fig. 9, we plot the PDF pulls on d/u at 2 GeV with fixed deuteron corrections present, but, in this case, removing select experiments which have shown significant competing pulls with respect to the DIS deuterium sets. For CT (left panel), we remove the inclusive \(\nu \)-A data (Group #8), while for CJ, we remove the SLAC DIS experiments (part of Group #4) and W-asymmetry information from the Tevatron (part of Group #5)

In the last row of Fig. 9 we present the \(L_2\) pulls of the CJ free d.c. fit in which the deuteron off-shellness degrees-of-freedom in Eq. (3) are released. Comparing the vertical extents of the peaks with those of the CJ fixed d.c. fit in the middle row, we see that freeing the offshell parameters moderates the PDF pulls over the whole x range, especially those at \(x\! >\! 0.3\) between the DIS deuteron (#4) and the WZ Tevatron CJ Drell–Yan (#5) data. This behavior can be generically understood as a consequence of increasing the number of free parameters, but is not guaranteed, for example, in the presence of incompatible data sets. The CJ free d.c. plot is thus an indication of a good level of consistency between the considered data sets, when the PDFs and deuteron corrections are fitted together.

Fig. 11
figure 11

Analogous to Fig. 9, for the PDF pulls on g(xQ) at \(Q=2\) GeV

The results discussed so far suggest a nontrivial relationship between the treatment of the DIS deuterium data and the description of other data sets in each fitting framework: the impact of deuteron-structure corrections in a global fit like CT and CJ cannot generally be expected to apply to deuterium data alone, but have secondary effects on the patterns of pulls of other data sets.

It is therefore interesting to study variations in the choices of experimental data sets in both fits, in particular, removing from each analysis those data sets that showed especially strong sensitivity to deuteron corrections or otherwise played a major role in the foregoing discussion.

For CT, we remove the entire collection of inclusive \(\nu \)-A measurements (Group #8), and refit with the fixed deuteron corrections of Sect. 2.1 in place; the resulting \(L_2\) sensitivity plot is displayed in the left panel of Fig. 10. Overall, the magnitude of the PDF pulls is slightly reduced, with the biggest change occurring for the DIS Deuterium Group (#4), which is now more closely aligned with the pulls exerted by the DIS proton data (#3) throughout the plotted domain in x. When considered in parallel with Fig. 9, and in the light of the previous discussion of Fig. 9, the left panel of Fig. 10 suggests a connection between the pulls of the DIS deuteron and \(\nu \)-A data in fits with and without deuteron corrections. For both groups of experiments, the interplay between the theoretical description of deuterium and heavy-nuclear data is relevant. To that end, investigating the systematic treatment of nuclear effects for light and heavy nuclei is a critical subject for future global analyses that aim to use such data for constraining the nucleon PDFs to higher accuracy.

A similar consideration arises for CJ. As we have discussed, the combination of W-boson charge asymmetries and SLAC DIS data is strongly constraining on the d/u ratio at large x, and the d.c. treatment influences also the PDF pulls at smaller x values (as seen in the right panels of Fig. 9). We therefore remove these data sets from the fit to obtain the CJ no-W_slac fit shown in the right panel of Fig. 10. In this instance, the removal of the combined W and SLAC DIS data relieves tensions seen in CJ fixed d.c., for \(x\! <\! 0.1\). However, the large-x tension between DIS deuteron and WZ Tevatron data (that now only include the \(W \rightarrow \ell \) decay lepton asymmetries) remains largely intact and can in fact also be seen in the CT panel on the left of Fig. 10. It remains to be seen whether this is of experimental origin, or due to an as yet incomplete treatment of nuclear corrections in the deuteron target.

Fig. 12
figure 12

PDF pulls on g at 2 GeV after removing the inclusive \(\nu \)-A experiments for CT (left) and W charge asymmetry and SLAC DIS data from CJ (right), with deuteron corrections fixed

4.6 PDF pulls in the gluon and light-quark sea sectors

At first sight, it might seem reasonable to suppose that deuteron-structure corrections, being most sizable at high x and more immediately connected to extractions of the d-quark, would be relatively inconsequential for determinations of the gluon PDFs. In actuality, constraining the gluon PDF through DIS data requires an adequate prediction of the scale dependence of both proton and deuteron DIS data sets, with the latter simultaneously sensitive to the (\(Q^2\) dependent) deuteron corrections discussed in Sect. 2.1. Moreover, the momentum sum rule requires that the changes in the total momentum fraction from the large-x and small-x quark and gluon PDFs compensate one another. The practical implementation of the deuteron correction can therefore impact g(xQ) over a still broader range beyond high x.

We therefore turn to an examination of the pulls on the gluon PDF in fits with and without deuteron corrections, presented in Fig. 11, before examining CT and CJ fits with the modified data sets in Fig. 12. Comparing the CJ no d.c. fit in the upper-right panel of Fig. 11 to the CJ fixed d.c. fit in the middle-right panel, one sees that adding a fixed deuteron correction clearly aligns the pulls of the DIS proton group (#3) and DIS deuteron group (#4) on the gluon. The x dependence of these pulls was effectively uncorrelated without the deuteron correction. In the presence of the fixed correction, however, they are aligned and pronounced over the whole x range and are opposed mostly by the strong pull of the WZ Tevatron data (Group #5). Furthermore, after the off-shell parameters in the CJ deuteron correction are freed (lowermost panel), the tension between Groups #3, #4, and #5 is relieved, resulting in a more consistent data set, with weak pulls (\(\lesssim 3\) units) everywhere. It is also interesting to note that a similar effect arose in the d/u sector discussed in the previous subsection, and therefore seems to be a robust feature of fitting the deuteron corrections simultaneously with the PDF parameters.

In the two CT fits in the upper left and middle left panels of Fig. 11, a somewhat different pattern emerges. Inclusion of the deuteron correction in CT fixed d.c. does have the effect of partially aligning the pulls of the DIS Proton (Group #3) and DIS deuteron (Group #4) on the gluon, but this effect is restricted to a narrower interval, \(x\in [0.2, 0.5]\). The pulls from the other groups are relatively unaffected by the deuteron correction, although we see some realignment of the pulls from LHC WZ and fixed-target Drell–Yan production experiments (Groups #6 and #7). The weaker dependence of the gluon pulls in the CT fit on the deuteron correction compared to the CJ case is most likely due to the absence of the SLAC DIS data from the former fit. One can therefore investigate the effect of the removal of the SLAC data on the CJ fit. Intriguingly, as shown in the right panel of Fig. 12, simultaneously removing the SLAC DIS data and the Tevatron W-boson asymmetry data largely alleviates the competing gluon pulls, which are now smaller than those observed in the CT fit, especially from the LHC sets not included in CJ.

Clearly, the gluon pulls in the CT fit are due to the data other than the large-x SLAC and W production data. In particular, in the absence of the large-x SLAC and W production data in the CT fit, we notice a strong preference for a harder gluon at \(x\approx 0.3\) from the \(\nu \)-A DIS experiments (Group #8) both with and without the nuclear correction. In fact, the preference of the CDHSW \(\nu \)-A DIS deuteron data for a harder gluon at large x had already been identified in the CT18 analysis [11], although the net effect of including this data set in the CT18 fit does not exceed the PDF uncertainty. However, as the left panel of Fig. 12 shows, removing these data from the fit does not substantially alter the pulls of the remaining experiments shown in the CT plots of Fig. 11, which are led by the jet and W/Z measurements from the LHC.

5 Conclusion

In this analysis, we have for the first time undertaken a comparative analysis of two global fitting frameworks, CTEQ-JLab (CJ) and CTEQ-TEA (CT), using the \(L_2\) sensitivity statistical metric developed in Refs. [15, 16]. This metric allowed our study to take advantage of the complementary strengths of the two frameworks: the extended experimental coverage and various theoretical developments implemented in the two approaches, as well as the flexible PDF parametrizations available in CT and the unique capabilities of CJ in describing low-energy and nuclear dynamics. In doing so, we made a number of technical adjustments to each framework (discussed in detail in the appendix) in order to reconcile the CT and CJ treatment of PDF uncertainties and thereby render them sufficiently similar to be meaningfully juxtaposed.

We have, in particular, concentrated on evaluating the impact on PDF determinations of nuclear corrections which take into account the two-baryon structure of the deuteron. In fact, as discussed in Sect. 1.1, DIS and Drell–Yan measurements on deuterium are very informative in providing flavor separation of d-type quarks from other parton flavors (under an assumption of nucleon charge symmetry). At the same time, the introduction of deuterium data into proton PDF fits brings along its own uncertainties associated with nuclear and power-suppressed effects. Global analyses take diverse approaches in handling the deuteron and heavy-nuclear effects, from selection of the least affected experimental data [11,12,13], to including some fixed [11] or free [9, 13] nuclear corrections and performing Bayesian marginalization [37, 57] with respect to the nuclear parameters. It is therefore important to understand the role of the deuteron corrections in a controlled setting, by isolating them from other factors that affect the existing PDF ensembles at comparable levels.Footnote 1

By examining the fitted PDFs and resulting PDF pulls of experimental data under several theoretical scenarios for the treatment of deuteron corrections, in Sect. 4 we have gathered a substantial number of results that clarify these questions. We reiterate here our overriding conclusions based on this investigation:

  • While the compilation of \(\chi ^2\) values in Sect. 4.1 indicates good global agreement of CJ and CT NLO theoretical predictions with deuteron data sets, the \(L_2\) sensitivity additionally provides insights about local compatibility of fitted experiments in an x-dependent fashion. In the case of CJ, the model of deuteron dynamic effects is crucial for the description of the informative low-Q DIS data from SLAC. The dependence on the deuteron correction is reduced in the CT analysis with its more conservative cut on \(W^2\). Still, including the CJ deuteron correction reduces \(\chi ^2\) for the aggregated DIS-deuteron experiments by about \(\sim \) 14 units and also reduces the cumulative \(\chi ^2\) for vector-boson production data sets by several tens of units, with the modifications potentially comparable to the NNLO scale dependence in an analysis like CT. Another effect of the deuteron correction is to alleviate the competing pulls of deuteron and some other experiments in the large-x region.

  • The impact of a fixed deuteron correction is particularly evident in the high-x distribution for the d-quark, or the associated d/u PDF ratio. A number of commonalities exist between the CT and CJ analyses in the qualitative effect of this correction on the extracted high-x PDFs. The fixed deuteron correction of Sect. 2.1 generally leads to the suppression of the d/u ratio at \(x\! >\! 0.5\) relative to the scenario without the deuteron correction.

  • Due to the influence of sum rules and nontrivial correlations among fitted PDFs of different parton flavors, deuteron corrections to DIS structure functions at large x can have important secondary effects on, e.g., the gluon or sea-quark PDFs over a range of x, as well as the \(d_{{ val }}\) distribution at lower \(x\! \sim \! 0.03\) of relevance to precision studies in the electroweak sector.

  • In both fitting frameworks, the modifications caused by the deuteron-structure corrections are moderated by the inclusion of some non-deuteron data sets; for CT, these are inclusive neutrino DIS data on heavy nuclear targets, while for CJ, a combination of high-x SLAC DIS data and reconstructed boson-level Tevatron charge asymmetry requires special attention. Disentangling the interplay among these fitted experiments will require a further study at NNLO accuracy, including additional investigation of the implementation of theoretical corrections for nuclear data sets (including both deuteron and heavier targets) and the treatment of W/Z data.

As the drive to realize next-generation accuracy in PDF analyses gains speed with preparations for the High-Luminosity LHC [112], Electron-Ion Collider [113], and Long-Baseline Neutrino Facility [114], we recommend consideration of deuteron corrections and broader nuclear effects in PDF analyses, as well as continued phenomenological and model-based studies [59, 115,116,117] of deuteron structure in parallel. Deuteron effects will become particularly unavoidable with increasing PDF precision and in PDF-benchmarking studies, most obviously for the d-PDF at \(x\! > rsim \! 0.2\) and beyond, but, ultimately, for consistency of the extracted gluon density and over a widening range of x. Consideration of the parton-level violation of the charge symmetry in the deuteron [118] may become relevant as precision goals advance still further. As emphasized in Sect. 1.1 and 4.3, the achievement of ultimate precision in tests of the SM in the electroweak sector will partly depend upon the successful treatment of the issues described in this work.