Impact of LHCb 13 TeV $W$ and $Z$ pseudo-data on the Parton Distribution Functions

We studied the potential of LHCb 13 TeV single $W^{\pm}$ and $Z$ boson pseudo-data for constraining the Parton Distribution Functions (PDFs) of the proton. As an example, we demonstrated the sensitivity of the LHCb 13 TeV data, corresponding to integrated luminosities of 5 fb$^{-1}$ and 300 fb$^{-1}$, respectively, for reducing the PDF uncertainty bands of the CT14HERA2 PDFs, using the error PDF updating package ePump. To this end, the sensitivities of various experimental observables have been compared. Generally, sizable reductions in PDF uncertainties can be observed with the 300 fb$^{-1}$ data sample, particularly in the small-$x$ region. Compared to various single-observable measurements, a double-differential cross section measurement in the $Z$ boson $p_T$ and rapidity can greatly reduce the uncertainty bands of the $u$ and $d$ quark PDFs over almost the entire $x$ range.

At hadron colliders, most physics analyses rely heavily on an understanding of the parton-level structure of the hadronic beam particles; examples include precision measurements of Standard Model (SM) parameters [1][2][3] and searches for new physics. This parton picture follows from the factorization theorem of Quantum Chromodynamics (QCD). The parton distribution functions (PDFs) are nonperturbative objects and therefore cannot be calculated in perturbation theory. They are functions of the Bjorken-$x$ values ($x$, the momentum fraction) of the partons at a momentum transfer scale ($Q$), and are determined phenomenologically by a global analysis of experimental data from a wide range of physics processes, such as Deep Inelastic Scattering (DIS), Drell-Yan (DY), inclusive jet, and top quark pair production processes. The PDF dependence on $Q$ is governed by the renormalization-group based evolution equations, i.e., the DGLAP equations [4][5][6].
Precision measurements of the single $W^{\pm}$ and $Z$ gauge boson production cross sections at the CERN Large Hadron Collider (LHC) provide important tests of the QCD and electroweak (EW) sectors of the SM. Theoretical predictions for these cross sections are available up to next-to-next-to-leading order (NNLO) in perturbative QCD [7][8][9][10][11], where one of the dominant systematic uncertainties comes from the PDFs. The CT14 PDFs [12] are the first CTEQ-TEA PDFs that include published results from the ATLAS, CMS, and LHCb collaborations at 7 TeV: the $W^{\pm}$ and $Z$ boson production cross sections and the lepton charge asymmetry measurements from the ATLAS Collaboration [13], the lepton charge asymmetry in the electron [14] and muon [15] channels from the CMS Collaboration, and the lepton charge asymmetry in the decay of $W^{\pm}$ bosons to an electron or a muon, together with the $Z$ boson rapidity distribution, from the LHCb Collaboration [16]. The ATLAS and CMS measurements primarily impose constraints on the light quark and antiquark PDFs at $x \gtrsim 0.01$. As studied in Refs. [17,18], the LHCb 7 TeV and 8 TeV $W^{\pm}$ and $Z$ boson measurements, though with larger statistical uncertainties than the corresponding ATLAS and CMS results, could also impose significant constraints on the $u$ and $d$ PDFs.
In the past decades, a large number of experimental results have been used in PDF global analyses, but our knowledge of the PDFs in the very small and very large $x$ ranges is still limited. (Throughout this paper, $Z$ includes both the $Z$ boson and the virtual photon contributions.) In single $W^{\pm}$ and $Z$ boson production, the $x$ values of the interacting partons ($x_1$ and $x_2$) are correlated with the boson kinematics through its rapidity ($y$), as $y = \frac{1}{2}\ln(x_1/x_2)$. Therefore, single $W^{\pm}$ and $Z$ data in the forward detector region are valuable in a PDF global analysis, as events with larger boson rapidity are produced by partons with small or large $x$. Correlations between the predicted LHCb 13 TeV $Z$ boson production cross section and the $u$- and $d$-quark PDFs as functions of Bjorken-$x$ are shown in Fig. 1. As shown in the figure, the LHCb 13 TeV data are expected to have strong correlations with the $u$- and $d$-quark PDFs in the small-$x$ region, indicating that the LHCb 13 TeV $W^{\pm}$ and $Z$ data can be used to constrain the corresponding PDFs. The LHCb detector [20,21] is a single-arm forward spectrometer designed for the study of particles containing $b$ or $c$ quarks, covering the pseudorapidity range $2 < \eta < 5$. With a high-performance tracking system and a muon sub-detector, the LHCb physics program also extends to precision EW measurements. Using $pp$ collision data collected at $\sqrt{s} = 7$ TeV, LHCb measured the single $W^{\pm}$ and $Z$ boson production cross sections in both the muon and electron channels [16,22,23], and the same measurements have been performed using $\sqrt{s} = 8$ TeV data [24][25][26]. These results have been used to constrain the PDFs [12,27,28], bringing valuable information to PDF analyses. The $\sqrt{s} = 13$ TeV $pp$ collision data sample is collected at a larger center-of-mass energy than the previous publications, with more $W^{\pm}$ and $Z$ boson events boosted into the forward region; it can therefore access even smaller (larger) values of $x$ than the previous 7 and 8 TeV results.
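The rapidity-to-$x$ mapping above can be made concrete with a small numerical sketch (leading-order $s$-channel kinematics only; the boson mass and beam energy are the sole inputs):

```python
import math

SQRT_S = 13_000.0  # pp centre-of-mass energy in GeV
M_Z = 91.19        # Z boson mass in GeV

def parton_x(mass, rapidity, sqrt_s=SQRT_S):
    """Leading-order momentum fractions for s-channel boson production:
    x_{1,2} = (M / sqrt(s)) * exp(+/- y)."""
    tau = mass / sqrt_s
    return tau * math.exp(rapidity), tau * math.exp(-rapidity)

# A Z boson near the edge of the LHCb acceptance (y ~ 4.5) probes one
# very large-x and one very small-x parton at the same time.
x1, x2 = parton_x(M_Z, 4.5)
```

For $y = 4.5$ this gives $x_1 \approx 0.6$ and $x_2 \approx 8 \times 10^{-5}$, illustrating why the forward acceptance constrains both ends of the $x$ range simultaneously.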
The LHCb detector operates at a reduced instantaneous luminosity compared to the ATLAS and CMS detectors [29], because the detector occupancy is extremely high in the forward region. In the LHC Run 2 period (2015-2018), the LHCb detector collected more than 5 fb$^{-1}$ of $pp$ collision data [30] at $\sqrt{s} = 13$ TeV. An upgraded detector [31] is foreseen to allow LHCb to operate at a luminosity of $2 \times 10^{33}$ cm$^{-2}$s$^{-1}$ in the LHC Run 3 period (2021-2023), a five times larger instantaneous luminosity than before. By the end of the LHC Run 4 period (2026-2029), the LHCb detector is expected to collect approximately 50 fb$^{-1}$ of $pp$ collision data [32]. Beyond that, the LHCb Upgrade-II phase (planned for data taking from 2031) [33] will allow the detector to run at an even higher luminosity ($2 \times 10^{34}$ cm$^{-2}$s$^{-1}$) [34]. After Upgrade-II, by the end of LHC operation, LHCb plans to collect a data sample corresponding to a minimum of 300 fb$^{-1}$ [32]. Therefore, in this article, the pseudo-data samples used for the physics projections are set to either 5 fb$^{-1}$ or 300 fb$^{-1}$.
The article is organized as follows. In Sec. II, we discuss the error PDF updating package ePump and the pseudo-data samples used in the analysis. In Sec. III, we study the impact of the LHCb 13 TeV single $W^{\pm}$ and $Z$ boson pseudo-data on the CT14HERA2 PDFs [12,35].
In Sec. IV, we discuss the choice of tolerance criteria in ePump update. Our conclusion is given in Sec. V.

II. UPDATING ERROR PDFS
There is not yet a published measurement of single $W^{\pm}$ boson production using the 13 TeV LHCb data. Therefore, we shall use pseudo-data in this analysis to emulate the impact of the upcoming LHCb 13 TeV data on the PDFs.
The Monte Carlo events generated with the ResBos generator [40], using the MMHT14 [27] and CT14HERA2 [12,35] PDFs, are taken as the pseudo-data and theoretical predictions, respectively, in this work. The theoretical predictions are computed using the ResBos [40] package at approximately NNLO plus next-to-next-to-leading-logarithm (NNLL) accuracy in QCD, with the canonical scale choices [41,42]. [Figure: comparison between theory (ResBos, black line) and the LHCb 13 TeV data [39] in two kinematic distributions (left, and $p_T$ on the right); the blue points show the LHCb results in the muon channel and the red points those in the electron channel.]
In the ePump study, for the $Z \to \ell^+\ell^-$ pseudo-data input, the statistical uncertainties are scaled to the 5 fb$^{-1}$ and 300 fb$^{-1}$ data samples, separately, by extrapolating the total uncertainty of the LHCb 13 TeV publication [39] to the pseudo-data sample. In the extrapolation, it is assumed that the ratio of statistical to systematic uncertainty remains the same in all data samples. Similarly, for the $W^{\pm} \to \ell^{\pm}\nu$ pseudo-data sample, the uncertainties are estimated using the LHCb 8 TeV publications [25,26].
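The extrapolation described above amounts to scaling the published uncertainty by the square root of the luminosity ratio. A minimal sketch follows; the reference numbers below are illustrative placeholders, not the actual LHCb values:

```python
import math

def extrapolate_uncertainty(unc_ref, lumi_ref, lumi_new):
    """Scale an uncertainty with integrated luminosity, assuming the event
    yield grows linearly with luminosity.  Under the assumption made in the
    text (constant stat-to-syst ratio), the total uncertainty scales the
    same way as the statistical one."""
    return unc_ref * math.sqrt(lumi_ref / lumi_new)

# Illustrative per-bin total uncertainty (in %) at an assumed reference
# luminosity; both numbers are placeholders.
unc_ref, lumi_ref = 1.5, 0.3  # %, fb^-1
unc_5 = extrapolate_uncertainty(unc_ref, lumi_ref, 5.0)
unc_300 = extrapolate_uncertainty(unc_ref, lumi_ref, 300.0)
```

Whatever the reference values, the 300 fb$^{-1}$ projection comes out $\sqrt{60} \approx 7.7$ times smaller than the 5 fb$^{-1}$ one.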
III. IMPACT OF THE LHCB 13 TEV W± AND Z PSEUDO-DATA ON CT14HERA2 PDFS
In this section, we study the impact of the LHCb 13 TeV single $W^{\pm}$ and $Z$ boson pseudo-data on the CT14HERA2 PDFs, to demonstrate the sensitivity of the LHCb 13 TeV data and to identify valuable observables for future measurements.
Since the pseudo-data sample generated with the MMHT14 PDFs is used to update the CT14HERA2 PDF sets, and there are differences between the central PDF sets of MMHT14 and CT14HERA2, the central value of the ePump-updated PDFs differs from the CT14HERA2 central set. In this article, we are interested in variations of the PDF uncertainty, and thus do not discuss variations of the PDF central values hereafter.
A. Update from LHCb 13 TeV W± pseudo-data
In the $W^{\pm}$ boson leptonic decay there are a neutrino and a charged lepton in the final state; the neutrino escapes the detector, and only the charged lepton can be detected in a hadron collider experiment. This feature complicates $W^{\pm}$ boson analyses, since the irreducible background contribution is difficult to model. On the other hand, the single $W^{\pm}$ production rate is one order of magnitude larger than that of the $Z$ boson at LHCb. If the background for $W^{\pm}$ events can be modeled properly, such a high-statistics sample allows many precision measurements. In this study, we used the charged lepton pseudorapidity ($\eta$) distribution as the observable in the ePump update, with the same binning scheme as the previous LHCb publications [25,26].
After the ePump update, the updated quark PDFs are compared with the default CT14HERA2 ones. The $d$ quark PDF with its uncertainty is shown in Fig. 3, which indicates that the LHCb 13 TeV $W^{\pm}$ boson data have a large impact on the $d$ quark PDF, especially in the small-$x$ range from $10^{-5}$ to $10^{-3}$. With a 300 fb$^{-1}$ LHCb 13 TeV $W^{\pm}$ data sample, the $d$ quark PDF uncertainty can be reduced by about 30% around $x = 10^{-3}$. The LHCb 13 TeV $W^{\pm}$ boson data have a smaller impact on the $u$ quark PDF than on the $d$ quark PDF, but with a 300 fb$^{-1}$ data sample they still affect the $u$ quark PDF in the small-$x$ region. The impacts of the LHCb 13 TeV $W^+/W^-$ data, i.e., the ratio of $W^+$ and $W^-$ event rates, on the PDF ratios $d/u$ and $\bar{d}/\bar{u}$ are shown in Fig. 4. Even with a 5 fb$^{-1}$ data sample, the LHCb 13 TeV $W^+/W^-$ data can already reduce the $d/u$ PDF uncertainty. Most of the improvements are concentrated in the small-$x$ region, from $10^{-5}$ to $10^{-3}$. The 300 fb$^{-1}$ LHCb 13 TeV data sample could further reduce the uncertainties of both PDF ratios $d/u$ and $\bar{d}/\bar{u}$ (by about 20%) in the small-$x$ region, with some noticeable improvements in the large-$x$ region as well. In the current PDF global fitting, the DIS data provide the largest constraint on the PDF ratio $d/u$, cf. Ref. [37]. In the future, the LHCb data could provide additional information on the PDF ratios $d/u$ and $\bar{d}/\bar{u}$.
B. Update from LHCb 13 TeV Z pseudo-data
The single $Z$ boson leptonic decay has two charged leptons in the final state; these leptons have large transverse momentum and are isolated in the detector. Thanks to these features, $Z \to \ell^+\ell^-$ events are easy to reconstruct and identify at a hadron collider, with small background contamination. The $Z \to \ell^+\ell^-$ channel is therefore one of the best channels for precision EW measurements.
We used the Z boson rapidity distribution as an observable for the ePump update, explored other observables that could be used in the future PDF fitting, and proposed a novel way to present Z boson production measurement that provides more valuable information for PDF fitting. In this study, a binning scheme similar to the previous LHCb publication [39] is used.
The updated $d$ quark PDF results are shown in Fig. 5. As shown in the figure, with a 5 fb$^{-1}$ data sample, the LHCb 13 TeV single $Z$ boson data are not as powerful as the $W^{\pm}$ data, mainly due to the smaller event rate. With 300 fb$^{-1}$, however, the $Z$ boson data have an impact on the $d$ quark PDF in the small-$x$ region, from $10^{-5}$ to $10^{-2}$.
The impacts of the LHCb 13 TeV 300 fb$^{-1}$ single $Z$ boson data on $d/u$ and $\bar{d}/\bar{u}$ are shown in Fig. 6, where the LHCb $Z$ boson data reduce the $d/u$ PDF uncertainty in the small-$x$ region.
We have also explored the sensitivity of the $Z$ boson $p_T$, the lepton $\cos\theta^*$ (defined in the Collins-Soper frame [43]), and the $Z$ boson rapidity distributions measured at LHCb to further constrain the PDFs, considering their impacts one at a time in the ePump update. As shown in Fig. 7, each observable has a slightly different impact on the $u$ ($d$) quark PDFs across the $x$ range, as expected. Below, we propose a better way to extract useful information from the LHCb 13 TeV $Z$ data, by performing a multi-dimensional analysis. With more $pp$ collision data collected by the LHCb detector in the future, it becomes feasible to measure the $Z$ boson production cross section with multi-dimensional binning, i.e., double- or triple-differential cross section measurements. Comparisons of the updated PDF uncertainties for different numbers of (input) experimental observables are shown in Fig. 8. We compared the impacts of the LHCb 13 TeV $Z$ boson pseudo-data on the PDFs for a single-differential ($Z$ boson $p_T$, labeled '1D' in the figure), a double-differential ($Z$ boson $p_T$ and rapidity, labeled '2D'), and a triple-differential ($Z$ boson $p_T$, $Z$ boson rapidity, and lepton $\cos\theta^*$, labeled '3D') cross section measurement. We found that with the limited statistics of a 5 fb$^{-1}$ $Z$ boson sample, the multi-dimensional measurements cannot significantly improve the PDF determination compared to the one-dimensional measurements of the $Z$ boson $p_T$, lepton $\cos\theta^*$, and $Z$ boson rapidity, respectively. With a 300 fb$^{-1}$ data sample, however, the multi-dimensional measurements give better constraints on the PDFs over the full $x$ range. The triple-differential cross section gives the best constraints on the $u$ and $d$ quark PDFs over the full $x$ range, although the improvement from the '2D' to the '3D' measurement is not as pronounced as that from '1D' to '2D'.
From the experimental point of view, a triple-differential cross section measurement can suffer from limited statistics in extreme regions of phase space, such as near the boundaries of the observables. Furthermore, evaluating correlated systematic uncertainties is more complicated in a '3D' measurement than in a '2D' one. Therefore, in the future, with a large data sample, a double-differential $Z$ boson cross section measurement (in the $Z$ boson $p_T$ and rapidity) is feasible and recommended, and provides more valuable information for PDF fitting than either a single- or a triple-differential measurement.
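The statistics argument above can be illustrated with a back-of-envelope estimate: splitting a fixed event sample over more bins inflates the per-bin relative statistical uncertainty as $1/\sqrt{N_{\rm bin}}$. The yield and bin counts below are assumed, illustrative numbers, not LHCb values:

```python
import math

def per_bin_rel_err(n_events, n_bins):
    """Relative statistical uncertainty per bin, assuming the sample is
    spread uniformly over the bins (a deliberately crude toy model)."""
    return 1.0 / math.sqrt(n_events / n_bins)

N_5FB = 1.0e6              # assumed Z yield at 5 fb^-1 (placeholder)
N_300FB = N_5FB * 60.0     # yield scales linearly with luminosity

err_1d_5fb = per_bin_rel_err(N_5FB, 18)            # 18 rapidity bins
err_3d_5fb = per_bin_rel_err(N_5FB, 18 * 10 * 6)   # assumed 3D binning
err_3d_300fb = per_bin_rel_err(N_300FB, 18 * 10 * 6)
```

With these toy numbers, the 3D bins at 300 fb$^{-1}$ are roughly as precise as the 1D bins at 5 fb$^{-1}$, which is one way to see why the multi-differential measurement only pays off with the larger sample.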
In the double- and triple-differential cross section measurements, dedicated binning schemes for the $Z$ boson $p_T$, rapidity ($y$), and lepton $\cos\theta^*$ are used.
C. Update from combined LHCb 13 TeV W± and Z pseudo-data
We also checked the impact of the LHCb 13 TeV data on the PDF fitting when including both the single $W^{\pm}$ boson and $Z$ boson data samples. In reality, as the $W^{\pm}$ and $Z$ boson results of one experiment are measured with the same data sample, many systematic uncertainties are correlated, and in any PDF fit the correlation matrices between different observables must be provided to avoid potential bias. In this study, without detector-level simulated events, we cannot calculate the correlation matrices between the single $W^{\pm}$ and $Z$ boson measurements; we therefore assume that there is no correlation between the LHCb 13 TeV $W^{\pm}$ and $Z$ pseudo-data. The $W^{\pm}$ and $Z$ boson single-differential cross section results are used as the inputs to the ePump update, namely the charged lepton $\eta$ distribution of $W^{\pm}$ boson events and the rapidity distribution of $Z$ bosons.
The updated PDF results are shown in Fig. 9 for the $u$-, $d$-, $c$-quark and gluon PDFs, and the $d/u$ and $\bar{d}/\bar{u}$ ratio results are shown in Fig. 10, respectively. Based on these figures, the following features are found:
• The largest improvement is in the $d$-quark PDF. Its uncertainty can be reduced significantly by the LHCb 13 TeV $W^{\pm}/Z$ data over the full $x$ range. Especially in the small-$x$ region $10^{-5} < x < 10^{-2}$, the uncertainty would be reduced by about 60% at $x \sim 10^{-3}$.
• The uncertainties of the $u$-, $s$-, $c$-quark and gluon PDFs can be reduced over the full $x$ range, with significant improvements expected in the very small and very large $x$ regions.
• The uncertainties of the $d/u$ and $\bar{d}/\bar{u}$ ratios can be significantly reduced over the full $x$ range, even with only 5 fb$^{-1}$ of data. In the very large $x$ region, the LHCb 13 TeV data could have a large impact on the $d/u$ ratio.
• The LHCb 13 TeV $W^{\pm}$ and $Z$ data also have a large impact on the $\bar{u}$- and $\bar{d}$-quark PDFs, mainly in the small-$x$ region.
For the $d/u$ and $\bar{d}/\bar{u}$ ratios, the future LHCb 13 TeV data will provide the most important constraints. In Fig. 10, the 300 fb$^{-1}$ LHCb 13 TeV pseudo-data provide valuable constraints on the $d/u$ ratio in the very large $x$ region ($x > 0.5$) and on $\bar{d}/\bar{u}$ for $x > 0.2$. In these regions, the LHCb data would be the only clean data, free of the nuclear corrections needed when describing the low-energy Drell-Yan data used to constrain $d/u$.
It is well known that fixed-target Drell-Yan measurements provide an important probe of the $x$ dependence of the nucleon (and nuclear) PDFs. This fact has motivated a number of experiments, including the Fermilab E866/NuSea experiment [44], which determined the normalized deuteron-to-proton cross section ratio $\sigma^{pd}/2\sigma^{pp}$ out to relatively large $x_2$, the momentum fraction of the target. As can be seen from a leading-order quark-parton model analysis, this ratio is expected to have especially pronounced sensitivity to the $x$ dependence of the PDF ratio $\bar{d}/\bar{u}$. The E866 results stimulated interest in performing a similar measurement out to larger $x_2$ with higher precision - the main objective of the subsequent SeaQuest/E906 experiment at Fermilab [45], from which results are expected soon. The LHCb data could be used to check the impact of the SeaQuest [46] result on $\bar{d}/\bar{u}$ in the large-$x$ region. In Fig. 11, we compare the theoretical prediction based on the updated CT14HERA2 PDFs (with the 300 fb$^{-1}$ LHCb 13 TeV combined $W^{\pm}$ and $Z$ pseudo-data as input) with these fixed-target measurements.
At tree level, the $Z$ boson is produced via $q\bar{q}$ annihilation, where $q$ can be $u$, $d$, $c$, $s$, or $b$, while the $W^{\pm}$ boson is produced via flavor combinations such as $u\bar{d}$ and $c\bar{s}$ (and their charge conjugates). Therefore, the ratio of the $W^{\pm}$ distributions to that of the $Z$ boson is, at first order, sensitive to $(s+\bar{s})/(u+\bar{d})$ [19]. With a uniform binning (18 pseudorapidity/rapidity bins, from 2.0 to 4.5), a $(W^+ + W^-)/Z$ ratio is calculated in each bin, where the muon pseudorapidity is used for $W^{\pm}$ bosons and the rapidity for $Z$ bosons. Correlations between the predicted LHCb 13 TeV $(W^+ + W^-)/Z$ ratio and $(s+\bar{s})/(u+\bar{d})$ are shown in Fig. 12. With the calculated $(W^+ + W^-)/Z$ ratio as the ePump input, we checked the impact of the LHCb single $W^{\pm}$ and $Z$ data on $(s+\bar{s})/(u+\bar{d})$, also shown in Fig. 12. With the 5 fb$^{-1}$ LHCb 13 TeV pseudo-data as input, the $(W^+ + W^-)/Z$ data do not have a visible impact on the PDF ratio $(s+\bar{s})/(u+\bar{d})$.
With the 300 fb$^{-1}$ LHCb 13 TeV pseudo-data as input, however, the impact on the PDF ratio $(s+\bar{s})/(u+\bar{d})$ becomes significant in the $x$ range of $10^{-2}$ to $10^{-1}$, which could be used to determine the strange quark PDF precisely in the future. As expected, larger correlations between $(W^+ + W^-)/Z$ and $(s+\bar{s})/(u+\bar{d})$ are seen in the same $x$ range. For comparison, the higher-$x_2$ portion of the older E866 data [44] is also shown in Fig. 11 (blue diamonds).
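For reference, the leading-order quark-parton model relation behind the E866 ratio quoted above can be written, in the limit $x_1 \gg x_2$ where the beam-proton valence quarks dominate and sea-sea terms are neglected, as

```latex
\frac{\sigma^{pd}}{2\,\sigma^{pp}}\bigg|_{x_1 \gg x_2}
  \;\approx\; \frac{1}{2}\left[\,1 + \frac{\bar d(x_2)}{\bar u(x_2)}\,\right],
```

so the measured deviation of the ratio from unity translates directly into the $\bar{d}/\bar{u}$ asymmetry of the target sea.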

IV. TOLERANCE CRITERIA IN EPUMP UPDATE
The tolerance criterion (i.e., the choice of total ∆χ 2 value in a global analysis) is an important parameter in PDF fitting.
It was extensively discussed in Ref. [47] that, in order to best reproduce the CT14HERA2 global fit, one should use a dynamical tolerance in ePump. If the tolerance is instead set to $\Delta\chi^2 = 1$ at the 68% confidence level (CL), or equivalently $(1.645)^2$ at the 90% CL, a very large weight is effectively assigned to the new input data when updating the existing PDFs in the CT PDF global analysis framework.
To illustrate the differences between the ePump-updated PDFs obtained with a dynamical tolerance and with a fixed tolerance of $\Delta\chi^2 = (1.645)^2$, we used the single LHCb 13 TeV $Z$ data as the ePump input; the result is shown in Fig. 13. The impact of the new data is enhanced when updating the PDFs with $\Delta\chi^2 = (1.645)^2$, which could introduce a bias into the updated PDF set.
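The mechanism can be illustrated with a one-parameter toy version of Hessian profiling (a sketch only, not the actual ePump algorithm): a single error-PDF direction $\lambda$, whose prior uncertainty is normalized to 1 at $\Delta\chi^2 = T^2$, is fit to one new data point whose prediction shifts by $s$ per unit $\lambda$, with measurement uncertainty $\sigma$.

```python
import math

def updated_rel_uncertainty(s_over_sigma, tolerance):
    """Post-update uncertainty on the toy parameter lambda, relative to
    its prior uncertainty.  Minimising
        chi^2(lambda) = (r - s*lambda)^2 / sigma^2 + T^2 * lambda^2
    gives a Gaussian posterior whose width, in units of the prior, is
        1 / sqrt(1 + (s / (T * sigma))^2).
    A smaller tolerance T makes the same data point look far more
    constraining."""
    return 1.0 / math.sqrt(1.0 + (s_over_sigma / tolerance) ** 2)

# Same new data point (sensitivity s = 2 sigma), two tolerance choices:
shrink_fixed = updated_rel_uncertainty(2.0, 1.645)  # Delta chi^2 = (1.645)^2
shrink_dyn = updated_rel_uncertainty(2.0, 10.0)     # larger, CT-like scale
```

With these numbers, the fixed $\Delta\chi^2 = (1.645)^2$ choice shrinks the uncertainty to about 64% of the prior, while a tolerance of order 10 leaves it nearly unchanged, mirroring the over-weighting seen in Fig. 13. The value 10 stands in for a typical dynamical-tolerance scale and is an assumption of this sketch.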
This conclusion also holds when using the MMHT2014 [27] and PDF4LHC15 [48] PDFs in a profiling analysis to study the impact of new (pseudo-)data on updating the existing PDFs.

V. CONCLUSION
In this article, we studied the potential of the LHCb 13 TeV single $W^{\pm}$ and $Z$ boson pseudo-data for constraining the PDFs of the proton. As an example, we demonstrated the sensitivity of the LHCb 13 TeV data, corresponding to integrated luminosities of 5 fb$^{-1}$ and 300 fb$^{-1}$, respectively, for reducing the PDF uncertainty bands of the CT14HERA2 PDFs.
We have also investigated the sensitivities of various experimental observables.
A large impact of the LHCb 13 TeV data on various quark flavor PDFs is seen over the full $x$ range, with significant contributions expected in the small-$x$ ($x < 10^{-3}$) region.
In particular, the $d$ and $\bar{d}$ quark PDF uncertainties are reduced dramatically, with a ~60% improvement at a momentum fraction ($x$) around $10^{-3}$. Due to its large event rate, the LHCb 13 TeV $W^+/W^-$ data can already reduce the $d/u$ PDF uncertainty even with only a 5 fb$^{-1}$ data sample, as shown in Fig. 4. Most of the improvements are concentrated in the small-$x$ region, from $10^{-5}$ to $10^{-3}$. The 300 fb$^{-1}$ LHCb 13 TeV data sample could further reduce the uncertainties of both PDF ratios $d/u$ and $\bar{d}/\bar{u}$ (by about 20%) in the small-$x$ region, with some noticeable improvements also in the large-$x$ region, which is currently dominated by DIS data in global analyses.
Although the event rate is smaller than for $W^{\pm}$ events, the LHCb 13 TeV $Z$ data can also provide important constraints on the PDFs, particularly when a double-differential distribution (as a function of the $Z$ boson $p_T$ and rapidity) is used as the experimental observable for updating the existing PDFs. We have compared the impact of the LHCb 13 TeV $Z$ boson pseudo-data on the PDFs for a single-differential ($Z$ boson $p_T$), a double-differential ($Z$ boson $p_T$ and rapidity), and a triple-differential ($Z$ boson $p_T$, $Z$ boson rapidity, and lepton $\cos\theta^*$) cross section measurement. We found that with the limited statistics of a 5 fb$^{-1}$ $Z$ boson sample, the multi-dimensional measurements cannot significantly improve the PDF determination compared to the one-dimensional measurements of the $Z$ boson $p_T$, lepton $\cos\theta^*$, and $Z$ boson rapidity, respectively, while with a 300 fb$^{-1}$ data sample the multi-dimensional measurements give better constraints on the PDFs over the full $x$ range.
It is evident that the combined data sample of $W^{\pm}$ and $Z$ boson events can provide further constraints on the PDFs. Examining the ratio of the event rates, $(W^+ + W^-)/Z$, could directly probe the PDF ratio $(s+\bar{s})/(u+\bar{d})$, as previously noted in Ref. [19]. With the 5 fb$^{-1}$ LHCb 13 TeV pseudo-data as input, the $(W^+ + W^-)/Z$ data do not have a visible impact on the PDF ratio $(s+\bar{s})/(u+\bar{d})$, while with the 300 fb$^{-1}$ pseudo-data the impact becomes significant in the $x$ range of $10^{-2}$ to $10^{-1}$, which could be used to determine the strange quark PDF precisely in the future. These features suggest that the LHCb single $W^{\pm}$ and $Z$ data taken at the LHC at 13 TeV will provide very important and unique information in a global analysis, complementary to the ATLAS and CMS results. With an integrated luminosity of 300 fb$^{-1}$ collected in the future, the impact of the LHCb 13 TeV data on PDF fitting could be enhanced significantly by performing a multi-differential cross section measurement.
Before concluding, we also pointed out the important role of the tolerance criterion in PDF updating. As discussed in Ref. [47], one should use a dynamical tolerance in ePump. Setting the tolerance to $\Delta\chi^2 = 1$ at the 68% CL, or equivalently $(1.645)^2$ at the 90% CL, will greatly overestimate the impact of a given new data set when updating the existing PDFs in the CT PDF global analysis framework. This conclusion also holds when using the MMHT2014 [27] and PDF4LHC15 [48] PDFs in a profiling analysis to study the impact of new (pseudo-)data on updating the existing PDFs.