Reduction of PDF uncertainty in the measurement of the weak mixing angle at the ATLAS experiment

We investigate the parton distribution function (PDF) uncertainty in the measurement of the effective weak mixing angle $\sin^2\theta_{\text{eff}}^{\ell}$ at the CERN Large Hadron Collider (LHC). The PDF-induced uncertainty is large in the proton-proton collisions at the LHC due to the dilution effect. The measurement of the Drell-Yan forward-backward asymmetry ($A_{FB}$) at the LHC can be used to reduce the PDF uncertainty in the $\sin^2\theta_{\text{eff}}^{\ell}$ measurement. However, when including the full mass range of lepton pairs in the $A_{FB}$ data analysis, the correlation between the PDF updating procedure and the $\sin^2\theta_{\text{eff}}^{\ell}$ extraction leads to a sizable bias in the obtained $\sin^2\theta_{\text{eff}}^{\ell}$ value. From our studies, we find that the bias can be significantly reduced by removing Drell-Yan events with invariant mass around the $Z$ pole region, while most of the sensitivity in reducing the PDF uncertainty remains. Furthermore, the lepton charge asymmetry in the $W$ boson events as a function of the rapidity of the charged leptons, $A_\pm(\eta_\ell)$, is known to be another observable which can be used to reduce the PDF uncertainty in the $\sin^2\theta_{\text{eff}}^{\ell}$ measurement. The constraint from $A_\pm(\eta_\ell)$ is complementary to that from the $A_{FB}$, thus no bias affects the $\sin^2\theta_{\text{eff}}^{\ell}$ extraction. The studies are performed using the Error PDF Updating Method Package ({\sc ePump}), which is based on the Hessian updating methods. In this article, the CT14HERA2 PDF set is used as an example.


I. INTRODUCTION
Measurement of the leptonic effective weak mixing angle, $\sin^2\theta_{\text{eff}}^{\ell}$, is one of the most important topics in experimental particle physics. It is a key parameter in global electroweak fits, and it played a crucial role in predicting the mass of the Higgs boson with a precision of O(10) GeV. Going forward, it will continue to contribute to global fits, and will aid in tests of the standard model and in searches for new physics beyond the standard model. At an energy scale of the Z boson mass ($M_Z$), $\sin^2\theta_{\text{eff}}^{\ell}$ can be determined from measurements of parity violation in the neutral-current processes of fermion-antifermion scattering, $f_i \bar{f}_i \to Z/\gamma^* \to f_j \bar{f}_j$. One such measurement is the forward-backward asymmetry ($A_{FB}$), defined as
$$A_{FB} = \frac{N_F - N_B}{N_F + N_B}, \qquad (1)$$
where $N_F$ and $N_B$ are the numbers of forward and backward events. At lepton colliders, forward and backward events are defined according to the sign of $\cos\theta$, where $\theta$ is the scattering angle between the outgoing fermion $f_j$ and the incoming fermion $f_i$. The most precise determinations to date of $\sin^2\theta_{\text{eff}}^{\ell}$ at the Z pole are provided by the LEP and SLD Collaborations [1], giving a combined result of 0.23153 ± 0.00016. The precision of these measurements, achieved at the last generation of $e^+e^-$ colliders, is limited by statistical uncertainties. Subsequent to the LEP/SLD era, measurements have been made at hadron collider experiments, namely at the Tevatron proton-antiproton collider and at the CERN Large Hadron Collider (LHC), a proton-proton collider, using $A_{FB}$ in the final states of Drell-Yan (DY) $p\bar{p}/pp \to Z/\gamma^* \to \ell^+\ell^-$ processes, as a function of the dilepton invariant mass. At hadron colliders, forward and backward events are defined in the Collins-Soper (CS) frame [2]. This is a special rest frame of the lepton pair, with the polar and azimuthal angles defined relative to the two hadron beam directions.
The z axis is defined in the Z boson rest frame so that it bisects the angle formed by the momentum of one incoming hadron and the negative of the momentum of the other hadron. The cosine of the polar angle $\theta^*_{CS}$ is defined by the direction of the outgoing lepton $\ell^-$ relative to the $\hat{z}$ axis in the CS frame and can be calculated directly from the laboratory-frame lepton quantities by
$$\cos\theta^*_{CS} = c\,\frac{2\,(p_1^+ p_2^- - p_1^- p_2^+)}{m_{ll}\sqrt{m_{ll}^2 + p_{T,ll}^2}}, \qquad (2)$$
where the scalar factor $c$ (either 1 or -1) is defined for the Tevatron and the LHC, respectively, as
$$c = \begin{cases} 1, & \text{for the Tevatron} \\ p_{Z,ll}/|p_{Z,ll}|, & \text{for the LHC.} \end{cases} \qquad (3)$$
Thus, the sign of the z axis is defined as the proton beam direction for the Tevatron, and on an event-by-event basis as the sign of the lepton pair momentum with respect to the z axis in the laboratory frame for the LHC. The variables $p_{Z,ll}$, $m_{ll}$, and $p_{T,ll}$ denote the longitudinal momentum, invariant mass and transverse momentum of the dilepton system, respectively, and
$$p_i^\pm = \frac{1}{\sqrt{2}}\,(E_i \pm p_{Z,i}), \qquad (4)$$
where the lepton (anti-lepton) energy and longitudinal momentum are $E_1$ and $p_{Z,1}$ ($E_2$ and $p_{Z,2}$), respectively. The DY events are therefore defined as forward ($\cos\theta^*_{CS} > 0$) or backward ($\cos\theta^*_{CS} < 0$) according to the direction of the outgoing lepton in this frame of reference.
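The forward/backward classification described above can be sketched in Python as follows. This is a minimal sketch, assuming the standard light-cone form of the Collins-Soper angle with $p_i^\pm = (E_i \pm p_{Z,i})/\sqrt{2}$ and the LHC sign convention; the function names and the `(E, px, py, pz)` tuple layout are illustrative assumptions and are not taken from the analysis code used in this work.

```python
import math

def cos_theta_cs(lepton, antilepton):
    """Collins-Soper cos(theta*) at the LHC from lab-frame four-vectors.

    lepton, antilepton: (E, px, py, pz) tuples in GeV (hypothetical layout).
    """
    e1, px1, py1, pz1 = lepton
    e2, px2, py2, pz2 = antilepton
    # Light-cone components p± = (E ± pz) / sqrt(2)
    p1_plus, p1_minus = (e1 + pz1) / math.sqrt(2.0), (e1 - pz1) / math.sqrt(2.0)
    p2_plus, p2_minus = (e2 + pz2) / math.sqrt(2.0), (e2 - pz2) / math.sqrt(2.0)
    # Dilepton system kinematics
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    m2 = e * e - px * px - py * py - pz * pz
    pt2 = px * px + py * py
    # LHC convention: z axis follows the dilepton longitudinal direction
    # (pz == 0 is assigned +1 here, an arbitrary choice of this sketch)
    c = 1.0 if pz >= 0.0 else -1.0
    return c * 2.0 * (p1_plus * p2_minus - p1_minus * p2_plus) / math.sqrt(m2 * (m2 + pt2))

def forward_backward_asymmetry(cos_values):
    """A_FB = (N_F - N_B) / (N_F + N_B) from a list of cos(theta*) values."""
    n_f = sum(1 for c in cos_values if c > 0)
    n_b = sum(1 for c in cos_values if c < 0)
    return (n_f - n_b) / (n_f + n_b)
```

For example, a lepton with `(100, 0, 0, 100)` and an anti-lepton with `(20, 0, 0, -20)` form a pair boosted along +z with the lepton along the boost, so the event is classified as forward.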
Compared to the lepton-collider case, measurements at hadron colliders suffer from additional uncertainties in modeling the directions of the incoming fermions and antifermions in the initial state. Such uncertainties dilute $A_{FB}$ and reduce the sensitivity to $\sin^2\theta_{\text{eff}}^{\ell}$. The degree of the dilution at hadron colliders is modeled by parton distribution functions (PDFs). At the Tevatron, the initial state of DY production is dominated by valence quarks. This allows us to assume that the incoming quark of DY production moves along the proton beam direction, as indicated in Eq. 3, while the incoming anti-quark moves along the anti-proton beam direction. However, the contribution from sea-quark interactions is still about 10% at the Tevatron. The uncertainty of this dilution fraction, which is calculated using PDFs, propagates into the uncertainty of the $\sin^2\theta_{\text{eff}}^{\ell}$ measurement extracted from the $A_{FB}$ distribution. The combination of the D0 and CDF measurements at the Tevatron gives a result of 0.23179 ± 0.00030(stat) ± 0.00017(PDF) ± 0.00006(syst) [3], which shows a non-negligible PDF-induced uncertainty.
The PDF dilution effect is even more significant at the LHC, since it is a proton-proton collider. Because the initial state is completely symmetric, the incoming quark of DY production is equally likely to come from either of the two proton beams. In order to distinguish forward from backward events in pp collisions, the beam pointing to the same hemisphere as the Z boson reconstructed from the final-state leptons is assumed to be the one that provides the quark. This is motivated by the observation that the valence quarks inside the protons generally carry more energy than the antiquarks (or sea quarks). However, this assignment is only statistically correct, because a sea quark can carry a larger fraction ($x$) of the incoming proton momentum than the valence quark. Furthermore, beyond the leading order in the QCD interaction, quark-gluon and antiquark-gluon processes contribute at next-to-leading order (NLO), and the gluon-gluon process contributes at next-to-NLO (NNLO). These all affect the PDF dilution factor, whose magnitude depends on the precise modeling of the momentum spectra of all flavors of quarks and gluons involved in the Drell-Yan processes. This is more complicated than modeling only the total cross sections of valence quarks and sea quarks, as in the proton-antiproton case. Consequently, the PDF-induced uncertainty in the $A_{FB}$ measurement at the LHC is significantly larger than that at the Tevatron. The latest published measurement from the CMS collaboration gives a result of 0.23101 ± 0.00036(stat) ± 0.00031(PDF) ± 0.00024(syst) [4], in which the PDF uncertainty is about the same size as the statistical uncertainty.
In the future high-luminosity (HL) LHC era, the statistical uncertainty will be reduced as data accumulates. Thus, the PDF uncertainty will become the leading uncertainty that limits the precision in the determination of $\sin^2\theta_{\text{eff}}^{\ell}$. Studies in the literature discuss how to further reduce the PDF uncertainties relevant for precision electroweak measurements at the LHC [5]. Two experimental observables are essential to this task: one is the $A_{FB}$ of the DY pairs and the other is the lepton charge asymmetry $A_\pm(\eta_\ell)$ in the $W^\pm$ boson events. When $A_{FB}$ is used simultaneously to determine $\sin^2\theta_{\text{eff}}^{\ell}$ and to reduce PDF uncertainties, it inevitably introduces correlations. Such correlations have not been systematically considered in previous studies, as they are not expected to be important when the PDF-induced uncertainty does not dominate the overall uncertainty. In this article, we investigate the correlation between the two tasks of further reducing the PDF uncertainty and performing the precision determination of $\sin^2\theta_{\text{eff}}^{\ell}$ from measuring the same experimental observable $A_{FB}$. We demonstrate the potential bias on the $\sin^2\theta_{\text{eff}}^{\ell}$ determination, and discuss possible solutions for the future LHC measurements.
The paper is organized as follows: in Section II, a brief review on using new data to update PDFs and to reduce the related uncertainties is presented; in Section III, we perform an exercise in updating the PDFs with $A_{FB}$ at the LHC, and demonstrate its potential bias on the $\sin^2\theta_{\text{eff}}^{\ell}$ determination; in Section IV, we study how updating the PDFs with the lepton charge asymmetry $A_\pm(\eta_\ell)$ measured at the LHC reduces the PDF uncertainties; in Section V, we study the implications of updating the PDFs with both $A_{FB}$ and $A_\pm(\eta_\ell)$ data, and apply the ePump-optimization procedure to illustrate the complementary roles of the sideband $A_{FB}$ and $A_\pm(\eta_\ell)$ observables in reducing the PDF uncertainty, and then to make the optimal choice of the bin size of the experimental data used in the PDF-updating analysis; finally, a summary is presented in Section VI.
The two most commonly used methods for extracting PDFs and their uncertainties from a global analysis of high-energy scattering data are the Monte Carlo method, used by NNPDF [6], and the Hessian method, used in CT14HERA2 [7,8], for example. In the Monte Carlo method, a statistical ensemble of PDF sets is provided, which is assumed to approximate the probability distribution of possible PDFs, as constrained by the global analysis of the data. In the Hessian method, a smaller number of error PDF sets are provided along with the central set, which minimizes the $\chi^2$ function in a global analysis. These error PDF sets correspond to the positive and negative eigenvector directions in the space of PDF parameters. The most complete method for obtaining constraints from new data on the PDFs would be to add the new data into the global analysis package and to do a full re-analysis. However, this is impractical for most users of the PDFs. A technique for estimating the impact of new data on the PDFs, without performing a full global analysis, is therefore very useful.
In the context of the Monte Carlo PDFs, the PDF reweighting method has become commonplace. This involves applying a weight factor to each of the PDFs in the ensemble [9][10][11] when performing ensemble averages. The reweighting procedure reduces the overall effective number of PDF replicas in the ensemble. The impact of new data can also be estimated directly using Hessian PDFs [12][13][14], where it is called Hessian profiling.
It updates the eigenvectors within the Hessian approximation, which is faster and simpler. Note that both the Monte Carlo method and the Hessian profiling are based on the original Monte Carlo PDFs or error sets, respectively. Therefore, the new data is assumed to be in general consistent with the PDF predictions before updating, so that the updated best-fit PDF set is not too different from the original best fit. If a large deviation is found between the new data and the original theory predictions, a full analysis of PDF global fitting is needed.
The theoretical predictions in this work are computed using the ResBos [15] package at next-to-leading order (NLO) plus next-to-next-to-leading logarithm (NNLL) accuracy in QCD, in which the canonical scales are used [16,17]. The CT14HERA2 central and error PDFs [7,8] are used in this analysis. $A_{FB}$ as a function of the dilepton mass ($M_{\ell\ell}$) at the LHC is sensitive both to $\sin^2\theta_{\text{eff}}^{\ell}$ and to PDF modeling. Fig. 1 shows the $A_{FB}$ distributions for two separated $\sin^2\theta_{\text{eff}}^{\ell}$ values of 0.2315 and 0.2345, their difference, and the PDF uncertainties as functions of the dilepton invariant mass for $\sqrt{s} = 13$ TeV pp collisions at the LHC. The two values of $\sin^2\theta_{\text{eff}}^{\ell}$ are arbitrarily chosen to be far apart in order to clearly reveal their different predictions of $A_{FB}$. When $A_{FB}$ from a new data set is used in the PDF updating procedure, it is assumed to be consistent with the current theory predictions. This means that $\sin^2\theta_{\text{eff}}^{\ell}$, on which $A_{FB}$ depends, is considered to have the same value as determined from existing experimental measurements, even if a different value of $\sin^2\theta_{\text{eff}}^{\ell}$ is used in generating the pseudo-data. As a result, a simple PDF updating procedure will forcibly absorb the difference in $\sin^2\theta_{\text{eff}}^{\ell}$ into the PDFs, which will bias the determination of both the updated PDFs and the extracted $\sin^2\theta_{\text{eff}}^{\ell}$. The size of the bias depends on how large the difference is between the currently accepted value of $\sin^2\theta_{\text{eff}}^{\ell}$ (used in the theory prediction) and the value used in the generation of the pseudo-data, which will be quantitatively discussed in the following sections. An important point is that $A_{FB}$ is more sensitive to $\sin^2\theta_{\text{eff}}^{\ell}$ in the Z pole region, while the PDF-induced uncertainty becomes more significant when $M_{\ell\ell}$ moves to higher or lower regions. This difference in sensitivity between the mass regions suggests a method to reduce the correlations. The work presented in Ref. [5] was done using the Monte Carlo reweighting method, with NNPDF PDFs, and was based on the hypothesis that the above-mentioned correlation is negligible. In this work, we instead use the software package ePump (error PDF Updating Method Package), which can update any given set of Hessian PDFs obtained from an earlier global analysis [18].

III. UPDATING THE PDFS WITH $A_{FB}$ DATA
In this section, we quantitatively examine how the PDF-induced uncertainty in the determination of $\sin^2\theta_{\text{eff}}^{\ell}$ could be reduced by applying the Hessian updating method, via ePump, and study the correlation mentioned above. First, we consider the case of using the $A_{FB}$ data spanning the full range of $M_{\ell\ell}$, from 60 GeV to 130 GeV. Second, we consider the case of using only the $A_{FB}$ sideband spectrum, where the events with $M_{\ell\ell}$ from 80 GeV to 100 GeV are excluded.
In order to perform the PDF update, ePump requires two sets of inputs: data templates and theory templates. The data templates provide $A_{FB}$ distributions with their uncertainties. The theory templates consist of the theory predictions for $A_{FB}$ from the original PDF error sets. The output of ePump consists of an updated central PDF set and updated Hessian eigenvector PDFs, representing the result that would be obtained from a full global re-analysis that includes the new data. As an additional benefit, ePump can also output the updated predictions and uncertainties for any other observables of interest without the need to recalculate them using the updated PDFs. For more details about the use of ePump, see Ref. [12].
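For orientation, the PDF-induced uncertainty of an observable is commonly evaluated from such theory templates with the symmetric Hessian master formula, $\delta X = \frac{1}{2}\sqrt{\sum_i (X_i^+ - X_i^-)^2}$. The sketch below illustrates that formula only. It assumes that the error-set predictions are ordered in plus/minus pairs, and it assumes the commonly quoted factor of 1.645 for converting 90% C.L. error sets to 68% C.L.; the function name and input layout are hypothetical and are not part of ePump.

```python
import math

def hessian_symmetric_uncertainty(error_set_predictions, cl_scale=1.0):
    """Symmetric Hessian uncertainty of one observable.

    error_set_predictions: predictions ordered as
        [X_1+, X_1-, X_2+, X_2-, ...]  (assumed ordering).
    cl_scale: divide by this factor to change confidence level,
        e.g. 1.645 for 90% -> 68% C.L. (assumed conversion).
    """
    if len(error_set_predictions) % 2 != 0:
        raise ValueError("expected an even number of error-set predictions")
    total = 0.0
    for i in range(0, len(error_set_predictions), 2):
        plus, minus = error_set_predictions[i], error_set_predictions[i + 1]
        total += (plus - minus) ** 2
    return 0.5 * math.sqrt(total) / cl_scale
```

With 56 error sets, the input list would hold 56 predictions of the same observable, for instance the average $A_{FB}$ in one mass bin.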
For the DY samples, each lepton flavor channel (electron and muon) has 250 million events in the mass range of 60 GeV ≤ $M_{\ell\ell}$ ≤ 130 GeV. This sample size corresponds to an integrated luminosity of roughly 130 fb$^{-1}$, which is the size of the total data set collected by the ATLAS detector during the LHC Run 2. The pseudo-data is modeled using the CT14HERA2 central PDFs. Nominal theory-template samples consist of the central and 56 error PDF predictions, generated using the CT14HERA2 error PDF sets. In the theory templates, $\sin^2\theta_{\text{eff}}^{\ell}$ is set to 0.2315, which is the value determined by the LEP and SLD Collaborations. In the pseudo-data, $\sin^2\theta_{\text{eff}}^{\ell}$ is set to 0.2324 in order to examine the effects of an offset or pull in the new data. The difference is deliberately chosen to be 3 times the uncertainty of the $\sin^2\theta_{\text{eff}}^{\ell}$ measurement as determined at hadron colliders [3]. To mimic the experimental acceptance, a set of ATLAS detector-like event selections is further applied to the pseudo-data and the nominal theory samples:
• Each lepton is required to have transverse momentum $p_T$ ≥ 25 GeV.
• Events are denoted as CC (central-central) if both leptons have $|\eta_\ell|$ ≤ 2.5, and CF (central-forward) if one lepton has $|\eta_\ell|$ ≤ 2.5 and the other has 2.5 < $|\eta_\ell|$ < 4.9. The CC events correspond to double the integrated luminosity of the CF events, since both the dielectron and dimuon channels contribute to the CC events, while only the dielectron channel has CF events at the ATLAS detector.
• The forward-backward asymmetry $A_{FB}$ is measured in mass bins of 2 GeV width.
Note that the pseudo-data were treated as coming from just one "experiment", but in practice both ATLAS and CMS would be sources of input data for fitting.
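The selections listed above, together with the sideband definition used later in this section, can be sketched as follows. This is a minimal illustration under the stated cuts; the function names are hypothetical, and the treatment of bin edges exactly at 80 GeV and 100 GeV is an assumption of the sketch.

```python
def classify_event(pt1, eta1, pt2, eta2,
                   pt_min=25.0, eta_central=2.5, eta_forward=4.9):
    """Return "CC", "CF", or None for a dilepton event (pT in GeV)."""
    if pt1 < pt_min or pt2 < pt_min:
        return None
    a1, a2 = abs(eta1), abs(eta2)
    central1, central2 = a1 <= eta_central, a2 <= eta_central
    forward1 = eta_central < a1 < eta_forward
    forward2 = eta_central < a2 < eta_forward
    if central1 and central2:
        return "CC"
    if (central1 and forward2) or (central2 and forward1):
        return "CF"
    return None  # e.g. both leptons forward, or outside the acceptance

def mass_bins(m_min=60.0, m_max=130.0, width=2.0):
    """(low, high) edges of the A_FB mass bins in GeV."""
    n = int(round((m_max - m_min) / width))
    return [(m_min + i * width, m_min + (i + 1) * width) for i in range(n)]

def sideband_bins(bins, z_low=80.0, z_high=100.0):
    """Keep only bins that lie fully outside the Z-pole window."""
    return [(lo, hi) for (lo, hi) in bins if hi <= z_low or lo >= z_high]
```

With these defaults there are 35 mass bins over the full range, of which 25 remain in the sideband.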
A. Updating PDFs with $A_{FB}$ using the full mass range
First, we update the PDFs using the full mass range of $A_{FB}$. It is expected that the PDF-induced uncertainty on $A_{FB}(M_{\ell\ell})$ will be reduced after updating the original PDFs with the inclusion of the pseudo-data. Note that the pseudo-data and theory prediction are generated with the same CT14HERA2 PDFs. If the correlation between $\sin^2\theta_{\text{eff}}^{\ell}$ and the PDF updating is negligible, we expect no change in the central value of $A_{FB}$ as predicted by the PDFs after updating, compared to that given by the original theory prediction.
The predicted $A_{FB}$ distributions ($\sin^2\theta_{\text{eff}}^{\ell} = 0.2315$) and the associated PDF-induced uncertainties, before and after the PDF updating using the full mass range of the pseudo-data ($\sin^2\theta_{\text{eff}}^{\ell} = 0.2324$), are depicted in Figs. 2 and 3 for the CC and CF event samples, respectively. As shown in the bottom panels of the figures, the PDF-induced uncertainties on the predicted $A_{FB}$ are significantly reduced after the updating procedure. This finding is consistent with the conclusion of Ref. [5]. However, as shown in the middle panels of the figures, the central values of $A_{FB}$ differ before and after the updating, particularly in the Z pole region. The difference is more significant in the CF than in the CC events. When the $\sin^2\theta_{\text{eff}}^{\ell}$ of the pseudo-data has a value different from its value in the theory predictions, the existing PDF model (i.e., the CT14HERA2 PDFs, in this study) no longer describes the new data in a consistent way. As a result, the PDF updating procedure forcibly converts this bias, which originated from a different value of $\sin^2\theta_{\text{eff}}^{\ell}$, into an updated central PDF set. The averaged $A_{FB}$ values in the Z pole region in the pseudo-data and theory predictions, before and after PDF updating, are presented numerically in Tables I, II and III, for the CC, CF and CC + CF events. The CF events have higher sensitivity to $A_{FB}$. For example, as given in the third line of Table II, the PDF uncertainty can be decreased from 0.00118 to 0.00055, a reduction of more than 50%. Meanwhile, the bias on $A_{FB}$, originating from the PDF updating, can be as large as ∆ = −0.00108, as shown in the same line of the table. As we pointed out, there should be no difference in the central value of $A_{FB}$ before and after an unbiased updating, because the pseudo-data and theory templates are generated with the same PDF sets.
This bias, which is larger than the statistical uncertainty shown in the third column, indicates that much of the effect of the shift in $\sin^2\theta_{\text{eff}}^{\ell}$ has been absorbed into the updated PDFs.
To estimate the impact on the determination of $\sin^2\theta_{\text{eff}}^{\ell}$ in the Z-pole mass region, we express the average $A_{FB}$ approximately as a linear function of $\sin^2\theta_{\text{eff}}^{\ell}$ in this region, written as
$$A_{FB} = k \cdot \sin^2\theta_{\text{eff}}^{\ell} + b,$$
where the values of the parameters $k$ and $b$ are listed in Table IV, for the CC, CF and CC+CF event samples, respectively. One can roughly estimate the bias and the PDF-induced uncertainty on the determination of $\sin^2\theta_{\text{eff}}^{\ell}$, derived from the biased $A_{FB}$ measurement, using the following simplified relation:
$$\Delta \sin^2\theta_{\text{eff}}^{\ell} = \frac{\Delta A_{FB}}{k}.$$
From the above equation and Table III, we obtain the results listed in Table V.
It can be seen that the bias on $\sin^2\theta_{\text{eff}}^{\ell}$ determined from the biased $A_{FB}$ after the PDF updating is much larger than the PDF-induced uncertainty itself, especially in the CF event sample, which is more sensitive to $\sin^2\theta_{\text{eff}}^{\ell}$ than the CC event sample. Of course, this bias depends on the difference between the $\sin^2\theta_{\text{eff}}^{\ell}$ values in the pseudo-data and the original theory prediction. In this work, that difference is intentionally set to an exaggerated value of 0.0009 for illustration, which is 3 times the uncertainty obtained from the best hadron collider measurements. A smaller difference between the $\sin^2\theta_{\text{eff}}^{\ell}$ value of the pseudo-data and the world average value would lead to a smaller bias in the $A_{FB}$ measurement after the PDF updating procedure. Nevertheless, this part of our study clearly demonstrates that using the full mass spectrum of the $A_{FB}$ data to update the existing PDFs introduces a bias in the determination of $\sin^2\theta_{\text{eff}}^{\ell}$ in the Z-pole mass region.
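As a rough numerical illustration of this linear propagation, the sketch below converts a shift or uncertainty in the average $A_{FB}$ into the corresponding quantity for $\sin^2\theta_{\text{eff}}^{\ell}$, assuming $A_{FB} = k \cdot \sin^2\theta_{\text{eff}}^{\ell} + b$. The slope used in the example is a purely hypothetical placeholder, since the fitted values of Table IV are not reproduced here, so the printed numbers are not results of this analysis. The function names are likewise illustrative.

```python
def sin2theta_shift(delta_afb, k):
    """Shift in sin^2(theta_eff) implied by a shift in the average A_FB."""
    if k == 0:
        raise ValueError("slope k must be non-zero")
    return delta_afb / k

def sin2theta_pdf_uncertainty(afb_pdf_uncertainty, k):
    """PDF-induced uncertainty on sin^2(theta_eff) from that on A_FB."""
    if k == 0:
        raise ValueError("slope k must be non-zero")
    return abs(afb_pdf_uncertainty / k)

# Hypothetical slope, chosen only to make the example concrete.
K_ILLUSTRATIVE = -2.0

# A_FB shift and PDF uncertainty quoted in the text for the CF sample.
bias = sin2theta_shift(-0.00108, K_ILLUSTRATIVE)
uncertainty = sin2theta_pdf_uncertainty(0.00055, K_ILLUSTRATIVE)
print(f"bias = {bias:.5f}, PDF uncertainty = {uncertainty:.6f}")
```

The offset $b$ cancels in both expressions, which is why only the slope enters.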
With more data collected at the future high-luminosity LHC, the weak mixing angle can be determined more precisely, and the $\sin^2\theta_{\text{eff}}^{\ell}$ measurements with different lepton final states of DY processes at the ATLAS, CMS and LHCb experiments should be considered as separate measurements. Occasionally, one might expect some individual $\sin^2\theta_{\text{eff}}^{\ell}$ measurements to exhibit significant deviations from the nominal world average value. In such circumstances, the potential bias on the $\sin^2\theta_{\text{eff}}^{\ell}$ extraction, induced by updating PDFs with the $A_{FB}$ measurement spanning the full mass range from 60 GeV to 130 GeV, should be seriously considered.
The bias incurred by updating the PDFs using the full mass spectrum can also be seen by looking directly at the PDFs of the quarks and gluons themselves. Fig. 4 depicts the comparison of the u and d quark PDFs before and after the updating. With an unbiased updating procedure, the central values of the two PDF sets (before and after the PDF updating) should be unchanged, while the PDF uncertainties are expected to be reduced after the inclusion of the new pseudo-data. This feature, however, is not confirmed in Fig. 4. Again, this shows how the biased updated PDFs have been changed in order to compensate for the effects of the shifted $\sin^2\theta_{\text{eff}}^{\ell}$ in the pseudo-data.
B. Updating PDFs using sideband $A_{FB}$ data only
As shown in Fig. 1, $A_{FB}$ is more sensitive to $\sin^2\theta_{\text{eff}}^{\ell}$ when $M_{\ell\ell}$ is around the Z-pole mass, while the PDF-induced uncertainty becomes more significant when $M_{\ell\ell}$ is outside the Z pole mass window. This is because, in the Z pole region, the asymmetry is proportional to both the vector and axial-vector couplings of the Z boson to the fermions and is numerically close to 0. Since only the vector coupling of the Z boson depends on the weak mixing angle, the information on $\sin^2\theta_{\text{eff}}^{\ell}$ predominantly comes from $A_{FB}$ in the vicinity of the Z-boson pole. Away from the Z-boson pole, the asymmetry results from the interference of the axial-vector Z coupling and the vector photon coupling, and depends upon the PDFs. On the other hand, the sensitivity of constraining the PDFs via a measurement of $A_{FB}$ depends on the value of the asymmetry (see Appendix A). Consequently, the sensitivity of $A_{FB}$ to the PDFs is suppressed in the Z pole region, where the value of the asymmetry is close to zero, and is enhanced outside the Z pole mass window, where the magnitude of $A_{FB}$ is larger.
This observation suggests that we could separate the $A_{FB}$ distribution into a Z pole region and a sideband region, and use them for the $\sin^2\theta_{\text{eff}}^{\ell}$ determination and the PDF updating procedure, respectively. This procedure could reduce the correlation while keeping most of the sensitivity.
To confirm this, we generate and use a new pseudo-data sample with an even larger shift in $\sin^2\theta_{\text{eff}}^{\ell}$ (set to 0.2345) in this section, i.e., the difference between the $\sin^2\theta_{\text{eff}}^{\ell}$ values in the pseudo-data and the original theory templates is 10 times the best precision at hadron colliders. When this new pseudo-data sample was generated, Z pole events with $M_{\ell\ell}$ from 80 to 100 GeV were explicitly excluded. Following the same analysis procedures as discussed in the previous section, we obtain the numerical results listed in Tables VI, VII and VIII.
[Table caption, beginning not recovered: ... spectrum from the CF events of pseudo-data ($\sin^2\theta_{\text{eff}}^{\ell} = 0.2345$). Statistical uncertainty corresponds to the data sample with an integrated luminosity of 130 fb$^{-1}$.]
[Table caption, beginning not recovered: ... spectra from both the CC and CF events of pseudo-data ($\sin^2\theta_{\text{eff}}^{\ell} = 0.2345$). Statistical uncertainty corresponds to the data sample with an integrated luminosity of 130 fb$^{-1}$.]
Since the inclusive production rate of the Z boson is dominated by the contribution from the Z-pole mass window, the constraint on the PDF uncertainty obtained from using only the sideband $A_{FB}$ data sample is not as statistically powerful as that from using the full mass range $A_{FB}$ data sample. For example, comparing the sideband result (in Table VIII) to the full mass range result (in Table III), we find that the PDF uncertainty only reduces to 0.00072 for sideband updating, compared to 0.00054 for full mass range updating, in the case of using the most sensitive CF event sample. On the other hand, the bias on the average $A_{FB}$ in the Z-pole mass window is much smaller in the sideband updating (with ∆ = −0.00047 and $\sin^2\theta_{\text{eff}}^{\ell} = 0.2345$) than in the full mass range updating (with ∆ = −0.00115 and $\sin^2\theta_{\text{eff}}^{\ell} = 0.2324$), as listed in the same tables. Furthermore, in contrast to the strong variation observed in Fig. 4, we find much less bias on the various parton flavor PDFs when using only the sideband $A_{FB}$ data to update the PDFs, as shown in Fig. 5. Using the numbers from Table VIII, the impact of updating PDFs with the sideband $A_{FB}$ data on the determination of $\sin^2\theta_{\text{eff}}^{\ell}$ is summarized in Table IX. In comparison to the result of using the full mass range $A_{FB}$ data in the updating, cf. Table V, the PDF-induced uncertainty from using only the sideband $A_{FB}$ data sample increases by about 20-30%, but the biases on $\sin^2\theta_{\text{eff}}^{\ell}$ diminish dramatically, despite using the much larger value of $\sin^2\theta_{\text{eff}}^{\ell} = 0.2345$ in the present case.
Since the bias introduced by using the full mass range $A_{FB}$ updating is apparently larger than the PDF-induced uncertainties, reducing the bias on $\sin^2\theta_{\text{eff}}^{\ell}$ by using the sideband $A_{FB}$ updating should have higher priority than keeping the statistical uncertainty 20-30% smaller. One should optimize the mass window for a specific measurement to achieve a better balance between bias and sensitivity.
Considering that the new data might not correspond to a value of $\sin^2\theta_{\text{eff}}^{\ell}$ as large as 0.2345, we could conclude that by the end of the LHC Run 2, the potential bias of using the sideband $A_{FB}$ data in the PDF updating should be small. However, this does not mean one can ignore it. As we have seen, by using the sideband $A_{FB}$ in the PDF updating one can reduce the effects of any potential bias, while not significantly enlarging the total uncertainty of the $\sin^2\theta_{\text{eff}}^{\ell}$ determination. Furthermore, we still strongly suggest keeping the PDF updating as a preliminary method to improve the $\sin^2\theta_{\text{eff}}^{\ell}$ measurement. A final determination of $\sin^2\theta_{\text{eff}}^{\ell}$ and its uncertainty estimation can only be reliably provided by a full global analysis, which includes new data sets and allows a thorough study on adding new degrees of freedom in the nonperturbative PDF parameters, etc. Experimental results should also be provided in a proper format, allowing theorists to replace the preliminary PDF updating method employed in the experimental measurement by a consistent global analysis.

IV. UPDATING PDFS USING LEPTON CHARGE ASYMMETRY $A_\pm(\eta_\ell)$ IN W PRODUCTION
In this section, we investigate the advantage of using the asymmetry in the rapidity distribution of the charged leptons from $W \to \ell\nu$ boson decays, produced at the LHC, to update the PDFs. In pp collisions, $W^+$ and $W^-$ have different cross sections, and accordingly an asymmetry can be defined as a function of the final-state charged lepton rapidity $\eta_\ell$:
$$A_\pm(\eta_\ell) = \frac{d\sigma_{W^+}/d\eta_\ell - d\sigma_{W^-}/d\eta_\ell}{d\sigma_{W^+}/d\eta_\ell + d\sigma_{W^-}/d\eta_\ell}.$$
This asymmetry is caused by the difference between the up- and down-type quark distributions and their anti-quark distributions in the proton, and thus provides complementary information to $A_{FB}$ in constraining the PDFs. Although using $A_\pm(\eta_\ell)$ as input is essential to many other PDF constraints, it has less impact on the $\sin^2\theta_{\text{eff}}^{\ell}$ measurements, compared to using $A_{FB}$ in the PDF updating. In general, $A_\pm(\eta_\ell)$ is an initial-state asymmetry, directly reflecting the difference between $W^+$ and $W^-$ production rates at the LHC, and has little dependence on the weak interaction decays.
To study the impact of $A_\pm(\eta_\ell)$ on reducing the PDF-induced uncertainty in the $A_{FB}$ measurement, we generate a set of W boson samples, in which the $\sin^2\theta_{\text{eff}}^{\ell}$ value is taken to be different from the original theory templates, as done in the previous DY case. To model the ATLAS acceptance, the charged leptons (electrons and muons) from the W boson decay are required to have $|\eta_\ell|$ < 2.5. Forward electrons are usually removed from the single W production measurement, due to difficulties in controlling the backgrounds in the high rapidity region. Both charged leptons and neutrinos are required to have $p_T$ > 25 GeV. A bin size of 0.1 in $|\eta_\ell|$ is used in the $A_\pm(\eta_\ell)$ distributions. The $A_\pm(\eta_\ell)$ distributions, together with the PDF-induced uncertainties, before and after the PDF updating, are shown in Fig. 6.
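A binned estimate of the lepton charge asymmetry with the binning described above can be sketched as follows. The sketch uses event counts of positively and negatively charged leptons per $|\eta_\ell|$ bin as stand-ins for the differential cross sections, which assumes equal luminosity and acceptance for both charges; the function name and input format are hypothetical.

```python
def charge_asymmetry(etas_plus, etas_minus, eta_max=2.5, width=0.1):
    """Binned A_pm(|eta|) = (N+ - N-) / (N+ + N-).

    etas_plus / etas_minus: lepton pseudorapidities from W+ / W- events
    that already passed the pT selection. Returns one value per bin,
    or None for bins with no events.
    """
    n_bins = int(round(eta_max / width))
    n_plus = [0] * n_bins
    n_minus = [0] * n_bins
    for counts, etas in ((n_plus, etas_plus), (n_minus, etas_minus)):
        for eta in etas:
            idx = int(abs(eta) / width)
            if idx < n_bins:  # drop leptons outside |eta| < eta_max
                counts[idx] += 1
    asymmetry = []
    for p, m in zip(n_plus, n_minus):
        asymmetry.append((p - m) / (p + m) if p + m > 0 else None)
    return asymmetry
```

With the default arguments, the function returns 25 values, one for each 0.1-wide bin in $|\eta_\ell|$ up to 2.5.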
The values of the average $A_{FB}$ and their PDF-induced uncertainty after updating PDFs with the simulated lepton charge asymmetry pseudo-data are listed in Table X. The PDF-induced uncertainty on the average $A_{FB}$ is reduced by 17% for CC events and 13% for CF events, after updating PDFs with the $A_\pm(\eta_\ell)$ data. The central prediction for $A_{FB}$ does not change after updating PDFs with the $A_\pm(\eta_\ell)$ data, since there is no direct correlation between the value of $\sin^2\theta_{\text{eff}}^{\ell}$ and $A_\pm(\eta_\ell)$. Fig. 7 depicts the comparison of the $d$, $u_v - d_v$, $d/u$ and $(u\bar{d} - d\bar{u})/(u\bar{d} + d\bar{u})$ PDFs, between the nominal CT14HERA2 and the updated PDFs with the inclusion of the $A_\pm(\eta_\ell)$ data. It shows that the potential bias on the central values of the PDFs is negligible, while a noticeable reduction of the PDF uncertainty can be clearly observed in some relevant $x$ ranges, depending on the parton flavor.
TABLE X. Average $A_{FB}$ at the Z pole region before and after the PDF updating. The PDF updating is done using $A_\pm(\eta_\ell)$ from W boson production. Statistical uncertainty corresponds to the data sample with an integrated luminosity of 130 fb$^{-1}$.

V. UPDATING PDFS USING BOTH SIDEBAND $A_{FB}$ AND LEPTON CHARGE ASYMMETRY $A_\pm(\eta_\ell)$
Since the Drell-Yan $A_{FB}$ and the lepton charge asymmetry $A_\pm(\eta_\ell)$ provide complementary information, it is expected that the PDF-induced uncertainty in the determination of $\sin^2\theta_{\text{eff}}^{\ell}$ can be further reduced if we use both the sideband $A_{FB}$ and the $A_\pm(\eta_\ell)$ data together to update the PDFs. Applying the same analysis to those two pseudo-data sets, as detailed in the previous sections, we obtain the results listed in Table XI, which should be directly compared to Table IX for using the sideband $A_{FB}$ data and Table X for using the $A_\pm(\eta_\ell)$ data alone, respectively. We find that using both data sets to update the PDFs could further reduce the PDF-induced uncertainty on the determination of $\sin^2\theta_{\text{eff}}^{\ell}$, which is determined using the $A_{FB}$ data in the Z-pole mass window, by about 28% as compared to that using only the sideband $A_{FB}$ data.

Update PDF using sideband $A_{FB}$ and $A_\pm(\eta_\ell)$    Potential bias on $\sin^2\theta_{\text{eff}}^{\ell}$    PDF uncertainty on $\sin^2\theta_{\text{eff}}^{\ell}$
CC events                                                    0.00009                                                 0.00032
CF events                                                    0.00018                                                 0.00021

TABLE XI: The bias and the PDF-induced uncertainty on $\sin^2\theta_W$, after updating the PDFs with the sideband range of the $A_{FB}$ pseudo-data (with $\sin^2\theta_W = 0.2324$) and the $A_\pm(\eta_\ell)$ pseudo-data samples, for the CC and CF event samples, respectively. The PDF uncertainties are given at the 68% C.L.

VI. AN APPLICATION OF EPUMP-OPTIMIZATION
To further discuss the improvement in the PDF-induced uncertainties, in this section we apply the ePump-optimization method of the ePump code to the combined analysis of the $A_{FB}$ and $A_\pm(\eta_\ell)$ data, in order (1) to demonstrate their complementary roles in reducing the PDF uncertainty in the PDF-updating procedure, and (2) to investigate the optimal choice of bin size for studying the PDF-induced uncertainty in experimental observables related to these two data sets.

A. Complementary roles in reducing the PDF uncertainty
The ePump-optimization (or PDF-rediagonalization) method is based on ideas similar to those used in the data set diagonalization method developed by Pumplin [19]. For a set of new data points, the application constructs an equivalent set of eigenvectors, which are orthogonal to each other in the PDF fitting parameter space, by re-diagonalizing the original Hessian error PDFs with respect to the given data. The total uncertainty calculated with the new eigenvectors is identical to that calculated with the original error PDFs, within the linear approximation assumed by the Hessian analysis. In addition, the new error PDF pairs are ordered by the magnitudes of their re-calculated eigenvalues, the sum of which equals the total number of the given data points, as noted in Ref. [12]. In other words, the new eigenvectors can be regarded as a projection of the original error PDFs onto the given data set, optimized and re-ordered so that it is easy to choose a reduced set that covers the PDF uncertainty for the input data set to any desired accuracy [12].
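The re-diagonalization step can be illustrated with a minimal NumPy sketch. It is not the ePump implementation: the function name `rediagonalize`, the input array `shifts` (the symmetric Hessian shift of each data point along each original eigenvector direction) and the per-point normalization are assumptions, the last one chosen so that the eigenvalues sum to the number of data points as stated in the text.

```python
import numpy as np


def rediagonalize(shifts):
    """Rotate Hessian eigenvector directions so they are ordered by their
    contribution to the PDF variance of a given data set.

    shifts[a, i] is the (assumed symmetric) shift of data point a when the
    PDFs move along original eigenvector direction i. Every data point is
    assumed to have a non-zero total PDF uncertainty.
    """
    shifts = np.asarray(shifts, dtype=float)

    # Total PDF uncertainty of each data point in the linear approximation.
    total = np.sqrt(np.sum(shifts**2, axis=1))

    # Assumed normalization: each data point contributes unit variance,
    # so the trace of the matrix below equals the number of data points.
    normalized = shifts / total[:, None]
    matrix = normalized.T @ normalized

    # Symmetric eigen-decomposition; eigh returns ascending eigenvalues,
    # so reverse the order to put the most important direction first.
    eigenvalues, rotation = np.linalg.eigh(matrix)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    rotation = rotation[:, order]

    # Shifts of the data points along the new, re-ordered directions.
    new_shifts = shifts @ rotation
    return eigenvalues, rotation, new_shifts
```

Because the rotation is orthogonal, the quadrature sum of `new_shifts` over all directions reproduces the original uncertainty of each data point, while keeping only the first few columns gives a reduced error set.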
As an example, after applying the ePump-optimization method to the CT14HERA2 PDFs for the sideband $A_{FB}$ and $A_\pm(\eta_\ell)$ data sets, which contain 50 data points in total (i.e., 25 bins in each case), we find that the three leading new eigenvector pairs have eigenvalues of 25.2, 18.1 and 5.5, respectively, while the eigenvalues of the remaining ones decrease rapidly. The combination of these three leading optimized error PDF pairs accounts for 97.6% of the total PDF variance of the 50 given data points. The ePump-optimization therefore allows us to use only these three leading new eigenvectors, instead of the full 56 error sets of CT14HERA2, to study the PDF-induced uncertainty on $A_{FB}$ and $A_\pm(\eta_\ell)$ or on any other observable that is directly related to them.
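The quoted 97.6% can be cross-checked from the eigenvalues alone, under the assumption that each data point contributes unit normalized variance, which is consistent with the eigenvalues summing to the number of data points. The helper name `variance_fraction` is introduced here only for illustration.

```python
# Eigenvalues of the three leading optimized eigenvector pairs quoted in the text.
leading_eigenvalues = [25.2, 18.1, 5.5]
# 25 sideband A_FB bins plus 25 A_pm(eta_l) bins.
n_data_points = 50


def variance_fraction(eigenvalues, n_points):
    """Fraction of the summed normalized variance carried by the eigenvalues,
    assuming the full set of eigenvalues sums to n_points."""
    return sum(eigenvalues) / n_points


fraction = variance_fraction(leading_eigenvalues, n_data_points)
print(f"{fraction:.1%}")  # 97.6%
```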
The relative contributions of the three leading optimized eigenvectors to the PDF uncertainties of the sideband $A_{FB}$ and of $A_\pm(\eta_\ell)$, normalized in each bin for illustration, are shown in Fig. 8. One can see directly that the first eigenvector (labeled EV01) gives by far the largest contribution to the PDF uncertainties of the sideband $A_{FB}$, but only a very small fraction of the uncertainties of $A_\pm(\eta_\ell)$, particularly for $|\eta_\ell| > 1$, while the second and third eigenvectors (labeled EV02 and EV03) contribute a large or appreciable amount of the uncertainties on $A_\pm(\eta_\ell)$, but a much smaller fraction of those on $A_{FB}$. This suggests that when optimizing PDFs using both the $A_{FB}$ and $A_\pm(\eta_\ell)$ samples, the two data sets play complementary roles in reducing the PDF uncertainties, i.e., the re-diagonalization of the first pair of eigenvectors is dominated by the information from $A_{FB}$, and the second pair carries more information from $A_\pm(\eta_\ell)$. The sensitivities of the top two pairs of eigenvector PDFs to the different flavors and $x$ ranges, probed by the sideband $A_{FB}$ and the lepton charge asymmetry $A_\pm(\eta_\ell)$ together, are depicted in Fig. 9 and Fig. 10, respectively. It can be verified that these two leading pairs of error PDFs, optimized using both data sets, resemble the respective first pair of eigenvector PDFs obtained by applying the ePump-optimization procedure to the sideband $A_{FB}$ and to $A_\pm(\eta_\ell)$ alone.

This behavior can be understood from the following physical argument. $A_{FB}$ depends on the PDFs predominantly because the dilution effect can lead to an incorrect assignment of the $z$ direction in the Collins-Soper frame. At the LHC, the leading-order dilution probability, i.e. the probability that forward and backward are misjudged, depends only on the relative size of the PDF ratios $u/\bar{u}$ and $d/\bar{d}$, meaning that $A_{FB}$ is more sensitive to the quark-antiquark comparison. For $A_\pm(\eta_\ell)$, the asymmetry comes from the difference between the $u\bar{d}$ and $d\bar{u}$ cross sections, meaning that it is more sensitive to the flavor difference. As shown in Fig. 9, the first eigenvector pair, which gives the largest PDF contribution to the $A_{FB}$ uncertainty, dominates the $u/\bar{u}$ uncertainty in the $x$ region of the $Z$ boson process. The $d/\bar{d}$ uncertainty is less dominated by the first eigenvector, because the neutral-current vector couplings of the $Z$ boson to quarks, which govern the magnitude of $A_{FB}$ at parton level, are proportional to the electric charges of the different quark types, so that the sensitivity of $A_{FB}$ to the $d/\bar{d}$ parton distribution is suppressed. Since the observed $A_{FB}$ is a combination of $u\bar{u}$ and $d\bar{d}$ processes, it can provide some information on the difference between the $u$ and $d$ quark PDFs, but it is not as sensitive to this difference as it is to the $u/\bar{u}$ and $d/\bar{d}$ ratios. In Fig. 10, the second eigenvector pair, which gives the largest PDF contribution to the $A_\pm(\eta_\ell)$ uncertainty, dominates the $d/u$ and $\bar{d}/\bar{u}$ uncertainties in the $x$ region of the single $W$ boson process. However, it has almost no sensitivity to the $\bar{u}/u_v$ uncertainty over a very large $x$ range.
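The statement that the leading-order dilution depends only on quark-to-antiquark ratios can be made explicit with a short sketch for a single quark flavor. The sketch assumes that the dilepton boost points along the proton carrying the larger momentum fraction $x_1 > x_2$ and that the quark is guessed to come from that proton. The function names and the numerical inputs are hypothetical toy values, not PDF values from this study.

```python
def dilution_probability(q_x1, qbar_x1, q_x2, qbar_x2):
    """Leading-order probability that the quark direction is misassigned.

    q_x1, qbar_x1: quark and antiquark densities at x1 (boost-side proton).
    q_x2, qbar_x2: quark and antiquark densities at x2 (other proton).
    """
    right = q_x1 * qbar_x2  # quark really comes from the boost-side proton
    wrong = qbar_x1 * q_x2  # antiquark comes from the boost-side proton
    return wrong / (right + wrong)


def dilution_from_ratios(ratio_x1, ratio_x2):
    """Same probability expressed through r(x) = q(x) / qbar(x) only."""
    return 1.0 / (1.0 + ratio_x1 / ratio_x2)


def diluted_afb(parton_level_afb, w):
    """Asymmetry after a fraction w of the events has its sign flipped."""
    return (1.0 - 2.0 * w) * parton_level_afb


# Toy densities, chosen only to make the arithmetic easy to follow.
w_densities = dilution_probability(0.6, 0.05, 0.4, 0.3)
w_ratios = dilution_from_ratios(0.6 / 0.05, 0.4 / 0.3)
```

Both functions return the same probability for the same inputs, which shows that only the ratio $q/\bar{q}$ at the two momentum fractions enters, and not the absolute size of the densities.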

B. Optimal choice of bin size
In previous sections, a bin size of 2 GeV in mass is used for the $A_{FB}$ distribution, and a bin size of 0.1 in $\eta_\ell$ is used for $A_\pm(\eta_\ell)$. In principle, a large bin size smears some fine structures of the $A_{FB}$ and $A_\pm(\eta_\ell)$ distributions and makes those observables less sensitive to variations of the PDFs. On the other hand, due to the difficulty of the experimental unfolding procedure that removes detector effects, such as bin-to-bin migration and the determination of efficiency and acceptance, it may not always be practical to measure the $A_{FB}$ and $A_\pm(\eta_\ell)$ distributions in such a fine bin configuration. Hence, it is desirable to determine the maximum bin size allowed for a given observable without losing sensitivity. In this section, we discuss how to apply the ePump-optimization procedure to obtain the optimal choice of bin size for the $A_{FB}$ and $A_\pm(\eta_\ell)$ distributions. From Sec. VI A, we learned that the PDF-induced error on $A_{FB}$ and $A_\pm(\eta_\ell)$ can be represented by the leading eigenvectors after ePump-optimization. In Fig. 11, we show the $A_{FB}$ distributions predicted by the first two eigenvector PDF sets after PDF-rediagonalization. For each eigenvector, the positively and negatively shifted PDF error sets are compared. Similarly, for the $A_\pm(\eta_\ell)$ distribution, the comparisons are shown in Fig. 12.
When the PDFs are varied according to the first pair of eigenvector sets, the most significant change in the shape of the $A_{FB}$ distribution is a shift in opposite directions in the high-mass and low-mass regions around the $Z$ pole, cf. the left-hand plot of Fig. 11. Moreover, the shape of $\Delta$ is almost flat both below and above the $Z$-pole mass window, where $\Delta$ is the difference between the two values of $A_{FB}$ predicted by the positively and negatively shifted error sets. As a result, using a larger bin size in mass will not lose much information on how the PDFs affect the $A_{FB}$ distribution. On the other hand, as shown in the right-hand plot of Fig. 12, when the PDFs are varied according to the second pair of eigenvector sets, $\Delta$ for the $A_\pm(\eta_\ell)$ distribution is almost linear in shape. Hence, as long as the bin size in lepton rapidity still resolves the linear shape, the sensitivity of $A_\pm(\eta_\ell)$ to PDF variations should not be dramatically reduced.
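How the bin width enters a measured $A_{FB}$ distribution can be sketched as follows. The sketch assumes the standard counting definition $A_{FB} = (N_F - N_B)/(N_F + N_B)$ in each mass bin, with forward and backward events distinguished by the sign of the Collins-Soper $\cos\theta$. The function name `afb_per_bin` and the event format are hypothetical, and no detector effects or unfolding are modeled.

```python
def afb_per_bin(events, low, high, width):
    """Compute A_FB in equal-width mass bins.

    events: iterable of (mass_in_gev, cos_theta_cs) pairs.
    Returns one value per bin, or None for a bin without events.
    """
    n_bins = round((high - low) / width)
    forward = [0] * n_bins
    backward = [0] * n_bins

    for mass, cos_theta in events:
        if not (low <= mass < high):
            continue  # outside the chosen mass window
        index = min(int((mass - low) / width), n_bins - 1)
        if cos_theta > 0:
            forward[index] += 1
        elif cos_theta < 0:
            backward[index] += 1

    result = []
    for n_f, n_b in zip(forward, backward):
        total = n_f + n_b
        result.append((n_f - n_b) / total if total else None)
    return result
```

Coarser bins must be formed by merging the forward and backward counts, as done here, rather than by averaging the fine-bin asymmetries. For a low-mass sideband of 60 to 80 GeV, a width of 2 GeV gives 10 bins, whereas a width of 5 GeV gives 4.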
To quantify the sensitivity loss from using a larger bin size, we repeat the analysis with a bin size of 5 GeV in mass for the $A_{FB}$ distribution and a bin size of 0.25 in lepton $\eta$ for the $A_\pm(\eta_\ell)$ distribution, for the same data samples as used in Section V. We find that the numerical calculations using these wide bins give exactly the same results as those presented in the previous tables, which implies that the reduction of the PDF uncertainty is not compromised by the larger bin size. This leads to a very useful conclusion: for the purpose of the $\sin^2\theta_{\text{eff}}^{\ell}$ measurement, both the $A_{FB}$ and $A_\pm(\eta_\ell)$ distributions can be measured with a large bin size to reduce systematic uncertainties without losing much sensitivity in constraining the PDFs. This conclusion should hold for both a quick PDF updating and a full PDF global fit. It is important because, as more data accumulate at the LHC, systematic uncertainties will soon be larger than the statistical uncertainty for many precision measurements, so that reducing systematic uncertainties should have higher priority.

FIG. 11: $A_{FB}$ distribution predicted by the first and second eigenvector PDF sets, after applying the ePump-optimization method to optimize the CT14HERA2 PDFs for the $A_{FB}$ data, as described in the text. Predictions from the positively and negatively shifted error sets of each eigenvector PDF set are compared, and $\Delta A_{FB}$ is their difference.

VII. SUMMARY
We have presented a study on how to correctly reduce the PDF-induced uncertainty in the determination of the effective weak mixing angle $\sin^2\theta_{\text{eff}}^{\ell}$ obtained from the measurement of the Drell-Yan forward-backward asymmetry $A_{FB}$ at the LHC. According to previous studies, the PDF-induced uncertainty can be reduced by a PDF updating procedure using $A_{FB}$. However, when $A_{FB}$ is used for both the PDF updating and the $\sin^2\theta_{\text{eff}}^{\ell}$ extraction, the correlation between these two tasks causes a bias on both the updated PDFs and the extracted value of $\sin^2\theta_{\text{eff}}^{\ell}$. Considering the deviation between the previous precise measurements of $\sin^2\theta_{\text{eff}}^{\ell}$, such a bias could be at the same level as the PDF-induced uncertainty on $\sin^2\theta_{\text{eff}}^{\ell}$. In this paper we have shown how this bias can be suppressed. $A_{FB}$ is more sensitive to $\sin^2\theta_{\text{eff}}^{\ell}$ around the $Z$ pole, while the PDFs affect $A_{FB}$ more significantly in the sideband regions, e.g. $60 < M_{ll} < 80$ GeV and $100 < M_{ll} < 130$ GeV. Accordingly, we propose to use the sideband $A_{FB}$ to reduce the correlation between the $\sin^2\theta_{\text{eff}}^{\ell}$ extraction and the PDF updating, so that the bias on the $\sin^2\theta_{\text{eff}}^{\ell}$ determination can be suppressed without a significant loss of sensitivity in the PDF updating.
We have applied the ePump program, based on the Hessian updating method, to update the CT14HERA2 PDFs by including the full-mass-range $A_{FB}$ pseudo-data as new input to a global PDF fit. With this updated PDF set, we analyzed the extraction of $\sin^2\theta_{\text{eff}}^{\ell}$ in the $Z$-pole mass window and found a sizable bias in its value with respect to the input value in the pseudo-data. Furthermore, the central values of the updated $d$ and $u$ quark PDFs obtained from this analysis differ from those of the original CT14HERA2 PDFs. This is caused by the difference between the $\sin^2\theta_{\text{eff}}^{\ell}$ values assumed in the pseudo-data and in the theory templates. To reduce this type of correlation, we proposed to use only the sideband $A_{FB}$ to update the existing PDFs. As expected, using only the sideband $A_{FB}$ data to update the PDFs reduces the bias on the extracted $\sin^2\theta_{\text{eff}}^{\ell}$ value as well as on the central values of the updated PDFs. We have also shown that the lepton charge asymmetry from $W$ boson decay, $A_\pm(\eta_\ell)$, can be used to further reduce the PDF uncertainty. It plays a complementary role to the sideband $A_{FB}$ data in reducing the PDF-induced uncertainty, with negligible bias on the determination of the weak mixing angle.
A study of the effect of choosing different bin sizes for the $A_{FB}$ and $A_\pm(\eta_\ell)$ distributions was also performed. It showed that using a somewhat larger bin size does not sacrifice much of the sensitivity of these two observables in reducing the PDF uncertainty in the $\sin^2\theta_{\text{eff}}^{\ell}$ measurement. When more data are accumulated at the LHC, the systematic uncertainties in the $A_{FB}$ and $A_\pm(\eta_\ell)$ measurements will begin to dominate. In that case, there is an advantage in choosing a larger bin size in order to reduce the systematic uncertainties of the experimental unfolding procedures. In this study, using a bin size of 5 GeV in mass for the $A_{FB}$ distribution and a bin size of 0.25 in lepton $\eta$ for the $A_\pm(\eta_\ell)$ distribution did not cause a noticeable reduction in the sensitivity of these two data sets to the measurement of $\sin^2\theta_{\text{eff}}^{\ell}$.

In conclusion, we have investigated the correlation and potential bias in reducing the PDF-induced uncertainty in the determination of $\sin^2\theta_{\text{eff}}^{\ell}$ from the forward-backward asymmetry $A_{FB}$ of the Drell-Yan processes at the LHC. Based on quantitative computations with the Hessian-based ePump PDF updating program, it can be concluded that when $Z$-pole region events are excluded from the PDF updating, the potential bias on the $\sin^2\theta_{\text{eff}}^{\ell}$ extraction would not significantly enlarge the estimated total uncertainty, including the statistical and PDF-induced uncertainties, at the LHC Run 2. However, the bias is not negligible and thus still needs careful evaluation in future precise $\sin^2\theta_{\text{eff}}^{\ell}$ measurements at the high-luminosity LHC. Moreover, although ePump is useful for quickly estimating the impact of a new data set on the PDFs, we suggest using the PDF updating method only as a preliminary way to reduce the PDF-induced uncertainty in the $\sin^2\theta_{\text{eff}}^{\ell}$ measurements. A full PDF global fitting analysis is necessary for a complete determination of $\sin^2\theta_{\text{eff}}^{\ell}$ with PDF correlations, in which new degrees of freedom in the non-perturbative parametrization of the PDFs can be explored. Furthermore, all experimental results, i.e., the $A_{FB}$ and $A_\pm(\eta_\ell)$ studied in this article, should be provided in a format such that theorists can replace the preliminary PDF updating method employed in the experimental analysis by a consistent global analysis.