A CT18 global PDF fit at the leading order in QCD

In this paper, we present the CT18 PDFs fitted at leading order (LO) in QCD perturbation theory. The CT18 LO PDFs are obtained within the general CT18 framework, with two additional treatments imposed to improve the quality of the fit. We take the $W$-boson charge asymmetry and inclusive single-top production at the LHC as examples to illustrate the implications of the CT18 LO PDFs.


Introduction
Parton distribution functions (PDFs) describe the structure of hadrons as composed of (anti)quarks and gluons. PDFs are needed to make predictions for hard-scattering processes in high-energy collisions. With measurements at the Large Hadron Collider (LHC) becoming unprecedentedly precise, PDFs must be known to a high level of accuracy and precision. Such precise PDF parametrizations are provided by several groups [1][2][3], taking advantage of the availability of predictions at next-to-next-to-leading order (NNLO) in the QCD coupling $\alpha_s$ for a large number of collider processes. Meanwhile, the predictive power of leading-order (LO) QCD theory is no longer sufficient for today's precise measurements. However, the need for LO PDFs still exists. Commonly used event generators, such as PYTHIA, still rely on simulations of parton showers using LO splitting kernels [4], though progress has been made in the literature toward implementing parton showers at next-to-leading order (NLO) [5].
The latest LO PDFs in the high-precision LHC era are MSHT20 LO [2] and NNPDF4.0 LO [3]. To improve the description of Drell-Yan (DY) processes, a K-factor of $1 + \alpha_s C_F \pi/2$ has been adopted in the LO PDF fits of the MSHT family [2,6,7]. The fit quality of MSHT20 LO is $\chi^2/N_{pt} = 2.58$, which is worse than that of their previous LO results and of the MSHT20 PDFs beyond LO. NNPDF4.0 LO is not able to fit the experimental data well either, with $\chi^2/N_{pt} = 3.35$. Such poor fit qualities for these LO results come from the inclusion of high-precision LHC

Description of the CT18 LO global fit
In this section, we first introduce the experimental data sets used as input for the CT18 LO global fit. We then impose two additional treatments to improve the fit results, which are otherwise limited by LO perturbation theory.

CT18 data sets
The CT18 QCD global analysis [1] is obtained by fitting a wide range of high-precision LHC data sets, the combined HERA I+II DIS data sets, and the data sets already included in the CT14 global QCD analysis [9], 40 data sets in total. The global fits are performed with NLO and NNLO QCD perturbation theory. Both the NLO and NNLO fits describe this large data set well, as measured by $\chi^2/N_{pt}$ in Tables 1 and 2. In Section 2.2, we will show that a subset of the CT18 data set, consisting in particular of high-precision LHC data sets from the Run-II era, is difficult to fit with LO QCD theory.
Among the high-precision LHC data sets included in the CT18 analyses, six data sets correspond to $W$ and $Z$ vector boson production. The ATLAS measurements are the $\sqrt{s} = 7$ TeV $W$ and $Z$ combined cross-section measurement with 4.6 fb$^{-1}$ of integrated luminosity (ID=248) [11], and the $\sqrt{s} = 8$ TeV distribution of the lepton-pair transverse momentum $p_T$ in $Z/\gamma^*$ production with 20.3 fb$^{-1}$ of integrated luminosity (ID=253) [12]. The CMS measurement is the $\sqrt{s} = 8$ TeV muon charge asymmetry $A_{ch}$ for inclusive $W^\pm$ production with 18.8 fb$^{-1}$ of integrated luminosity (ID=249) [13]. The LHCb measurements included in CT18 are the $\sqrt{s} = 7$ TeV $W$ and $Z$ forward rapidity cross-section distribution measurement with 1.0 fb$^{-1}$ of integrated luminosity (ID=245) [14], the $\sqrt{s} = 8$ TeV $Z \to e^+e^-$ forward rapidity cross-section distribution measurement with 2.0 fb$^{-1}$ of integrated luminosity (ID=246) [15], and the $\sqrt{s} = 8$ TeV $W$ and $Z$ production cross-section distribution measurement with 2.0 fb$^{-1}$ of integrated luminosity (ID=250) [16]. The ATLAS $\sqrt{s} = 7$ TeV $W$ and $Z$ combined cross-section data set (ID=248) [11] is not included in the nominal CT18 fit, since it is observed to be in tension with other data sets (Sec. II.C of [1]). Alternative PDF sets, CT18A and CT18Z, have been generated with the inclusion of the ATLAS $\sqrt{s} = 7$ TeV $W$ and $Z$ data set. In the analysis of CT18 LO, we start from the CT18 data sets, without the mentioned ATLAS $\sqrt{s} = 7$ TeV $W$ and $Z$ data set. Then, as will be shown in the next section, we exclude five data sets which cannot be correctly described at LO.

Special treatments adopted in the CT18 LO fit
In comparison to PDF fits at NLO and NNLO, a naive fit at LO tends to be problematic. Theory predictions at LO are less complicated than at higher orders, but they miss quantum corrections that are especially vital for describing the most precise experimental data. For example, many NLO predictions of LHC cross-sections tend to be larger in magnitude than the LO predictions; see Figure 1 of Ref. [8] for a comparison between LO and NLO predictions of SM boson rapidity distributions at the LHC. On the other hand, the shapes and magnitudes of PDFs are restricted by the momentum sum rule and the flavour number sum rules. Consequently, PDFs determined at LO are known to have incorrect behaviour over a wide range of x. Predictions of spectra with LO PDFs and LO matrix elements are therefore unreliable also in terms of shape; see, for example, Figure 1 of Ref. [8] for a comparison between LO and NLO PDFs with LO matrix elements. To resolve the difficulty in determining LO PDFs and in generating LO predictions with LO PDFs, the conventional approach to PDF determination needs to be extended.
In the CT18 global analyses, several vector boson production data sets from LHC Run-II are included. The experimental uncertainty in these data sets is at the percent level and can strongly constrain PDFs at the electroweak scale. An LO fit with the inclusion of these data sets is difficult, and the best-fit PDFs fail to describe the fitted data sets well. To illustrate this point, a PDF set named CT18 LOpert has been generated, in which all theoretical predictions are computed at LO and no other adjustments are applied. In Tables 1 and 2, CT18 LOpert shows poor fit quality, both overall and for specific individual data sets. To improve the fit at LO, in our final result, CT18 LO, we have applied the following two special treatments:
• From the CT18 data set, we exclude ID 169 H1 $F_L$ [10], ID 145 H1 bottom reduced cross-section [17], ID 147 combined HERA charm production [18], ID 253 ATLAS 8 TeV $Z$ boson $p_T^{ll}$ distribution [12], and ID 268 ATLAS 7 TeV $W$ and $Z$ boson rapidity distributions plus $W$ charge asymmetry distribution [19], since these data sets cannot be well described by QCD theory at leading order. Furthermore, the ID 248 ATLAS 7 TeV precision $W$ and $Z$ data [11] were not included in the nominal CT18 NLO and NNLO analyses, due to their tension with other data sets in the global fit [1]. For comparison, the alternative CT18A and CT18Z NLO and NNLO PDF sets were generated [1] with the inclusion of this data set. After excluding all the above-mentioned data sets, the total number of remaining data points is 3547, which is 134 points fewer than in the full CT18 data set.
• For the rest of the Drell-Yan data sets, for the inclusive production of either $W^\pm$ or $Z$ bosons, we adopt a K-factor $K(Q)$, given in Eq. (1), to partially make up for the limitations of the LO matrix elements,
$K(Q) = 1 + \alpha_s(Q)\, C_F\, \pi/2$. (1)

Table 1: The qualities of fit for selected data sets whose $\chi^2/N_{pt}$ in CT18 LOpert is larger than 6.0. Values marked with an asterisk are the $\chi^2/N_{pt}$ of data sets that are not included in the corresponding fit.
To elaborate on the first treatment, we first note that the longitudinal structure function $F_L = F_2 - 2xF_1$ at leading order, $O(\alpha_s^0)$, respects the Callan-Gross relation [36], $F_2 = 2xF_1$, so that $F_L = 0$. Beyond LO, gluon emission gives rise to a non-vanishing $F_L$, and the Callan-Gross relation is violated accordingly. In the CT18 data set, the ID 169 H1 $F_L$ data [20] measure the longitudinal structure function in $e^\pm p$ collisions. For the reason just given, an LO PDF fit cannot describe these data. Hence we exclude this data set from the LO fit.

Table 2: As in Table 1, the qualities of fit $\chi^2/N_{pt}$ for DIS and jet data sets in the CT18 data set are compared. In total, the number of data points in CT18 LO is 3547, reduced by the first special treatment from 3681, the number of data points in CT18 LOpert, NLO, and NNLO.
The ID 145 H1 bottom reduced cross-section [17] and the ID 147 combined HERA charm production [18] data measure the inclusive bottom and charm production rates, respectively, in deep inelastic $ep$ scattering. These rates are sensitive to higher-order QCD corrections due to the non-vanishing mass of the heavy partons of the proton. Detailed discussions can be found in Refs. [37][38][39][40][41]. Therefore, we also exclude these two data sets. Moreover, an LO QCD calculation, at $O(\alpha_s)$, cannot describe well the ID 253 ATLAS 8 TeV $Z$ boson $p_T^{ll}$ distribution [12], because of the presence of the large logarithm $\ln(M_Z/p_T^{ll})$. Likewise, a QCD calculation at $O(\alpha_s)$ cannot describe well inclusive $W$ or $Z$ production when asymmetric kinematic cuts are applied to the two decay leptons of the vector boson, as in the ID 268 ATLAS 7 TeV $W$ charged-lepton rapidity asymmetry measurement [19]. Needless to say, at $O(\alpha_s^0)$ the $p_T$ distribution of the Drell-Yan pair produced at the LHC is a delta function peaked at zero, and the two decay leptons must have the same transverse momenta, as predicted by the parton model with longitudinal PDFs. Hence, these data sets are also excluded from our LO fits.
The importance of the second treatment will be discussed in Sec. 3.3. In the LO PDF studies of the MSHT family [2,6,7], this K-factor, Eq. (1), was also adopted to help describe vector boson production data. Apart from these two special treatments, no other treatment has been applied to CT18 LO, as we wish to keep a balance between a genuine LO PDF fit and a good fit quality. As for the strong coupling $\alpha_s$ used as input, the best fit in an LO PDF analysis is expected to prefer a larger value of $\alpha_s$. We fix the strong coupling at the Z-boson mass scale to $\alpha_s(m_Z) = 0.135$ for CT18 LO. The $\alpha_s$ dependence of the CT18 LO fit will be discussed in Sec. 3.4.
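The Callan-Gross argument behind the exclusion of the H1 $F_L$ data can be made explicit with the standard parton-model expressions. The sketch below assumes massless quarks with electric charges $e_q$ (a symbol not used in the text above) and keeps only the $O(\alpha_s^0)$ terms:

```latex
\begin{align}
F_2(x,Q^2)    &= x \sum_q e_q^2 \left[ q(x,Q^2) + \bar{q}(x,Q^2) \right], \\
2x\,F_1(x,Q^2) &= x \sum_q e_q^2 \left[ q(x,Q^2) + \bar{q}(x,Q^2) \right], \\
F_L(x,Q^2)    &= F_2(x,Q^2) - 2x\,F_1(x,Q^2) = 0 .
\end{align}
```

Because the LO prediction vanishes identically for any choice of PDFs, no adjustment of the PDF parameters can improve the description of measured non-zero $F_L$ values.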
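To give a feeling for the size of the Drell-Yan K-factor, the following minimal Python sketch evaluates $1 + \alpha_s C_F \pi/2$ for a few values of the coupling. It assumes the coupling is simply passed in as a number (for example its value at $m_Z$, which is appropriate only for $Q$ near the $Z$ mass); the function name `drell_yan_k_factor` is hypothetical and is not part of any fitting code.

```python
import math

C_F = 4.0 / 3.0  # quadratic Casimir of the SU(3) fundamental representation


def drell_yan_k_factor(alpha_s: float) -> float:
    """Return K = 1 + alpha_s * C_F * pi / 2 for a given value of alpha_s.

    Assumption: alpha_s is already evaluated at the scale Q of interest.
    """
    return 1.0 + alpha_s * C_F * math.pi / 2.0


if __name__ == "__main__":
    # Illustrative inputs: the PDG-like value and two larger LO-style values.
    for a in (0.118, 0.130, 0.135):
        print(f"alpha_s = {a:.3f} -> K = {drell_yan_k_factor(a):.4f}")
```

With $\alpha_s = 0.135$ the factor is about 1.28, i.e. the LO Drell-Yan prediction is rescaled upward by roughly 28% near the $Z$ pole.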

Results
In this section, we present the results of the CT18 LO PDF fit, which is obtained within the CT18 framework with the two additional treatments defined in Sec. 2.2. Along with the fit quality, the PDF shapes, and various PDF moments, we also discuss the impact of Drell-Yan data from LHC precision measurements and the $\alpha_s$ dependence of the fit.

Quality of the fit
The goodness-of-fit values, $\chi^2/N_{pt}$, for selected data sets are summarised in Tables 1 and 2. The overall $\chi^2/N_{pt}$ of CT18 LO is 1.60, significantly improved from 2.15, the total $\chi^2/N_{pt}$ of the CT18 LOpert fit. Overall, the fit to the CT18 LO data set is clearly improved by the two special treatments introduced in Sec. 2.2, though it is still much worse than the CT18 NLO and NNLO fits.
For individual data sets, the majority receive a smaller $\chi^2/N_{pt}$ in CT18 LO than in CT18 LOpert. With the help of the second special treatment, the Drell-Yan K-factor, Eq. (1), the high-precision LHC $W$ and $Z$ boson production data sets in Table 1 obtain a better $\chi^2/N_{pt}$ in CT18 LO than in CT18 LOpert. The fits to them are nevertheless by no means good. For the DIS and jet data sets shown in Table 2, the fits are mostly improved relative to CT18 LOpert, but again it is difficult to obtain a good fit with $\chi^2/N_{pt} \sim O(1)$ at this order. In CT18 LO, the ID 160 HERA I+II combined reduced cross-section [10] has a slightly larger $\chi^2/N_{pt}$ than in CT18 LOpert. This increase comes only from several data points at the very low energy scale $Q = 2.121$ GeV in the neutral-current channel of $e^+p$ collisions, where the correlated systematic errors pull the central values of the data points away from the theory prediction. In the CTEQ-TEA program, the best-fit value of $\chi^2$ is the combination of the best-fit $\chi^2$ to the shifted data and the contribution from the optimal nuisance parameters [1,42]. The optimal nuisance parameters for these low-energy neutral-current HERA data points thus have very large values, suggesting that there is a noticeable systematic bias in the CT18 LO fit to these data points. The fits to the rest of the HERA data points are in general comparable between CT18 LO and CT18 LOpert. A similar phenomenon, in which one data point pulled far away by correlated systematic errors results in an increase in $\chi^2/N_{pt}$ from CT18 LOpert to CT18 LO, also occurs for the ID 504 CDF Run-2 inclusive jet production data [29].
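The interplay between shifted data and nuisance parameters described above can be illustrated with a toy calculation. The sketch below assumes the commonly used form $\chi^2(\lambda) = \sum_k (D_k - T_k - \beta_k \lambda)^2/s_k^2 + \lambda^2$ with a single nuisance parameter $\lambda$; the actual analysis [1,42] involves many correlated sources, and all function names and numbers here are hypothetical.

```python
def chi2_with_nuisance(data, theory, uncorr_err, beta, lam):
    """chi^2 of the shifted data plus the penalty lam**2 (one nuisance parameter)."""
    shifted = sum(
        ((d - t - b * lam) / s) ** 2
        for d, t, s, b in zip(data, theory, uncorr_err, beta)
    )
    return shifted + lam ** 2


def optimal_nuisance(data, theory, uncorr_err, beta):
    """Value of lam that minimises chi2_with_nuisance (analytic solution)."""
    num = sum(b * (d - t) / s ** 2
              for d, t, s, b in zip(data, theory, uncorr_err, beta))
    den = 1.0 + sum((b / s) ** 2 for s, b in zip(uncorr_err, beta))
    return num / den


if __name__ == "__main__":
    # Toy inputs: theory undershoots every data point by a similar amount.
    data = [10.5, 11.0, 9.8]
    theory = [10.0, 10.4, 9.3]
    uncorr_err = [0.2, 0.2, 0.2]
    beta = [0.3, 0.3, 0.3]  # correlated systematic shift per unit lam
    lam = optimal_nuisance(data, theory, uncorr_err, beta)
    print("optimal lam:", lam)
    print("chi2 at lam = 0:", chi2_with_nuisance(data, theory, uncorr_err, beta, 0.0))
    print("chi2 at optimum:", chi2_with_nuisance(data, theory, uncorr_err, beta, lam))
```

A large optimal $|\lambda|$ signals that the fit can only approach the data by shifting them coherently, which is the kind of systematic bias noted for the low-$Q$ HERA points.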
Equivalent information on the agreement with experimental data is provided by the effective Gaussian variable $S = \sqrt{2\chi^2} - \sqrt{2N_{pt} - 1}$ [43], whose distribution theoretically approaches $N(0, 1)$ for large $N_{pt}$. In Table 3, we summarize the effective Gaussian variables of CT18 LO and CT18 LOpert for the selected data sets in Tables 1 and 2. We notice that many data sets in CT18 LO and CT18 LOpert have $|S| > 1$, so that overall the distribution of the effective Gaussian variable in CT18 LO deviates significantly from $N(0, 1)$, the distribution expected in a good fit. The distribution of the effective Gaussian variables $S$, along with the distribution of $\chi^2/N_{pt}$, indicates that the experiments are strongly underfitted in CT18 LO, although the two special treatments introduced in Sec. 2.2 help to improve the quality of the fit.

In comparison to CT18 NLO, CT18 LO exhibits a different configuration due to the significant difference between LO and NLO QCD perturbation theory. In Figs. 1(a) and 1(b), for $u(x)$ and $d(x)$, CT18 LO is consistent with CT18 NLO in the small-x and large-x limits. As shown in Figs. 1(c) and 1(d), the bumps in the valence quark distributions around $0.1 < x < 0.3$ are reduced, so that CT18 LO lies outside the CT18 NLO error band. Meanwhile, the $u_v$ and $d_v$ distributions are enhanced in the region $x < 0.01$ due to the valence number sum rules. The CT18 LO gluon PDF deviates from CT18 NLO in the ranges $3 \times 10^{-3} < x < 0.1$ and $x > 0.2$, cf. Fig. 1(e). An enhancement of the gluon PDF in the large-x region is needed at LO in order to compensate for the missing higher-order contributions to the Wilson coefficients of a number of scattering processes, such as high-$p_T$ jet production at the Tevatron and the LHC and the precision DIS data at HERA, as required by a consistent NLO (or NNLO) theory calculation for describing the existing data. Fig. 1(f) shows that the CT18 LO strange PDF agrees well with CT18 NLO for $x > 0.02$. In the range $x < 0.02$, the CT18 LO strange PDF is larger in magnitude than the CT18 NLO one.
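The effective Gaussian variable used in the fit-quality discussion above, taken here in the form $S = \sqrt{2\chi^2} - \sqrt{2N_{pt} - 1}$, is easy to evaluate. The following sketch applies it to the global numbers quoted in the text ($\chi^2/N_{pt} = 1.60$ with $N_{pt} = 3547$); applying $S$ to the global total rather than to individual data sets is an illustrative choice made here, and the function name is hypothetical.

```python
import math


def effective_gaussian_variable(chi2: float, n_pt: int) -> float:
    """Return S = sqrt(2*chi2) - sqrt(2*n_pt - 1)."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * n_pt - 1.0)


if __name__ == "__main__":
    n_pt = 3547
    chi2 = 1.60 * n_pt  # global CT18 LO value quoted in the text
    print("S =", effective_gaussian_variable(chi2, n_pt))
```

For these inputs $S$ comes out at roughly 22, far outside the range $|S| \lesssim 1$ expected for a good fit.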

Functional dependence and moments of PDFs
The CT18 LO PDFs at 100 GeV are shown in Fig. 2. They still differ from the CT18 NLO distributions, in a way similar to that seen in Fig. 1 at 1.3 GeV. The CT18 LO gluon PDF at 100 GeV is higher than CT18 NLO for $x < 3 \times 10^{-3}$. In Fig. 2(f), the CT18 LO s-quark PDF at 100 GeV is consistent with CT18 NLO.
In Table 4, we summarize the second moments $\langle x \rangle$ at the initial scale $Q_0$, which quantify the momentum fraction carried by each parton flavour. Compared with the CT18 NLO central values, there are significant increases in the strange and gluon second moments. As mentioned before, more hard gluons are required at LO in order to describe the precision data, by increasing the parton densities of sea (anti)quarks and gluons in the smaller-x region via LO DGLAP evolution. Similarly, the parametrised strange PDF at the $Q_0$ scale is also driven to acquire more momentum in order to describe data that are sensitive to the s-quark PDF, such as the DIS di-muon data and the precision $W$ and $Z$ data. Due to the momentum sum rule, all parton densities are correlated. Hence, because of the enhancements in $\langle x \rangle_s$ and $\langle x \rangle_g$, all the other flavours are allocated less momentum than in CT18 NLO. The CT18 LO second moments at the $Q_0$ scale are in general consistent with CT18 NLO within one standard deviation, except for the strange PDF. Without higher-order corrections, a better determination of the strange PDF is difficult, as will be seen below.
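As an illustration of what such a moment measures, the sketch below computes $\langle x \rangle = \int_0^1 x f(x)\,dx$ numerically for a toy momentum density $x f(x) = A\,x^{a}(1-x)^{b}$ and checks it against the analytic Beta-function result. The functional form and parameter values are hypothetical and are not the CT18 parametrization.

```python
import math


def toy_xf(x, norm, a, b):
    """Hypothetical momentum density x*f(x) = norm * x**a * (1 - x)**b."""
    return norm * x ** a * (1.0 - x) ** b


def second_moment(xf, n_steps=100_000):
    """Midpoint-rule estimate of <x> = integral over [0, 1] of x*f(x)."""
    h = 1.0 / n_steps
    return h * sum(xf((i + 0.5) * h) for i in range(n_steps))


def beta_function(p, q):
    """Euler Beta function B(p, q) expressed through Gamma functions."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


if __name__ == "__main__":
    norm, a, b = 1.0, 0.5, 3.0  # toy parameters, chosen only for illustration
    numeric = second_moment(lambda x: toy_xf(x, norm, a, b))
    analytic = norm * beta_function(a + 1.0, b + 1.0)
    print("numeric :", numeric)
    print("analytic:", analytic)
```

The momentum sum rule requires these moments, summed over all flavours including the gluon, to equal one, which is why an increase in one flavour forces a decrease in the others.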

Impact of precise LHC W and Z production data
Today, vector boson production at the LHC can be measured so precisely that the total experimental uncertainty is at the percent level. At NLO, the production cross-sections of the Drell-Yan processes receive large corrections from additional quark-gluon subprocesses and from the virtual correction to the quark-vector-boson vertex, so that the Born-level cross-section cannot describe the experimental data as accurately as predictions beyond leading order. The fourth column of Table 1 shows the values of $\chi^2/N_{pt}$ of the CT18 LOpert fit, which uses the LO theory prediction without the K-factor introduced in Eq. (1) for computing Drell-Yan cross-sections. It is evident that the CT18 LOpert fit cannot describe the data well, with a very large value of $\chi^2/N_{pt}$ for each individual Drell-Yan data set. Furthermore, the resulting PDFs are also problematic, especially the strange quark PDF.
To illustrate the impact of these gauge boson production data sets, we compare a series of fits, starting from CT18 LOpert introduced in Sec. 2.2, with various weights assigned to the LHC Run-II vector boson production data sets, namely ID 245, 246, 249, 250, and 253, as shown in Fig. 3. For weights larger than or equal to 0.6, the strange PDF vanishes in the range $2 \times 10^{-4} < x < 0.02$ at $Q_0 = 1.3$ GeV under the impact of these data sets. When evolved to 100 GeV, the strange distribution is still quite small in this range of x. As the impact of these data sets is gradually removed from the fits by reducing the weights, the resulting s-quark PDF becomes larger. When fitting the vector boson production data, the up and down sea quark distributions are driven by the data to increase in magnitude, to compensate for the deficiency in the LO Wilson coefficients. Since all flavours are correlated through the total momentum sum rule, the magnitude of the strange PDF has to be reduced when that of the others is increased. Such a strong suppression of the strange PDF is resolved in CT18 LO by applying the Drell-Yan K-factor, Eq. (1), to those Drell-Yan data sets, so that the $\bar{u}$ and $\bar{d}$ PDFs are suppressed and the s-quark PDF is increased compared with those in CT18 LOpert.
As an example, in Fig. 4 we compare predictions for the ID 245 LHCb $W$ and $Z$ boson production at 7 TeV [14] by CT18 LO, CT18 LOpert, and CT18 NLO with the experimental data points. The theory predictions are calculated using the APPLgrid package [44]. Without the K-factor, Eq. (1), CT18 LOpert cannot provide large enough cross-sections for either $W$ or $Z$ production, and yields a $\chi^2/N_{pt}$ as large as 8.36, as shown in Table 1. This difficulty in fitting vector boson production data results in the vanishing of the CT18 LOpert s-quark PDF at the lower energy scale, as shown in Fig. 3. In CT18 LO, the predicted overall magnitude of this process, ID 245, is improved by the Drell-Yan K-factor and is consistent with the CT18 NLO predictions. Hence, the quality of the fit to these data is improved to $\chi^2/N_{pt} = 5.85$. Consequently, the suppression of the s-quark PDF seen in CT18 LOpert is relaxed. However, improving the shape of the rapidity spectrum still requires higher-order QCD corrections.

Figure 3: Comparison of the strange-quark PDF among CT18 LO, the CT18 LOpert series, and CT18 NLO. In the CT18 LOpert series, the suffix "lw" means "low weight", and the following numbers refer to the weights assigned to ID 245, 246, 249, 250, and 253. In the PDFs named "CT18 LOpert", i.e., the brown curves in both panels, the weights of these data sets are unity, as in CT18 LO and CT18 NLO.
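The role of the data-set weights in such a scan can be sketched as follows. The example assumes that each weight simply multiplies the corresponding data set's $\chi^2$ contribution in the global sum, which is a common convention but is not spelled out in the text; the per-data-set $\chi^2$ values are invented toy numbers, and only the data-set IDs come from the text.

```python
def weighted_total_chi2(chi2_by_id, weights):
    """Sum per-data-set chi^2 values, each multiplied by its weight (default 1)."""
    return sum(weights.get(ds_id, 1.0) * chi2
               for ds_id, chi2 in chi2_by_id.items())


if __name__ == "__main__":
    # Toy chi^2 contributions; 160 stands in for a data set that keeps weight 1.
    chi2_by_id = {245: 250.0, 246: 120.0, 249: 90.0, 250: 400.0, 253: 300.0,
                  160: 1500.0}
    lhc_vector_boson_ids = (245, 246, 249, 250, 253)
    for w in (1.0, 0.6, 0.2, 0.0):
        weights = {ds_id: w for ds_id in lhc_vector_boson_ids}
        print(f"weight {w:.1f}: total chi2 = "
              f"{weighted_total_chi2(chi2_by_id, weights):.1f}")
```

Lowering the weight reduces the pull of these data sets on the minimisation, so the fitted PDFs are increasingly determined by the remaining data.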

$\alpha_s$ dependence of CT18 LO
The strong coupling constant is one of the key elements in computing theory predictions, and is hence an input to a global PDF fit. For PDF fits beyond LO, it is common practice [1][2][3,6,7,9,45] to fix the value of the strong coupling at the Z-boson mass scale to its PDG value, $\alpha_s(m_Z) = 0.118$ [46]. At LO, because important quantum corrections are missing, the input value of $\alpha_s(m_Z)$ is often fixed at a value higher than the PDG global average $\alpha_s(m_Z) = 0.118$, in order to generate more sea quarks via parton evolution and thereby ease the difficulty of an LO PDF fit. A number of LO PDFs [2,9,45] take $\alpha_s(m_Z) = 0.130$. For NNPDF4.0 [3], NNPDF3.1 [45], and CT14 [9], LO PDFs with $\alpha_s(m_Z) = 0.118$ are also provided. For MSTW08 LO [6] and MMHT14 LO [7], $\alpha_s(m_Z)$ is fixed at 0.140 and 0.135, respectively.
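To see how the choice of $\alpha_s(m_Z)$ propagates to other scales at LO, one can use the standard one-loop running formula. The sketch below is a simplification: it assumes a fixed number of active flavours $n_f = 5$ without threshold matching and a reference value $m_Z = 91.1876$ GeV, so it is only indicative for scales above the bottom-quark mass. The function name is hypothetical.

```python
import math

M_Z = 91.1876  # GeV; assumed reference value of the Z-boson mass


def alpha_s_one_loop(q, alpha_s_mz, n_f=5):
    """One-loop running coupling at scale q (GeV) with fixed n_f.

    alpha_s(q) = alpha_s(mZ) / (1 + b0 * alpha_s(mZ) * ln(q^2 / mZ^2)),
    with b0 = (33 - 2 * n_f) / (12 * pi).
    """
    b0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
    return alpha_s_mz / (1.0 + b0 * alpha_s_mz * math.log(q * q / (M_Z * M_Z)))


if __name__ == "__main__":
    for a_mz in (0.118, 0.130, 0.135):
        print(f"alpha_s(mZ) = {a_mz:.3f}: "
              f"alpha_s(10 GeV) = {alpha_s_one_loop(10.0, a_mz):.4f}, "
              f"alpha_s(100 GeV) = {alpha_s_one_loop(100.0, a_mz):.4f}")
```

A larger input value at $m_Z$ gives a disproportionately larger coupling at low scales, which speeds up the DGLAP evolution that generates sea quarks and gluons at small x.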

Phenomenology
In this section, we illustrate the implications of the CT18 LO PDFs by comparing predictions for selected LHC observables obtained with the CT18 LO and NLO PDFs. Specifically, for differential distributions, we consider the experimental measurement of the charge asymmetry $A_{ch}$ in $W$-boson production at 8 TeV with the CMS detector. For inclusive total cross-sections, we calculate the prediction for single-top production at 14 TeV. In these computations, the input physical parameters are set as follows:
$m_W = 80.385$ GeV, $G_F = 1.16639 \times 10^{-5}$ GeV$^{-2}$, $\Gamma_W = 2.06$ GeV. (3)

Charge asymmetry $A_{ch}$ in $W$-boson production
The difference between the $W^+$ and $W^-$ production cross-sections via the Drell-Yan process is dominated by the PDFs of the incoming quarks. Hence, the rapidity asymmetry of the charged lepton from $W$ boson decay serves as a good observable to probe ratios of parton luminosities. In Fig. 6, predictions for the ID 249 CMS muon charge asymmetry $A_{ch}$ at 8 TeV [13] are compared with the experimental data points, as a function of the pseudo-rapidity of the muon from $W$ boson decay. The differential cross-sections for $pp \to W^\pm + X \to \mu^\pm \nu + X$ as functions of rapidity are computed with APPLgrid [44].
As shown in the left panel of Fig. 6, the predictions by CT18 LO exhibit a different shape from the experimental data. In the lower rapidity region, the CT18 LO predictions are about 5% higher than the measurements. At larger rapidity, the CT18 LO predictions for $A_{ch}$ tend to be more consistent with the measurements. While the CT18 LO predictions for the muon charge asymmetry generally lie within the CT18 NLO uncertainty band over the whole rapidity range, the CT18 LOpert prediction shows worse agreement. This indicates that CT18 LO and NLO are consistent in the ratio of down and up antiquark PDFs, as can be seen directly in the right panel of Fig. 6, where CT18 LO is mostly inside the CT18 NLO error band for the ratio $\bar{d}/\bar{u}$, except in the range $0.01 < x < 0.03$, where the CT18 LO $\bar{d}/\bar{u}$ lies outside the CT18 NLO error band.
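For reference, the charge asymmetry compared above is built bin by bin from the $W^+$ and $W^-$ differential cross-sections as $A_{ch} = (\sigma^+ - \sigma^-)/(\sigma^+ + \sigma^-)$. The sketch below applies this definition to invented toy cross-sections in a few pseudo-rapidity bins; the numbers are not CMS data or CT18 predictions, and the function name is hypothetical.

```python
def charge_asymmetry(sigma_plus, sigma_minus):
    """Bin-by-bin A_ch = (sigma+ - sigma-) / (sigma+ + sigma-)."""
    return [(p - m) / (p + m) for p, m in zip(sigma_plus, sigma_minus)]


if __name__ == "__main__":
    # Toy differential cross-sections (arbitrary units) in three |eta| bins.
    sigma_plus = [600.0, 620.0, 640.0]
    sigma_minus = [460.0, 450.0, 420.0]
    for i, a in enumerate(charge_asymmetry(sigma_plus, sigma_minus)):
        print(f"bin {i}: A_ch = {a:.4f}")
```

Because an overall normalisation factor common to $\sigma^+$ and $\sigma^-$ cancels in the ratio, $A_{ch}$ is sensitive mainly to the relative sizes of the quark and antiquark PDFs rather than to the absolute normalisation of the LO prediction.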

Single-top production
We select the total cross-section for t-channel inclusive single-top production as a representative process to study the implications of the CT18 LO PDFs.
This process plays an important role in constraining the heavy-quark PDF, and it has been measured at the LHC [47][48][49][50][51][52][53][54][55] at various center-of-mass energies. The theoretical calculation of this process can serve as a test of the consistency of PDFs at different perturbative orders [56,57], since the total inclusive cross-sections consistently predicted at different orders are all expected to reproduce the data. We make use of this property to illustrate the consistency of CT18 LO with CT18 NLO. In our calculation, the t-channel inclusive single-top production cross-section is computed with MCFM [58][59][60]. The input parameters take the values shown in Eq. (3), the renormalization and factorization scales are chosen as $\mu_R = \mu_F = m_t$, and the top mass $m_t$ is chosen to be consistent with the corresponding PDF sets.
Predictions for t-channel inclusive single-top production with a variety of PDFs are presented in Table 5. In general, due to the lack of higher-order corrections, the LO predictions for this process tend to be smaller than the corresponding NLO predictions. The CT18 LO prediction for single top quark production is slightly outside the CT18 NLO uncertainty band, while for single anti-top quark production CT18 LO and CT18 NLO are fully consistent. Compared with the CT14 LO predictions, the CT18 LO predictions for both top and anti-top production are enhanced substantially and are more consistent with those of the corresponding NLO fit.

Conclusion
In this paper, we present the CT18 LO PDFs, which are obtained within the general framework of the CT18 global analysis, extended by two special treatments defined in Sec. 2.2. One is to discard from the CT18 data set some data sets which cannot be properly described at LO (such as Drell-Yan data with different cuts on the transverse momenta of the two final-state leptons). The other is to apply a K-factor to predictions for Drell-Yan processes to make up for the insufficiency of LO predictions. As a result, the quality of the LO fit, CT18 LO, is substantially improved over that of a naive LO fit, CT18 LOpert. In CT18 LOpert, the strange quark distributions are strongly affected by the high-precision $W$ and $Z$ production data from the LHC Run-II era, as shown in Fig. 3, since the LO Wilson coefficients cannot provide enough normalization for the predictions of these precise measurements. The strong suppression of $s(x)$ is relaxed in CT18 LO via the implementation of the Drell-Yan K-factor, which supplies additional normalization to Drell-Yan processes. We have checked that the CT18 LO PDFs are capable of generating numerical predictions close to those of the CT18 NLO PDFs for the rapidity distribution of the charge asymmetry in $W$-boson production and for the total cross-section of t-channel inclusive single-top production. We nevertheless stress that the CT18 LO PDFs differ from the CT18 NLO PDFs in many respects, including the quality of the fit, the PDF shapes, and the ability to describe experimental data. Therefore, we do not recommend using this result, CT18 LO, in analyses where precision is the dominant requirement. Since LO PDF fits carry a large theoretical uncertainty, we do not provide an error set for the CT18 LO PDFs. The central CT18 LO PDF set in the LHAPDF format [62] is publicly available: https://lhapdf.hepforge.org