Uncertainty of parton distribution functions due to physical observables in a global analysis

The recent measurement of the differential γ + c-jet cross section, performed at the Tevatron collider in Run II by the D0 collaboration, is studied in a next-to-leading order (NLO) global QCD analysis to assess its impact on the proton parton distribution functions (PDFs). We show that these data lead to a significant change in the gluon and charm quark distributions. We also demonstrate that there is an inconsistency between the new high-precision HERA I+II combined data and the Tevatron measurement. Moreover, we investigate the impact of the older EMC measurements of the charm structure function on the PDFs and compare the results with those from the analysis of the Tevatron data. We show that both have the same impact on the PDFs, and thus can be regarded as the same evidence for the inefficiency of perturbative QCD in dealing with charm production in some kinematic regions.


Introduction
An accurate knowledge of the parton distribution functions (PDFs) is essential for the Standard Model (SM) and New Physics hard scattering processes such as those performed at the Tevatron and LHC. In recent years, several global Quantum Chromodynamics (QCD) analyses of PDFs have been performed [1][2][3][4][5][6][7][8][9], and considerable progress has been made to improve our knowledge of the PDFs. However, many problems remain open, because there are diverse sources of experimental and theoretical uncertainties in the conventional approach to extract specific PDF sets. In this vein, including new data measured at hadron colliders in global analyses is a good way to put further constraints on PDFs.
One of the important issues in any global analysis of PDFs is the determination of the gluon and sea quark distributions. Although deep inelastic scattering (DIS) and fixed-target data can be used to put overall constraints on PDFs, the inclusion of the Tevatron and LHC measurements is necessary for a more accurate determination of these distributions. In hadron-hadron collisions, high-energy photons are mainly produced directly in a hard parton scattering process. For this reason, and due to their pointlike electromagnetic coupling to the quarks, they provide a clean probe of parton-level dynamics [10][11][12]. Measurements of the γ + h-quark jet differential cross section, where h is a charm or bottom quark, as a function of the photon transverse momentum p_T^γ [13,14] can improve our understanding of the underlying production mechanism and also provide useful input for the gluon and heavy quark PDFs of the colliding hadrons [15][16][17]. The reason is that the biggest contribution to the cross section comes from the Compton subprocess gQ → γQ, though at the Tevatron the annihilation subprocess qq̄ → γQQ̄ also becomes important, especially at large p_T^γ. In particular, one of the measurements most sensitive to the gluon and charm quark PDFs is the differential γ+c-jet production data from the D0 experiment at the Tevatron [18]. These data pose a serious challenge for perturbative QCD since they overshoot the standard next-to-leading order (NLO) QCD predictions at large photon transverse momenta. In Ref. [19], it was shown that the inclusion of an intrinsic charm (IC) quark component in the nucleon can decrease the difference between data and theory, but does not fully resolve it. Part of this discrepancy might be due to the lack of higher-order corrections to the qq̄ channel, which are not included in Ref. [19]. Note also that in Ref. [19] the D0 data were not included in the global analysis.
In DIS, the quantity that is very sensitive to the gluon and charm quark PDFs is the charm structure function F_2^c. In addition to the recent HERA measurements of this quantity [20], there are also older EMC Collaboration measurements [21,22] that cover a broader range of the Bjorken scaling variable x. These data are the only existing measurements of the charm structure function at large x and can be regarded as evidence for an intrinsic charm component in the proton. In Ref. [4], it was indicated that the theoretical description of the EMC F_2^c data is poor, both at large and low x values. Moreover, it was shown that considering the IC contribution can improve the results at large x. However, it should be noted that in that work the EMC data were not included in the global analysis. Recently, new global analyses that include these data have been presented [1,23], and all of them found that, in a conventional QCD analysis, the fit of the EMC data is far from satisfactory.
The main purpose of this paper is to investigate the impact of the differential γ+c-jet cross section data from D0 on the PDFs by performing an NLO global analysis. Moreover, we investigate the impact of the older EMC measurements of the charm structure function F_2^c on the PDFs and compare the results with those obtained from the analysis of the Tevatron data. Since both the D0 and EMC data show some evidence for the existence of non-perturbative charm in the proton, we expect these data to have the same impact on the PDFs. To test this, we perform the following global analyses. First, we fit a fairly standard selection of experimental data sets from various observables and experiments to obtain PDFs that are well constrained. This is our base fit. Then, the D0 and EMC data are separately included in this global analysis to study their impact on the PDF behaviour and to answer the question of whether they have the same effect on the final results. In comparison with a previous analysis that included the EMC data in the fit [23], we include a variety of LHC and Tevatron data and also the new high-precision HERA I+II combined data [24]. Furthermore, we use more restrictive kinematic cuts, which are more traditional in QCD global analyses of PDFs.
The rest of this paper is organized as follows. In Sections 2 and 3, we briefly review the physics of prompt photon production in association with a c-jet in hadronic collisions and of charm production in DIS, respectively, which are needed for the theoretical description of the D0 and EMC data. Then, in Section 4, we introduce our QCD analysis framework, including the experimental data sets that are used, the kinematic cuts, the PDF parametrizations, the χ² definition, etc. In Section 5, we present the results obtained from the various global fits, assess the impact of the D0 and EMC data on the PDFs, and compare their results. Finally, in Section 6, we summarize our results and present conclusions.

Prompt photon production in association with a c-jet
In recent years, prompt photon production in association with a heavy quark jet in hadronic collisions has been investigated both theoretically and experimentally in many studies [13,18,25,26]. For example, the differential cross section for the associated production of a c-quark jet and an isolated photon in pp̄ collisions at a center-of-mass energy of √s = 1.96 TeV has been measured as a function of the photon transverse momentum p_T^γ in Ref. [18]. The γ + c-jet cross section was measured for photons with rapidity |y^γ| < 1.0 and transverse momentum 30 < p_T^γ < 300 GeV, with the c-jet required to have |η^c| < 1.5 and p_T^c > 15 GeV. This process can provide information on the parton distributions of the nucleon and is also a powerful tool for testing the possible existence of intrinsic charm quarks in the nucleon.
At leading order (LO), the Compton subprocess gc → γc (see Fig. 1) gives the main contribution to pp̄ → γc and is dominant at low p_T^γ. In addition, the annihilation subprocess qq̄ → γcc̄ becomes important at large p_T^γ [27]. Note that at LO, inclusive γ+c production can also arise from the subprocesses gg → cc̄, cg → cg or qc → qc, where the fragmentation of a c-quark produces a photon. At the LHC, the annihilation process qq̄ → γcc̄ does not play an important role and the Compton process dominates at all energies. The subprocesses at NLO are more complicated than at LO. They include processes like gg → γcc̄ and gc → γgc [28], where the photon is effectively created via radiation off an intermediate quark line. All subprocesses at NLO, apart from the annihilation subprocess, are initiated by gluon and charm PDFs. Thus, the cross section at NLO is more dependent on the gluon and charm PDFs [28]. In the following, we investigate the impact of the differential γ + c-jet cross section data from D0 on the PDFs by performing an NLO global analysis.
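To give a feeling for which momentum fractions the D0 measurement probes, the following minimal sketch evaluates the standard LO 2 → 2 relation x_{1,2} = (p_T/√s)(e^{±y_γ} + e^{±y_c}), assuming massless final-state partons that balance in transverse momentum. The function name and the choice of central rapidities are illustrative assumptions, not part of the analysis described here.

```python
import math

SQRT_S_TEVATRON = 1960.0  # GeV, Run II centre-of-mass energy quoted in the text


def lo_momentum_fractions(pt, y_photon, y_jet, sqrt_s=SQRT_S_TEVATRON):
    """Parton momentum fractions (x1, x2) for a LO 2 -> 2 process.

    Assumes massless final-state partons that balance in transverse momentum,
    so this is only a rough guide to the x range probed.
    """
    x1 = pt / sqrt_s * (math.exp(y_photon) + math.exp(y_jet))
    x2 = pt / sqrt_s * (math.exp(-y_photon) + math.exp(-y_jet))
    return x1, x2


if __name__ == "__main__":
    # Edges of the measured photon p_T range, with both objects at y = 0
    # (an illustrative choice).
    for pt in (30.0, 300.0):
        x1, x2 = lo_momentum_fractions(pt, 0.0, 0.0)
        print(f"pT = {pt:5.0f} GeV -> x1 = {x1:.3f}, x2 = {x2:.3f}")
```

Under these assumptions, central production over the measured photon transverse momentum range corresponds to momentum fractions between roughly 0.03 and 0.3.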

The EMC data
The determination of the gluon and sea quark distributions in the nucleon with lower uncertainties has always been a challenge in global analyses of PDFs. As a first step, one can use the deep inelastic structure function F_{2,L}(x, Q²) data (or reduced cross sections) to put an overall constraint on the gluon and sea quark densities. Then, the various measurements from Drell-Yan experiments and hadron collisions can be used to reduce their uncertainties further. If we are looking for more information about the charm quark PDF, we can invoke the charm structure function F_2^c(x, Q²) data, since they are sensitive to the charm distribution. This can be complemented by measurements of prompt photon production in association with a charm quark at colliders. Although the HERA measurements of F_2^c(x, Q²) [20] cover only the x < 0.1 region, the EMC Collaboration measurements [21,22] cover a broader range of x, including both small and large x regions. The difficulty of fitting these EMC data has been highlighted previously in Refs. [1,23]. These data are extremely old. EMC had some issues in the measurement of the total structure function, which were eventually understood and led to the EMC inclusive structure function not being included in PDF analyses. The measurement of the charm structure function may suffer from some of the same issues as the inclusive data. Furthermore, a tension between these data and HERA [24] has been observed in our analysis, in line with the results reported in Ref. [23]. Consequently, the data are very rarely used in global PDF analyses. On the other hand, the EMC data are regarded as evidence for the existence of intrinsic charm in the nucleon because they are not in complete agreement with the predictions of standard QCD calculations at large x. To be more precise, the theoretical predictions underestimate the EMC data for x above about 0.2 and overestimate them at smaller values of x.
In Ref. [4], it is demonstrated that the theoretical description of the EMC F_2^c data is poor, both at large and low x values. As a result, using the EMC data in a global analysis of PDFs leads to an increase in the total χ² value. In addition, it can strongly affect the behaviour of the PDFs. These facts can be considered the main reasons why the EMC data are excluded from global analyses of PDFs [2,7,24,29-33]. In this work, by performing several global analyses, we study in detail the impact of the EMC data on the behaviour of the parton distributions in different kinematic regions.

QCD global analysis
In this section, we present a brief overview of the standard theoretical formalism and experimental data used for performing the QCD global analyses. In the present study, the NLO QCD analysis and PDF extraction are performed using the HERAFitter framework [34][35][36]. The experimental data used cover a satisfactory range of DIS and collider data. We use the DIS data from HERA [24], BCDMS [37], CCFR [38], SLAC [39] and NMC [40] to constrain the PDF behaviour. In order to put further constraints on the PDFs and reduce their uncertainties, we also use the inclusive jet data of H1 [41] and ZEUS [42,43], and several LHC and Tevatron data sets such as W and Z production from the D0 [44][45][46], CDF [47,48], ATLAS [49][50][51] and CMS [52][53][54] Collaborations. All experimental data sets used are listed in the first column of Table 1. We then include the EMC charm data [21] and the D0 data on photon production in association with a charm quark [18] at different stages of our study to assess the impact of these data on the extracted PDFs. Note that the DIS data from HERA, BCDMS and SLAC constrain the gluon and sea quark distributions well over the whole range of x. On the other hand, the valence quark distributions and the difference between the ū and d̄ distributions can be controlled directly by the CCFR xF_3 and NMC F_2^d/F_2^p data, respectively. Moreover, the HERA charm data [20] provide valuable information about the gluon distribution in the small x region. Note that we applied the kinematic cuts Q² > 4 GeV² and W² > 15 GeV² to the data (for the xF_3 data only, we required W² > 25 GeV²) to avoid non-perturbative and higher-twist effects.
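The kinematic cuts quoted above can be illustrated with a short sketch. It uses the standard DIS relation W² = M² + Q²(1 − x)/x with M the proton mass; the function names and the example points are hypothetical and are not taken from the data sets used in the fit.

```python
PROTON_MASS = 0.938272  # GeV


def w_squared(x, q2, mass=PROTON_MASS):
    """Invariant mass squared of the hadronic final state: W^2 = M^2 + Q^2 (1 - x) / x."""
    return mass ** 2 + q2 * (1.0 - x) / x


def passes_cuts(x, q2, is_xf3=False):
    """Apply Q^2 > 4 GeV^2 and W^2 > 15 GeV^2 (W^2 > 25 GeV^2 for xF3 data)."""
    w2_min = 25.0 if is_xf3 else 15.0
    return q2 > 4.0 and w_squared(x, q2) > w2_min


if __name__ == "__main__":
    # Hypothetical (x, Q^2 in GeV^2) points, chosen only to exercise each cut.
    for x, q2 in [(0.1, 10.0), (0.7, 10.0), (0.3, 3.0), (0.3, 10.0)]:
        print(x, q2, round(w_squared(x, q2), 2),
              passes_cuts(x, q2), passes_cuts(x, q2, is_xf3=True))
```

The W² cut mainly removes points at large x and moderate Q², which is where higher-twist effects are expected to matter most.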
Since we are not aiming here at a comprehensive global analysis of PDFs, and our main goal is to study the impact of the EMC and D0 data on the PDFs, it is sufficient to use simple, flexible parametrization forms for the input parton densities, such as the HERAPDF form [55]. It should be noted, however, that the optimal parametrization forms for the PDF fit can be found through a parametrization scan as described in Ref. [56].
In this way, we parametrize the valence quarks xu_v(x) and xd_v(x), the anti-quarks xŪ(x) and xD̄(x), where xŪ(x) = xū(x) and xD̄(x) = xd̄(x) + xs̄(x), and finally the gluon xg(x) at the input scale Q_0² = 1 GeV² [Eq. (2)]. Although, at first glance, the number of unknown parameters in Eq. (2) is high, we can reduce it with the help of sum rules and some simplifying assumptions. To be more precise, we can fix three parameters by using the momentum and valence quark number sum rules. As usual, we choose A_{u_v}, A_{d_v} and A_g. For the parameter B in the valence and sea quark PDFs, we impose the constraints B_{u_v} = B_{d_v} and B_Ū = B_D̄. The strange quark density is taken to be proportional to xD̄, namely xs̄ = f_s xD̄. It has been shown that a value of 0.31 is a good estimate for the factor f_s [57]. Furthermore, we impose the additional constraint A_Ū = A_D̄(1 − f_s), which guarantees the same behaviour of xū and xd̄ as x → 0. After these simplifying assumptions, there are 14 unknown parameters to be determined by the fit. It should also be noted that in the present work the QCD coupling constant is fixed to α_s(M_Z²) = 0.118 at the Z boson mass scale.
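As an illustration of how a number sum rule fixes a normalisation parameter, the sketch below assumes the simplified input shape xf(x) = A x^B (1 − x)^C, without the additional polynomial factors that HERAPDF-style forms may carry. For this shape, the integral of f(x) over 0 < x < 1 equals A·B(B, C + 1), with B(a, b) the Euler beta function, so requiring two valence up quarks or one valence down quark determines A. The parameter values and function names are hypothetical and are not the fitted ones.

```python
import math


def beta(a, b):
    """Euler beta function B(a, b) expressed through gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def valence_normalisation(n_valence, b, c):
    """A such that the integral of A x^(b-1) (1-x)^c over [0, 1] equals n_valence."""
    return n_valence / beta(b, c + 1.0)


def xf(x, a, b, c):
    """Simplified input shape x f(x) = A x^B (1 - x)^C (an assumption of this sketch)."""
    return a * x ** b * (1.0 - x) ** c


def number_integral(a, b, c, steps=100000):
    """Midpoint-rule check of the number sum rule, integrating f(x) = xf(x) / x."""
    h = 1.0 / steps
    return sum(xf((i + 0.5) * h, a, b, c) / ((i + 0.5) * h) for i in range(steps)) * h


if __name__ == "__main__":
    b_val, c_uv, c_dv = 0.7, 4.0, 5.0  # hypothetical shape parameters, B shared by u_v and d_v
    a_uv = valence_normalisation(2.0, b_val, c_uv)
    a_dv = valence_normalisation(1.0, b_val, c_dv)
    print(a_uv, number_integral(a_uv, b_val, c_uv))
    print(a_dv, number_integral(a_dv, b_val, c_dv))
```

The gluon normalisation would be fixed in the same spirit by the momentum sum rule, which requires the integral of all xf(x) to equal one.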
We now discuss the nuclear corrections to the PDFs required for analysing the EMC structure function data. Since these data were obtained by scattering on a heavy nuclear target, taking the nuclear corrections into account is unavoidable and can also improve the goodness of the fit. The parton distributions of a proton bound in a nucleus of mass number A can be related to the corresponding distributions of a free proton through a nuclear correction R_f for each parton flavour f. In this work, we use the nuclear corrections obtained by de Florian and Sassot [58] through an NLO global analysis of nuclear data. In addition to the nuclear corrections, we need a factor that accounts for the mismatch between the partonic and hadronic charm thresholds. For this purpose, we apply a weight factor to the heavy-quark structure function as suggested in Ref. [59]. In order to properly include the effects of heavy quark masses, we use a general-mass variable flavour number (GM-VFN) scheme, the Thorne-Roberts (TR) scheme [60], in the calculation of the structure functions. For the theoretical predictions corresponding to the hadron collider measurements we use aMCfast [61], APPLGRID [62], MadGraph5_aMC@NLO [63] and MCFM [64]. It should be noted that in our analysis the renormalisation and factorisation scales are set to Q². Moreover, the charm and bottom quark masses are taken to be m_c = 1.4 GeV and m_b = 4.75 GeV, respectively.
In any global analysis of PDFs, the unknown parameters are determined by minimising a χ² function that compares the theoretical predictions with the experimental measurements of various physical observables. Taking the correlated and uncorrelated measurement uncertainties into account correctly, using a covariance matrix, plays an important role in the extraction of the PDF uncertainties. To be more precise, if m_i is the theoretical prediction corresponding to a data point µ_i for which the individual sources of correlated uncertainty are not available, the χ² function is built without systematic shifts. For the other data points, which include the full correlated error information, the χ² function is evaluated with the values m_i + Σ_{k=1}^{N_corr} r_k σ^corr_{k,i}, which are allowed to shift by some multiple r_k of the systematic error σ^corr_{k,i} in order to give the best fit; the last term of this χ² function is called the penalty term. Finally, the Hessian (eigenvector) method [65] has been used for propagating experimental uncertainties to the PDFs. One convenient feature of the HERAFitter program is that the Hessian method is applied automatically.
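The role of the systematic shifts and the penalty term can be sketched with a small, self-contained example. It assumes one common convention, χ² = Σ_i [(d_i + Σ_k r_k s_{k,i} − t_i)/u_i]² + Σ_k r_k², where d_i are the data, t_i the theory predictions, u_i the uncorrelated uncertainties and s_{k,i} the correlated systematic errors. The symbols, the sign convention and the function names are assumptions of this sketch and need not coincide with those used in HERAFitter. Because this χ² is quadratic in the shifts r_k, the best shifts follow from a linear system.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def chi2_with_shifts(data, theory, uncorr, corr, shifts):
    """chi^2 = sum_i ((d_i + sum_k r_k s_ki - t_i) / u_i)^2 + sum_k r_k^2 (penalty term)."""
    total = 0.0
    for i, (d, t, u) in enumerate(zip(data, theory, uncorr)):
        shifted = d + sum(r * corr[k][i] for k, r in enumerate(shifts))
        total += ((shifted - t) / u) ** 2
    return total + sum(r * r for r in shifts)


def best_shifts(data, theory, uncorr, corr):
    """Shifts r_k minimising the chi^2 above, from the condition d(chi^2)/d(r_k) = 0."""
    n_corr, n_pts = len(corr), len(data)
    matrix = [[(1.0 if k == l else 0.0)
               + sum(corr[k][i] * corr[l][i] / uncorr[i] ** 2 for i in range(n_pts))
               for l in range(n_corr)] for k in range(n_corr)]
    rhs = [sum(corr[k][i] * (theory[i] - data[i]) / uncorr[i] ** 2 for i in range(n_pts))
           for k in range(n_corr)]
    return solve_linear(matrix, rhs)


if __name__ == "__main__":
    # Toy numbers, purely illustrative: two points and one correlated source.
    data, theory, uncorr = [1.0, 2.0], [1.1, 2.2], [0.1, 0.1]
    corr = [[0.1, 0.2]]
    r = best_shifts(data, theory, uncorr, corr)
    print(r, chi2_with_shifts(data, theory, uncorr, corr, r))
```

In this toy case both points lie below the theory by an amount that a single coherent shift can absorb, so most of the χ² is traded for a modest penalty.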

Results
Following the phenomenological framework of our QCD global analysis introduced in the previous section, we are now ready to perform the analyses needed to study the impact of the D0 and EMC data on the behaviour of the PDFs. To this end, we perform three analyses. In the first analysis, we consider a fairly standard selection of data from different observables and experiments, such as the new high-precision HERA I+II combined data or the W and Z production data at the LHC and Tevatron, which are necessary to constrain the PDFs. In the second analysis, we include the D0 data to assess their impact on the PDFs. Finally, in the third analysis, we replace the D0 data with the EMC data to study their effect on the PDFs as well. By comparing the results of the last two analyses with each other, we can answer the question of whether or not they have the same effect on the final results.
The list of experimental data and the results of the global analyses introduced above are summarized in Table 1. In more detail, the results of the base analysis, which includes neither D0 nor EMC data, are presented in the column labelled "No EMC & No D0". The columns labelled "+D0" and "+EMC" contain the results of the second and third analyses, which include the D0 and EMC data, respectively. Note that the values of χ² and the number of data points are also given for each data set. The values of the total χ² divided by the number of degrees of freedom for these fits are given in the last row of the table. This is equal to 1.25 for the "No EMC & No D0" fit, where we have N = 2056 data points, and to 1.37 and 1.43 for the "+EMC" and "+D0" fits, respectively. It was indicated in Ref. [23] that there is a significant tension between the EMC data and other experimental data sets. For that reason, the EMC data are not usually included in global PDF analyses. The tension of the EMC data with other data sets is also visible in our analysis. From Table 1, we can see that by including the EMC data, the χ² of the HERA1+2 NCep 920 [24] and BCDMS [37] data increases. Interestingly, the same happens when the D0 data are included in the "+D0" analysis. Note, however, that in contrast with the HERA1+2 NCep 920 and BCDMS data, the SLAC data [39] show the opposite behaviour. This compatibility of the SLAC data with the EMC and D0 data might be an indication that the SLAC data are sensitive to the IC component, as demonstrated in Ref. [23]. Furthermore, including the D0 data leads to a significant decrease in the χ² of the D0 W [46] and CDF W asymmetry data [48] but, on the contrary, to an increase in the χ² of the CMS W muon asymmetry [54] and ATLAS jet data [49,50].
The optimal values of the input PDF parameters of Eq. (2), for each of the three fits, are given in Table 2. The parameters related to the gluon and sea distributions change dramatically when the D0 and EMC data are added. By referring to Tables 1 and 2, one can see that the effects of these data are not quite the same. We discuss this issue below, when we investigate the PDF behaviour.
Figure 2 shows a comparison between the fit results and the D0 data as a function of p_T^γ. In this figure, the ratios of data to the NLO QCD predictions are also presented. The data are in good agreement with the NLO QCD predictions within the theoretical and experimental uncertainties for p_T^γ between about 40 and 140 GeV, but show disagreement for larger p_T^γ, where the data are far from the NLO QCD prediction. Part of this discrepancy may be due to the lack of higher-order perturbative QCD corrections, which are dominated by the annihilation process qq̄ → γg (with g → cc̄) at high p_T^γ. On the other hand, it might also be evidence for the existence of a non-perturbative charm component in the proton. In Ref. [19], it has been shown that considering the IC contribution reduces the discrepancy between theory and data but does not fully resolve it, even with a 3.5% IC contribution.
Given these results, it is instructive to study the behaviour of the PDFs in these analyses. Figure 3 shows the extracted PDFs at the starting scale Q² = 1 GeV², as a function of x, for the "No EMC & No D0", "+EMC" and "+D0" analyses. Notably, the D0 and EMC data show a similar impact on the PDFs. For example, xg(x, Q²) in both the "+EMC" and "+D0" analyses shows an increase in the vicinity of x ≈ 0.3, where IC is dominant. Regardless of the χ² value, the inclusion of the D0 and EMC data in the standard fit causes the gluon and sea distributions to change quite significantly from the default, as shown in Fig. 3. Since the gluon and dynamical charm are strongly correlated, the gluon distribution becomes much smaller in the region around x = 0.05 in order to provide less dynamically generated charm in this region and to limit the overshoot of the D0 and EMC data. The gluon also increases at very high x, which helps to fit the D0 and EMC data (which tend to lie above the theoretical predictions) at larger values of p_T^γ and x, respectively. However, this is also a consequence of the momentum sum rule for PDFs, i.e. a decrease in the gluon at x = 0.05 automatically leads to an increase at high x. The change in the gluon is not preferred by the HERA neutral current 920 GeV data and the BCDMS data, for which the χ² increases, but is seemingly preferred by the SLAC data. On the other hand, note that the valence distributions do not change significantly between the different analyses, in contrast to the gluon and sea distributions.
At high Q², the results of the various analyses lie on top of each other and their differences cannot be resolved. Therefore, in Fig. 4, we also show the relative PDF uncertainties at the scale Q² = 8317 GeV², as a function of x, for the gluon and charm distributions, which are strongly affected by the inclusion of the EMC and D0 data. According to our results, shown in Figs. 3 and 4, the impact of the D0 data on the PDFs is similar to that of the EMC data but more pronounced. It seems that these data allow a larger probability for an IC contribution within the proton, but in fact it is difficult to put a limit on the IC probability using the D0 data, because part of the discrepancy may be due to missing higher-order corrections in the qq̄ channel, which becomes important at the Tevatron at large transverse momenta. On the other hand, the qq̄ channel does not play an important role at pp colliders; therefore, the difference between theory and data for measurements of this process at the LHC can be smaller than in the present case, and such measurements would be more useful for investigating the IC probability in the proton.
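A relative PDF uncertainty of the kind discussed above can be obtained from Hessian eigenvector sets with the standard symmetric master formula Δf = ½ [Σ_k (f_k⁺ − f_k⁻)²]^{1/2}, where f_k^± are the PDF values for the plus and minus variations along eigenvector k. The sketch below is a minimal illustration with made-up numbers; it does not reproduce how HERAFitter stores or evaluates its eigenvector sets, and the function names are hypothetical.

```python
import math


def hessian_uncertainty(plus_values, minus_values):
    """Symmetric Hessian uncertainty: 0.5 * sqrt(sum_k (f_k^+ - f_k^-)^2)."""
    return 0.5 * math.sqrt(sum((p - m) ** 2 for p, m in zip(plus_values, minus_values)))


def relative_uncertainty(central, plus_values, minus_values):
    """Uncertainty divided by the central value, as shown in a ratio plot."""
    return hessian_uncertainty(plus_values, minus_values) / central


if __name__ == "__main__":
    # Made-up values of a PDF at one (x, Q^2) point for two eigenvector pairs.
    central = 2.0
    plus, minus = [2.3, 2.4], [1.7, 1.6]
    print(relative_uncertainty(central, plus, minus))
```

Evaluating this at each x for the gluon and charm distributions of the three fits would give bands that can be compared directly.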

Conclusions
Since the theoretical descriptions of the D0 and EMC data are not satisfactory at large values of the photon transverse momentum p_T^γ and of the Bjorken scaling variable x, respectively, they can be regarded as the same evidence for the inefficiency of perturbative QCD in dealing with charm production in these kinematic regions. In addition, since they both show some evidence for the existence of non-perturbative charm in the proton, one can expect these data to have the same impact on the PDFs. In the present study, we investigated the impact of the differential γ+c-jet cross section data from D0 on the PDFs by performing an NLO global analysis. We showed that including the D0 data leads to a dramatic shift in the gluon and sea distributions. Furthermore, the inclusion of these data in the analysis causes the χ² of the more precise measurements from HERA, and also of the BCDMS data, to increase. The D0 data also cannot be fitted in a satisfactory way by considering perturbative charm alone. This is similar to the EMC data, as indicated in Ref. [23], where a tension between the EMC and HERA data was reported. Due to this problem, many global PDF analyses omit these data from their fits. Moreover, we investigated the impact of the older EMC measurements of the charm structure function F_2^c on the PDFs and compared the results with those from the analysis of the Tevatron data. We showed that the D0 and EMC data have qualitatively the same impact on the PDFs. We also illustrated that the impact of the D0 data on the PDFs is greater than that of the EMC data. Although this difference allows a larger IC probability, these data are not appropriate for putting a limit on the IC contribution in the nucleon, because of the lack of higher-order corrections in the calculation of the differential γ+c-jet cross section. We argued that the higher-order corrections for γ + c-jet and Z + c-jet measurements in pp collisions at the LHC are smaller, and that a QCD fit including these data and considering the IC contribution can be highly instructive.