Unveiling saturation effects from nuclear structure function measurements at the EIC

We analyze the possibility of extracting a clear signal of non-linear parton saturation effects from future measurements of nuclear structure functions at the Electron-Ion Collider (EIC), in the small-x region. Our approach consists in generating pseudodata for electron-gold collisions, using the running-coupling Balitsky-Kovchegov evolution equation, and in assessing the compatibility of these saturate pseudodata with existing sets of nuclear parton distribution functions (nPDFs), extrapolated if necessary. The level of disagreement between the two is quantified by applying a Bayesian reweighting technique. This allows to infer the parton distributions needed in order to describe the pseudodata, which we find quite different from the actual distributions, especially for sea quarks and gluons. This tension suggests that, should saturation effects impact the future nuclear structure function data as predicted, a successful refitting of the nPDFs may not be achievable, which would unambiguously signal the presence of non-linear effects.


Introduction
It is foreseen that the next high-energy nuclear physics facility will be an Electron-Ion Collider (EIC) [1]. One of the main physics goals of this future QCD laboratory will be to unambiguously unveil the onset of the so-called gluon saturation regime of QCD. This regime of hadronic and nuclear wave functions at small longitudinal momentum fraction x is characterized by a transverse momentum scale, the saturation scale Q s (x), at which non-linearities become of comparable importance to linear evolution [2]. The emergence of this non-linear regime is a fundamental consequence of QCD dynamics, and it has been subject to steady theoretical progress for the past twenty years. However the general consensus is that a clear experimental discovery remains elusive.
Saturation phenomenology has been successful in all the (admittedly small) phase-space corners where that physics is expected to be relevant (i.e., for every collider process that involves small-x partons and transverse momenta of the order of the saturation scale), for a broad range of observables and various collision systems, including d + Au collisions at RHIC and in e + p collisions at HERA [3,4]. The strongest hint in favor of saturation so far is arguably the suppression of the away-side peak of the di-hadron correlation function at forward rapidities in central p + A collisions versus p + p collisions [5,6], discovered at RHIC after it had been predicted [7]. Nevertheless, the global belief in the community is that a direct, unquestionable evidence is still missing, and that the EIC may provide it.
Given this past success, there is little doubt that saturation effects will be at play in e + Au collisions at the EIC. The question is which is the best way to uncover them, which are the golden observables and smoking-gun measurements. Measurements from which one can undoubtedly conclude that non-linear effects are clearly needed on top of standard linear perturbative QCD dynamics, but also from which one can thoroughly verify that those non-linear effects are under theoretical control, i.e., they are described by non-linear QCD equations as opposed to being fully of non-perturbative origin. In this work, we shall focus on the F Au 2 and F Au L structure functions at small x, whose combined measurement is often mentioned as the primary candidate to provide a smoking gun in favor of parton saturation [8]. They provide access to the distributions of quarks and gluons, respectively, and can be extracted from measuring the total e+A → e+X cross-section at different collision energies.
However, even if saturation effects would play an important role in the kinematical range of the future EIC data, there is a possibility that such effects would go unnoticed: standard QCD fits of parton distribution functions (PDFs), which are based on the linear DGLAP evolution equation and therefore do not take into account saturation effects, could still be able to describe structure function data inside the saturation regime [9]. This is due to the large amount of freedom allowed in this approach for the initial conditions to the QCD evolution: those can be artificially suppressed in fits, in order to compensate for an evolution that would be too fast, as would be the case if the DGLAP equation was used into the saturation regime without the proper non-linear corrections.
In other words, up to a certain point, it is possible to obtain a successful fit of F 2 and F L data (and a resulting PDF set) with an incomplete evolution equation, which can unfortunately lead to wrong conclusions and an inaccurate description of the parton content. The question we set out to answer in this letter is whether or not saturation effects, as well as the lever arm in x, will be large enough in order to ensure that this will not happen with gold nuclei at the EIC, and that the combined measurement of the F 2 and F L structure functions is indeed a golden observable. We note that the possibility of detecting nonlinear effects with inclusive measurements has been investigated in earlier works [10,11,12]; the present work utilizes of a novel approach.
Our strategy is the following. We first generate F Au 2 and F Au L pseudodata, with saturation effects setting in according to the latest predictions based on the running-coupling Balitsky-Kovchegov (rcBK) evolution [13,14,15,16,17,18]. Then we answer the question whether various nuclear PDF (nPDF) fits on the market (namely 1 EPS09 [19] and DSSZ [20]), which are well constrained by large-x large-Q 2 existing data, can accommodate our EIC pseudodata. This is done using a Bayesian reweighing procedure which addresses the compatibility of new data (and the quantitative modifications they might induce) to an existing set of PDFs [21,22,23,24,25,26,27]. If the answer is positive, then saturation effects at the EIC would stay concealed; if the answer is negative, then nonlinear QCD dynamics would be needed to describe the data. In the former case, measuring deep inelastic scattering (DIS) off heavy nuclei would still be crucial in order to extract nuclear modifications of quark and gluon distributions at generic values of x, but the structure functions would not form smoking-gun measurements for parton saturation.
The plan of the letter is as follows. In section 2, we detail the saturation framework used to generate our structure-function pseudodata for gold nuclei at the EIC. In section 3, we explain the most relevant features of the the Bayesian reweighing procedure used to check the compatibility of our pseudodata with the nPDFs sets. The obtained results are the topic of Sec. 4. Finally we summarize our findings in Sec. 5.

Nuclear structure function pseudodata from the rcBK saturation model
It is customary to write the DIS structure functions F 2 and F L as follows: where σ γ * A L,T represents the γ * +A total cross-section, for a transversely (T) or longitudinally (L) polarized virtual photon. In turn, at small x, these are obtained in the following way: where Ψ γ * L,T (z, r; Q 2 ) 2 represents the probability for a photon of virtuality Q 2 to produce a qq pair of transverse size r and z denotes the fraction of longitudinal momentum carried by the quark. dσ A dip /d 2 b stands for the total cross-section for this qq pair to scatter off the target nucleus at an impact parameter b and contains the QCD dynamics. By contrast, Ψ γ * L,T is well known from QED. To compute the qq dipole-nucleus cross-section, we follow the approach of [28,29]. Using the optical theorem, the total cross-section can be written where the S-matrix element S A represents the probability amplitude for the dipole to not interact with the target nucleus. Then, the essence of the model is to write that the dipole scatters independently off the different nucleons. Introducing the coordinates of the individual nucleons We further assume that the positions of the nucleons {b i } are distributed according to the Woods-Saxon distribution T A (b): which is normalized to unity d 2 b T A (b) = 1. The nuclear radius R A and surface diffuseness d are measured from the electric charge distribution, their values can be found in Ref. [30]. In this model, the qq dipole-nucleus cross-section can therefore be written which is well approximated by [29] dσ The expression in parenthesis in (5) can be further replaced by exp −AT A (b)σ p dip /2 for large A. The parameters of the model come from the Woods-Saxon distribution (3) and from the AAMQS fit [33] of the qq dipole-nucleon cross-section σ p dip to the reduced cross-section data from HERA, which we shall detail now.
In the AAMQS fit, the dipole cross-section is expressed in terms of an impactparameter independent dipole scattering amplitude N , which contains the smallx QCD dynamics: We note that other models could be used here, in particular models that implement a non-trivial impact parameter dependence already at the level of dipolenucleon cross section, however, this does not lead to important differences for the inclusive observables considered in this work. It would in the case of diffractive and exclusive observables, but this goes beyond the scope of this paper.
The dipole scattering amplitude N obeys the rcBK equation [13,14,15,16,17,18]: with Y = ln(x 0 /x). Using Balitsky's prescription [17], the kernel in (7) reads where r 2 = r − r 1 and v ≡ |v| for two-dimensional vectors. Following [31], the running coupling is regulated in equations (7) and (8) by freezing it to a constant value α f r s = 0.7 in the infrared. The initial condition for the evolution is the so-called McLerran-Venugopalan (MV) model [32]: where Q 2 s0 is the initial saturation scale, and where Λ = 0.241 GeV. All the parameters can be found in [33], we shall make use of the "e" fit of that reference. Recently, more advanced fits have appeared where Eq. (7) is also supplemented with collinear resummations [34]; however those are not yet available for public use, but could as easily be used in the future. With all the ingredients described above, we can generate the central values of our pseudodata for F Au 2 and F Au L . Concerning the projected error bars, they have been generated for us by the "BNL Task Force for the EIC" [35]. As in many EIC dedicated studies, an integrated luminosity of 10f b −1 was assumed, implying that systematic uncertainties dominate. The latter are estimated to be around 3% for F 2 , and for F L , three centre-of-mass energies ranging from 30 to 90 GeV were assumed for the extraction, yielding bigger errors. In Figs. 1 and 2, we show a comparison of those pseudodata, with calculations obtained from the EPS09 and DSSZ nPDF sets, with x ranging from 10 −5 to 10 −2 , with Q 2 -values within the perturbative QCD range. These theoretical curves are the result of incorporating the nuclear PDFs into the next-to-leading order code for structure functions from Ref. [36], meaning that we are using the same MSTW proton baseline for both our EPS09 and DSSZ nPDF calculations.
In the case of F 2 (Fig. 1), the pseudodata at the highest x-values are well within the theoretical uncertainties for both nPDF sets considered. In contrast, for x 10 −3 the rcBK prediction lies in the upper limit of the nPDFs uncertainties or even above them. This is due to the fact that nPDFs fits are, at the moment, not constrained at such low x-values, and the theoretical predictions shown on the figures rely on extrapolations. Moreover, contrary to what one could have naively expected, the nPDF extrapolations induce F 2 values which are smaller than what the saturation model predicts, not larger. This means that, with the current nPDF sets, one should not expect the golden scenario in which the collinear factorization predictions will overshoot the data at small x, due to the lack of non-linear effects in that framework.
With respect to the longitudinal structure function, there are two main reasons that explain the extreme differences seen in Fig. 2. On the one hand, as before, part of the generated pseudodata lie in an unexplored kinematical region and thus the nPDF extrapolations are not reliable. On the other hand, F L is sensitive to the gluon distribution already at the lowest order, unlike F 2 . This gluon density is not well determined, as the bulk of data considered in nPDFs extractions is not sensitive to it, either due to the kinematical range covered or to the observables themselves.
All these reasons come together to explain the level of disagreement between the saturated pseudodata and the collinear factorization predictions. And to illustrate that these discrepancies relate to nuclear effects, and not to differences at the level of the proton structure functions, we show in Fig. 3 that indeed the AAMQS and the collinear-factorization calculations are compatible, as they should since these are build upon the same HERA data.
Finally, it is noteworthy that the upper left panel in both Figs. 1 and 2, corresponding to the smallest Q 2 -value, only has a prediction for DSSZ, as it is outside the validity range of EPS09. Given this, the total number of pseudodata included in the analysis below is not the same for each nPDF set. We would also like to point out that the projected error bars for the F L measurements at the EIC are much bigger than those in the case of F 2 , and as a consequence including our F L pseudodata in the analysis makes little difference.

Bayesian reweighting of nuclear PDFs
The partonic densities (PDFs) are a necessary piece for the theoretical predictions of physical observables for processes involving at least one hadron in the initial or final state. However, their determination from experimental data is an involved and time consuming procedure. For this reason, updating the PDFs every time a new set of data becomes available is rather vexing, as a priori one can't know if new information would be obtained from it. Therefore, statistical methods have been developed in order to bypass this obstacle: the reweighting techniques are tools that allow to incorporate information from newly measured data into a set of PDFs without recurring to the standard procedure of global fitting. At least two methods are now available: the Hessian reweighting and the Bayesian reweighting [21,22,23,24,25,26,27]. While in this work we chose the latter, it has been shown that they are equivalent for PDF sets with theoretical uncertainties determined by the Hessian method as the ones we use.
Let us assume that we have the representation of the underlying probability distribution P old (f ) of the PDFs given by a large ensemble of PDFs f k , k = 1 . . . N rep . Then the expectation value for any quantity O depending on the PDFs can be computed as with variance If new data y ≡ {y i , i = 1, ..., N data } with covariance matrix C ij are made available, we can update P old (f ) incorporating the information contained in the new data by use of the Bayes theorem as where P( y|f ) is the likelihood for the new data given the original set of parton densities. With this modification the quantities defined in equations (10) and (11) turn into weighted averages with the weights ω k proportional to the likelihood. The adequate selection of the likelihood is a delicate matter and several options have been proposed [26,21,22,23]. In particular, we follow the option of [27], equivalent to a refit for PDF sets with uncertainties based on the Hessian method (with N eig eigenvalues and fixed tolerance ∆χ 2 ): where the theoretical values y i [f ] are estimated by with the deviation between data and each replica k computed as An interesting feature of the Bayesian method is the existence of a quantitative estimator of the agreement between the data originally considered for the PDF fit and the new one. This estimator, the effective number of replicas N eff , is defined as N eff N rep points to a large tension between data sets, either due to incompatibility or too much new information in the new data. N eff ≈ N rep instead hints at a strong compatibility and the use of the new reweighted PDF set reliable. Nevertheless, the real meaning of N eff can be less clear when comparing different PDFs sets, as was pointed in [37].
The most general version of the procedure, applicable to any process with at least one hadron (or nucleus) in the initial or final state, starts by generating the replicas f k by with f S0 and f S ± i the PDFs for the central fit and eigenvectors, respectively, and R ik random numbers with Gaussian distribution centered at zero and with variance one. Once these are calculated for a high enough amount of replicas (N rep > 10 3 ), they are used to compute the theoretical values of the observable needed for each replica. For most of the studied quantities it involves a large quantity of time consuming convolutional integrals and this goes in detriment of the reweighting positive feature of speed. However, should the PDF set under study enter the computation linearly (as in the present case), it is possible to alter the order of the steps and save time. Let us consider the electron-nucleus collision we are investigating. For the k − th replica, the observable can be schematically written as where ⊗ denotes the convolution of the hard partÔ with the PDF, and the sum over the partonic species is implicit. Using Eq. (19) to replace f A k , we end up with By distributing the terms we have where O S0 is the observable obtained with the central set, and O S ± i the ones corresponding to the eigenvectors. Thus we avoid computing the observable N rep times and only do so 2N eig + 1 times.  Table 1: Results of the reweighting process: pseudodata taken into account (first column), nPDF considered (second column), χ 2 per number of pseudodata points before and after the reweighting (third and forth column, respectively), and effective number of replicas remaining (fifth column) of the Nrep = 10 5 initial replicas.

Results
Given the pseudodata of Figs. 1 and 2 for e + Au collisions (A = 197), we perform a Bayesian reweighting using an initial number of replicas N rep = 10 5 . For the sake of making better sense of the results, we perform the analysis three times, considering only F 2 , only F L and both F 2 and F L pseudodata together. The results are shown on Table 1, which summarizes the quantitative estimators of the adequacy of the nPDF description.
If we look at the central column of Table 1, we see that the χ 2 is always larger for DSSZ than for EPS09. This is because, as mentioned before, the pseudodata in the upper left panel of Fig. 1 and 2 is only included in the analysis with the DSSZ set. This fact is specially relevant in the case of F L , since the discrepancy between pseudodata and theoretical curve is larger than for F 2 . Regarding F 2 , the deviation of the pseudodata from the central predictions combined with the small uncertainties give huge contributions to the χ 2 (central column). As for F L , we have to distinguish between nPDFs, in spite of the description not being good in either case. The error bars are much larger than for F 2 , so this yields a total χ 2 per number of points lower for EPS09. Nevertheless, for DSSZ, the inclusion of the upper left panel of Figure 2 means a towering increase of χ 2 , despite the larger error bars.
These χ 2 determine the weights according to equation (15) which in turn give us the total number of meaningful replicas (N eff ) that we see in the far right column of Table 1. If we compare N eff with the total number of replicas N rep , we can see, on one hand, that for F L most of the replicas survive (for EPS09, the practical totality of the replicas survive), noting that this pseudodata is compatible with the current nPDF fits. This does not come as a suprise, given the huge error bars of the F L pseudodata. On the other hand, considering both F 2 and F L , we can see that less than 15% of the replicas survive the reweighting in the case of EPS09. This number shrinks to less than 2% if we consider the DSSZ set. This means that there is a big tension between the pseudodata and the theoretical predictions, and a new fit should be mandatory.
Nevertheless, we can study what happens with the few remaining replicas and what nuclear partonic behavior is favored by the pseudodata. The reweight-ing process affects the nPDFs in the following way: -In the case of EPS09 (Fig. 4), the reweighting suppresses minimally the central value of the valence at low x (left panel), slightly increasing the shadowing in that region while the uncertainty is shifted up just a little bit. For the sea distribution (central panel) the central value increases and the shadowing suppression gets smaller, with the uncertainties shrinking dramatically. Finally the gluon density (right panel) flattens and the strong shadowing/anti-shadowing that characterizes this fit smoothes so much that leaves a curve almost compatible with unity.
-In the case of DSSZ (Fig. 5), the reweighting returns a valence distribution with stronger shadowing and anti-shadowing regions, a sea distribution with an enhancement of the central value in the shadowing area and a deeper EMC-effect. The reweighted gluon distribution behaves the same way as the reweighted sea, but the shadowing region for the gluon goes over unity, becoming, in fact, anti-shadowing. All these distributions get narrower uncertainties after the reweighting.
The curious behavior at low x produced by the pseudodata originates from the ansatz in the initial parameterizations. In EPS09, the limit of R for x → 0 is a parameter, while for DSSZ there is no such constrain. Once obtained a fit, any prediction outside the kinematical range probed comes from an extrapolation, thus allowing for the puzzling and unrealistic curves at small x of Fig. 5 to occur.

Summary
On the search for gluon saturation we have analysed the impact of future nuclear structure function low-x data at the EIC. Our aim was to assess whether or not the collinear factorization approach with nuclear PDFs will be able to fit the data, if saturation sets in according to current expectations. As it is customary for nuclear PDFs, in this study we used the structure function F 2 .  In addition, we have also considered the longitudinal structure function F L , as it is much more sensitive to the gluon density and thus potentially capable of providing interesting information, although with hindsight we found it not to be the case due to the large projected errors.
We generated saturated pseudodata for F Au 2 and F Au L , with central values obtained from the rcBK predictions and error bars estimated taking into account the design parameters the future detectors, and compared them with the corresponding collinear factorization predictions. The latter, that rely on extrapolations as most of the points lie outside the kinematical range explored in the original fits, lead to smaller structure functions than what the saturation model predicts at small x. This is particularly true for F L , as the extrapolation for the nuclear gluon distributions at low x are very unreliable. The quantitative impact of the pseudodata on the nPDFs was obtained by means of a Bayesian reweighting technique. The results look quite different from the original distributions, especially for sea quarks and gluons. This strong tension is confirmed by the numbers in Table 1. The effect is bigger in the case of DSSZ for which the gluon parameterization lacks flexibility, but numerical estimators confirm the existence of a non trifling tension also in the case of EPS09.
Our results suggest that, should the EIC provide data compatible with the expected theoretical description from the saturation model studied (or a similar one), a successful refitting of the nPDFs may not be achievable, which would unambiguously signal the presence of non-linear effects. However, in order to be fully conclusive and determine whether or not genuine saturation effects can be unveiled from nuclear structure function measurements, performing a new global nPDF fit will be necessary. At the moment we can not exclude the possibility of a successful refitting of the nuclear PDFs because the nuclear gluon distribution is currently essentially unconstrained at small-x. In that case, one would have to resort to diffractive observables in order to pin down saturation effects at the EIC (see e.g. [38,39,40]).