Electron-Ion Collider impact study on the tensor charge of the nucleon

In this letter we study the impact of the Electron-Ion Collider (EIC) on the phenomenological extraction of the tensor charge from a QCD global analysis of single transverse-spin asymmetries (SSAs). We generate EIC pseudo-data for the Collins effect in semi-inclusive deep-inelastic scattering for proton and $^{3\!}He$ beams across multiple center-of-mass energies. We find a significant reduction in the uncertainties for the up, down, and isovector tensor charges that will make their extraction from EIC data on SSAs as precise as current lattice QCD calculations. We also analyze the constraints placed by future data from the proposed SoLID experiment at Jefferson Lab, discuss its important complementary role to the EIC, and present the combined impact from both facilities.

Moreover, there was a recent global analysis of SSAs performed in Ref. [13] (JAM20), which included not only Collins effect SIDIS and SIA data but also Sivers effect SIDIS and Drell-Yan measurements [24, 26-29, 47-50] as well as proton-proton $A_N$ data [51-54]. The JAM20 results found for the first time an agreement between experimental data and lattice QCD (without including lattice data in the fit) for $\delta u$, $\delta d$, and $g_T$, as calculated in Eq. (1). The crucial aspect that allowed for such an agreement was the inclusion of $A_N$ data. This observable is collinear twist-3 [55-60] and dominated by a term that couples $h_1(x)$ to the quark-gluon-quark FFs $H_1^{\perp(1)}(z)$ and $\tilde{H}(z)$ [13,61,62]. The function $H_1^{\perp(1)}(z)$ is the first moment of the Collins TMD FF, and $\tilde{H}(z)$ generates the $P_{hT}$-integrated SIDIS $A_{UT}^{\sin\phi_S}$ asymmetry, where $P_{hT}$ is the transverse momentum of the hadron w.r.t. the momentum of the virtual photon, by again coupling with $h_1(x)$ [63].
Furthermore, the JAM20 results, due to the inclusion of $A_N$ data, also give the most precise phenomenological extraction of $g_T$ to date: $g_T = 0.87(11)$. Nevertheless, the errors in $g_T$, $\delta u$, and $\delta d$ (JAM20 values are $\delta u = 0.72(19)$, $\delta d = -0.15(16)$) are still much larger ($\sim 12\%$ for $g_T$, $\sim 25\%$ for $\delta u$, and $\sim 100\%$ for $\delta d$) than the uncertainties from lattice QCD calculations ($\lesssim 5\%$ for all of $\delta u$, $\delta d$, and $g_T$) [18,20,21]. The main cause of the uncertainty for phenomenological computations is that they rely on integrals of $h_1(x)$ over the entire $x$ region from 0 to 1 (see Eq. (1)). However, current SIDIS measurements only cover the region $0.02 \lesssim x \lesssim 0.3$. This leaves the transversity PDF basically unconstrained in the small-$x$ and large-$x$ regimes. The inclusion of $A_N$ data does give some further constraints in the larger-$x$ region since in that observable one integrates from $x_{\min}$ to 1, where $0.2 \lesssim x_{\min} \lesssim 0.7$. We also mention that a first principles calculation of the small-$x$ asymptotics of the valence transversity TMD PDF has been performed in Ref. [64]. Nevertheless, one clearly needs very precise data at both $x \lesssim 0.02$ and $x \gtrsim 0.3$ in order to significantly reduce the uncertainties in phenomenological extractions of the tensor charge.
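The relative uncertainties quoted above follow directly from the compact notation in which the JAM20 values are reported. The short sketch below (the helper names `parse_value` and `relative_error` are ours, introduced only for illustration) converts strings such as `0.72(19)` into a value and an error, computes the relative errors, and checks that the central values satisfy $g_T = \delta u - \delta d$.

```python
import re

def parse_value(text):
    """Parse compact notation such as '0.72(19)' into (value, uncertainty).

    The digits in parentheses apply to the last decimal places of the value.
    """
    match = re.fullmatch(r"\s*(-?\d+)\.(\d+)\s*\((\d+)\)\s*", text)
    if match is None:
        raise ValueError(f"unsupported format: {text!r}")
    whole, decimals, err_digits = match.groups()
    value = float(f"{whole}.{decimals}")
    error = int(err_digits) * 10 ** (-len(decimals))
    return value, error

def relative_error(text):
    """Return uncertainty divided by the magnitude of the central value."""
    value, error = parse_value(text)
    return error / abs(value)

# JAM20 values as quoted in the text.
jam20 = {"delta_u": "0.72(19)", "delta_d": "-0.15(16)", "g_T": "0.87(11)"}

for name, text in jam20.items():
    print(f"{name}: relative error = {100 * relative_error(text):.0f}%")

# Consistency of the central values: g_T = delta_u - delta_d.
u, _ = parse_value(jam20["delta_u"])
d, _ = parse_value(jam20["delta_d"])
g, _ = parse_value(jam20["g_T"])
print("g_T - (delta_u - delta_d) =", round(g - (u - d), 10))
```

The printed percentages land close to the rounded figures quoted in the text (roughly a quarter for $\delta u$, about 100% for $\delta d$, and just over a tenth for $g_T$).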
The future Electron-Ion Collider (EIC) [65,66] at Brookhaven National Laboratory will make the most precise SIDIS measurements at small $x$ (down to $x \sim 10^{-4}$) while also increasing the precision of the data in the region up to $x \sim 0.5$. The 12 GeV program currently underway at Jefferson Lab (JLab) [67] will make precision measurements up to $x \sim 0.6$ and at smaller values of $Q^2$. In terms of the tensor charge, future data from the proposed SoLID experiment at JLab [68,69] will offer substantial constraints in this region. Therefore, we separately assess the impact of SIDIS Collins effect pseudo-data from SoLID (for the "enhanced" scenario) [70]. We also note that precision measurements from Belle-II on the Collins effect in SIA will affect extractions of the tensor charge by reducing the uncertainties in the Collins TMD FF [71]. The goal of this letter is to perform an impact study of future EIC data on the tensor charge of the nucleon using the JAM20 results as a baseline. In Sec. 2, we discuss the EIC pseudo-data used in the analysis. This includes both proton and $^3$He beams across multiple center-of-mass (CM) energies. In Sec. 3, we include these pseudo-data in the global analysis of Ref. [13] and, from the newly extracted transversity PDF, compute the tensor charges $\delta u$, $\delta d$, and $g_T$ and compare them to those of recent lattice QCD calculations. In Sec. 4, since the proposed SoLID experiment itself would have a significant impact on the tensor charge at large $x$, we perform a similar analysis on its pseudo-data [70]. We also discuss the important complementary role of SoLID to the EIC if one is to obtain an accurate and, as much as possible, unbiased phenomenological extraction of the tensor charge. Finally, we summarize our results and discuss the future outlook in Sec. 5.

Generating EIC Pseudo-Data
The EIC will provide data sensitive to the transversity PDF through SSAs in single-hadron and di-hadron reactions. For the former, measurements will be made of the Collins effect $A_{UT}^{\sin(\phi_h+\phi_S)}$ in $e + N^\uparrow \to e + h + X$, where $\phi_h$ ($\phi_S$) is the azimuthal angle of the outgoing hadron momentum (nucleon transverse spin) vector w.r.t. the lepton scattering plane. We generated EIC pseudo-data for both transversely polarized proton and $^3$He beams with charged pions detected in the final state and applied the JAM20 cuts of $0.2 < z < 0.6$, $Q^2 > 1.63$ GeV$^2$, and $0.2 < P_{hT} < 0.9$ GeV. Table 1 summarizes the data used in our fit, which includes a total of 8223 EIC pseudo-data points on the Collins effect in SIDIS plus the 517 SSA data points in the original JAM20 global analysis. The EIC pseudo-data covers multiple CM energies $\sqrt{S}$, set by the energy of the electron beam $E_e$ and of the nucleon beam $E_N$. The pseudo-data was generated with pythiaeRHIC [72], which uses pythia 6.4 [73] as an event generator. Realistic EIC detector acceptances and momentum smearing were implemented via the eic-smear package [74] and are predominantly based on the expected resolutions discussed in the EIC handbook [75]. For pion identification, the momentum and rapidity ranges that evolved from the EIC user group Yellow Report effort [76] were used. The proton and $^3$He polarizations were assumed to be 70%, and the uncertainties were scaled to accumulated luminosities of 10 fb$^{-1}$ for each beam energy sample. In the case of $^3$He, it was assumed that the two protons can be tagged in the very forward instrumentation, and the measurement was thus simulated by generating $e + n^\uparrow$ data after taking into account the neutron polarization in $^3$He. The uncertainties on the expected SSAs were evaluated by re-weighting the unpolarized simulations based on the phenomenological results of Ref. [77] and extracting the reconstructed asymmetries. As a crude measure of detector smearing and acceptance effects in a real detector, the differences between extracted asymmetries using perfectly tracked and smeared values were assigned as systematic uncertainties. This tries to conservatively mimic the uncertainties that may be related to the unfolding of smearing and particle mis-identification in an actual detector.

Table 1 (partial): number of points (N pts.) per data set.
[...] 25,27]: 126
Collins (SIA), $e^+ + e^- \to \pi^+ + \pi^- + X$ [30-33]: 176
$A_N$, $p^\uparrow + p \to \pi^\pm/\pi^0 + X$ [51-54]: 60
Total JAM20 N pts.: 517

Figure 1: (Top) Up and down transversity PDF $h_1(x)$ and Collins function first moment $H_1^{\perp(1)}(z)$ from the JAM20 global analysis [13] (light red band with the dashed red line for the central value) as well as a re-fit that includes EIC Collins effect pion production pseudo-data for a proton beam only (cyan band with the dot-dashed cyan line for the central value) and for both proton and $^3$He beams together (blue band with the solid blue line for the central value). (Bottom) Individual flavor tensor charges $\delta u$, $\delta d$ as well as the isovector charge $g_T$ for the same scenarios. Also shown are the results from two recent lattice QCD calculations [18,21] (purple). All results are at $Q^2 = 4$ GeV$^2$ with error bands at 1-$\sigma$ CL.

EIC Phenomenological Results
We begin by briefly discussing the methodology of the JAM20 global analysis, which serves as the baseline for our impact study, and refer the reader to Ref. [13] for more details. We employ a Gaussian parametrization for the transverse momentum dependence of the TMD PDFs and FFs. In particular, for the transversity TMD PDF we have
$$h_1^q(x, k_T^2) = h_1^q(x)\, \frac{1}{\pi \langle k_T^2\rangle^q_{h_1}} \exp\!\left[-\frac{k_T^2}{\langle k_T^2\rangle^q_{h_1}}\right], \qquad (2)$$
with $q$ being a quark flavor, and $\langle k_T^2\rangle^q_{h_1}$ the transverse momentum width. Note that $k_T$ is the transverse momentum of the struck quark. For $h_1^q(x)$ we only allow $q = u, d$ and explicitly set antiquark functions to zero. Even though an important goal of the EIC will be to constrain the sea quark transversity PDFs, for the tensor charge their inclusion is expected to have a small effect. Lattice QCD finds that contributions from disconnected diagrams to the tensor charge are about two orders of magnitude smaller than connected diagrams [18,21]. In addition, if one assumes a symmetric sea, then antiquark contributions cancel when calculating $g_T$, as one can see from Eq. (1). The Collins TMD FF is likewise given a Gaussian parametrization, Eq. (3), written in terms of the hadron momentum fraction $z$, a transverse momentum width, and the produced hadron mass $M_h$. Note that $p_T$ is the transverse momentum of the produced hadron with respect to the fragmenting parton. We allow for favored and unfavored Collins functions.
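As a quick numerical illustration of what a Gaussian ansatz of this type implies, the sketch below assumes the standard normalized form $h_1(x)\,e^{-k_T^2/w}/(\pi w)$ with width $w$, and checks two properties: integrating over $d^2k_T$ returns the collinear function, and the average $k_T^2$ equals the width. The width and collinear values are arbitrary placeholders, not JAM20 fit results.

```python
import math

def gaussian_tmd(collinear, kt2, width):
    """Assumed Gaussian ansatz: collinear function times a normalized Gaussian."""
    return collinear * math.exp(-kt2 / width) / (math.pi * width)

def integrate_over_kt(collinear, width, power=0, kt_max=20.0, steps=20000):
    """Midpoint-rule integral of (k_T^2)^power * TMD over d^2k_T = 2*pi*k_T dk_T."""
    dk = kt_max / steps
    total = 0.0
    for i in range(steps):
        kt = (i + 0.5) * dk
        kt2 = kt * kt
        total += 2.0 * math.pi * kt * kt2 ** power * gaussian_tmd(collinear, kt2, width) * dk
    return total

def mean_kt2(collinear, width):
    """Average k_T^2 weighted by the TMD."""
    return integrate_over_kt(collinear, width, power=1) / integrate_over_kt(collinear, width)

# Placeholder inputs (hypothetical, for illustration only).
collinear_value = 1.3   # value of the collinear function at some x
width = 0.5             # Gaussian width in GeV^2

print("integral over k_T :", integrate_over_kt(collinear_value, width))
print("average k_T^2     :", mean_kt2(collinear_value, width))
```

Because the transverse momentum dependence factorizes and integrates to one, the tensor charge depends only on the collinear function, while the width controls the $P_{hT}$ shape of the asymmetry.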
The Gaussian transverse momentum parametrizations (2), (3) of JAM20 do not have the complete features of TMD evolution [9, 36, 78-80] and instead assume most of the transverse momentum is non-perturbative and thus related to intrinsic properties of the colliding hadrons rather than to hard gluon radiation. The JAM20 analysis also implemented a DGLAP-type evolution for the collinear twist-3 functions analogous to Ref. [81], where a double-logarithmic $Q^2$-dependent term is explicitly added to the parameters. Such collinear twist-3 functions arise from the operator product expansion (OPE) of certain transverse-spin dependent TMDs (e.g., $H_1^{\perp(1)}(z)$ enters the OPE of the Collins TMD FF [9]). For the collinear twist-2 PDFs and FFs (e.g., $f_1(x)$, $h_1(x)$, and $D_1(z)$), the standard leading order DGLAP evolution was used. The fact that current data on SSAs can be described with a simple Gaussian ansatz highlights the need for the tremendous $Q^2$ lever arm of the EIC. The ability to span several decades in $Q^2$ will help constrain the exact nature of TMD evolution and study the interplay between TMD and collinear approaches.
Our study was conducted using replicas from the JAM20 analysis as priors in a fit of all the data in Table 1 (8740 total points). The results for the impact on the up and down transversity PDF $h_1(x)$ as well as the Collins function first moment $H_1^{\perp(1)}(z)$ are shown in the top panel of Fig. 1. One clearly sees a drastic reduction in the transversity uncertainty band once EIC data is included compared to the original JAM20 results. Even the uncertainties for the Collins function decrease noticeably in the smaller-$z$ region. This will allow for a more stringent test of the universality of the Collins function between SIDIS, electron-positron annihilation, and proton-proton collisions [82-88]. We emphasize that the $^3$He data is crucial for a precise determination of the down quark transversity PDF and for up and down flavor separation, enabling a higher decorrelation between $\delta u$ and $\delta d$. Specifically, the Pearson correlation coefficient
$$\rho[\delta u, \delta d] \equiv \frac{\langle \delta u\, \delta d\rangle - \langle \delta u\rangle \langle \delta d\rangle}{\Delta(\delta u)\, \Delta(\delta d)}$$
was found to be 0.80 for JAM20, 0.93 for JAM20+EIC($ep$), and 0.043 for JAM20+EIC($ep$+$e\,^3$He), where $\langle \cdots \rangle$ is the average value over all replicas, and $\Delta(\cdots)$ is the uncertainty (standard deviation) of the calculated tensor charge. (The correlation coefficient $\rho$ can be in the range $[-1, 1]$, where $\rho = \pm 1$ indicates 100% correlation (anti-correlation) and $\rho = 0$ indicates zero correlation.) Moreover, the well-constrained up and down $h_1(x)$ translate into very precise calculations of $\delta u$, $\delta d$, and $g_T$, as shown in the bottom panel of Fig. 1. We find all relative errors are now $\lesssim 5\%$: $\delta u = 0.709(15)$, $\delta d = -0.109(5)$, $g_T = 0.818(16)$. One can see the increase in precision of the extracted $\delta u$ due to the proton EIC data and a further dramatic reduction of errors, in particular for $\delta d$ and $g_T$, in a combined analysis of proton and $^3$He EIC data.
From the two lattice QCD calculations at the physical point [18,21] that are also included in that plot, we can conclude that EIC data will allow for phenomenological extractions of the tensor charges to be as precise as current lattice results. Thus, the EIC will provide a unique opportunity to explore the possible tension between these two approaches discussed in Ref. [10].
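The correlation between $\delta u$ and $\delta d$ discussed above is computed over the ensemble of fit replicas. The following sketch shows that calculation on toy replicas; the generator `toy_replicas`, its central values, and its target correlation are illustrative assumptions and are not the actual JAM20 replicas.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def std(values):
    """Standard deviation over the replica ensemble."""
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))

def pearson(xs, ys):
    """Pearson coefficient: (<xy> - <x><y>) / (Delta(x) * Delta(y))."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (std(xs) * std(ys))

def toy_replicas(n, rho, seed=0):
    """Draw n correlated (delta_u, delta_d) pairs with target correlation rho.

    Central values and widths are borrowed from the quoted JAM20 numbers
    purely to set a realistic scale; the sampling itself is hypothetical.
    """
    rng = random.Random(seed)
    us, ds = [], []
    for _ in range(n):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        us.append(0.72 + 0.19 * g1)
        ds.append(-0.15 + 0.16 * (rho * g1 + math.sqrt(1.0 - rho * rho) * g2))
    return us, ds

us, ds = toy_replicas(5000, rho=0.80)
print("estimated correlation:", round(pearson(us, ds), 3))
```

A coefficient near zero means the uncertainties on $\delta u$ and $\delta d$ add almost independently in $g_T = \delta u - \delta d$, whereas a strong positive correlation partially cancels in the difference.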
We have also explicitly shown how the central values for $h_1(x)$, $\delta u$, $\delta d$, and $g_T$ shift as new pseudo-data are included in the fit. Experimental measurements are related to the extracted functions in a very non-linear manner. The inverse problem of extracting parton distribution and fragmentation functions from experimental data can therefore have multiple solutions. In fact, such a shift is expected when a measurement is performed with a very high precision in a limited kinematical region. The measurement will better constrain parameters describing this particular kinematical region and will, potentially, distort the extracted functions compared to the baseline functions. Thus, a very precise measurement cannot always guarantee a very accurate extraction of the distributions, and multiple experiments, such as EIC and SoLID in this case, should be performed in a wide kinematical region in order to minimize bias and expose any potential tension between data sets. This point will be discussed in more detail later in Sec. 4. In order to better understand which kinematical regions for the Collins asymmetry are most important to reduce the uncertainties in the extraction of the transversity and Collins functions, we calculate the ratio of the uncertainty in the JAM20 calculation of $A_{UT}^{\sin(\phi_h+\phi_S)}$ to that of the EIC pseudo-data. Since the EIC errors need to be smaller than those from JAM20 in order to obtain more precisely extracted functions, the larger this error ratio, the larger the impact of the new data set on the observable. We note that $A_{UT}^{\sin(\phi_h+\phi_S)}$ is a function of $(x, Q^2, z, P_{hT})$. Therefore, we define the following average error ratio for each $(x, Q^2)$ bin:
$$\left\langle \frac{\Delta_{\rm JAM20}}{\Delta_{\rm EIC}} \right\rangle \equiv \frac{1}{N_{\rm bin}} \sum_{i=1}^{N_{\rm bin}} \frac{\Delta^{i}_{\rm JAM20}}{\Delta^{i}_{\rm EIC}},$$
where the sum runs over all points $N_{\rm bin}$ in a given $(x, Q^2)$ bin, including all $(z, P_{hT})$ points in that bin for both $\pi^+$ and $\pi^-$ final states. The results for $\langle \Delta_{\rm JAM20}/\Delta_{\rm EIC} \rangle$ are shown in Fig. 2 for various CM energies for both proton and $^3$He beams.
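The bin-averaged error ratio can be sketched in a few lines. The version below assumes the average is taken over the per-point ratios within each $(x, Q^2)$ bin, which is one natural reading of the definition; the record layout (`x_bin`, `q2_bin`, `err_jam20`, `err_eic`) and the numbers are hypothetical.

```python
from collections import defaultdict

def average_error_ratio(points):
    """Average err_jam20 / err_eic over all points sharing an (x, Q^2) bin.

    Each point is a dict with keys 'x_bin', 'q2_bin', 'err_jam20', 'err_eic'.
    Points with different z, P_hT, or pion charge fall into the same bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for point in points:
        key = (point["x_bin"], point["q2_bin"])
        sums[key] += point["err_jam20"] / point["err_eic"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Toy pseudo-data: two points in one bin, one point in another.
toy_points = [
    {"x_bin": 0.05, "q2_bin": 4.0, "err_jam20": 0.020, "err_eic": 0.010},
    {"x_bin": 0.05, "q2_bin": 4.0, "err_jam20": 0.040, "err_eic": 0.010},
    {"x_bin": 0.30, "q2_bin": 20.0, "err_jam20": 0.015, "err_eic": 0.030},
]

for bin_key, ratio in sorted(average_error_ratio(toy_points).items()):
    print(bin_key, round(ratio, 3))
```

A ratio above one marks a bin where the new pseudo-data are more precise than the current prediction band and can therefore tighten the fit; a ratio below one marks a bin with little constraining power.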
We find that for the proton ($^3$He) beam, the $x \gtrsim 0.03$ ($x \gtrsim 0.001$) region has the greatest impact on the JAM20 analysis of this observable. One may ask why more impact is not expected at lower $x$ values for the proton beam. The reason is that the $x$ dependence of the PDFs in JAM20 is parametrized as $\sim N x^a (1 - x)^b$, where $N$, $a$, $b$ are free parameters. Since current SIDIS Collins effect data are in the moderate $x$ region ($0.02 \lesssim x \lesssim 0.3$), the $a$ values in JAM20 are fairly well constrained. Since at small $x$ the PDFs behave as $\sim x^a$, this leads to reduced uncertainties in this regime even though no data is available there. Such parametrization bias is unavoidable. The results of Fig. 2 should not be interpreted as diminishing the relevance of the high energy configuration of the EIC for measuring the Collins asymmetry, but rather as an indication of what region most affects our current JAM20 extraction. Certainly new data in the small $x$ region will influence the value of $a$ and change the inferred shapes of the transversity and Collins functions. In addition, the small $x$ data will reveal a potential sea quark transversity that is not included in our analysis. Both beams also show significant error reduction over several decades of $Q^2$, highlighting the importance of the tremendous $Q^2$ lever arm of the EIC. We clearly see again in Fig. 2 the definite need for the $^3$He program at the EIC down to small values of $x$.

Complementarity of the SoLID Experiment to the EIC
In this section we analyze the impact of pseudo-data from the proposed SoLID experiment at JLab [68,70], compare the results to the EIC case, and discuss the complementary features of these measurements to the EIC. SoLID will cover a region of $0.05 \lesssim x \lesssim 0.65$ and $1 \lesssim Q^2 \lesssim 8$ GeV$^2$. After the JAM20 data cuts, our study included 526 points for $e + p^\uparrow \to e + \pi^\pm + X$ (311 for $\pi^+$ and 215 for $\pi^-$) and 696 points for $e + {}^3{\rm He}^\uparrow \to e + \pi^\pm + X$ (412 for $\pi^+$ and 284 for $\pi^-$). The SoLID experiment will use both 8. [...] times for both energies and targets, the accumulated luminosities will far exceed the EIC luminosities [70]. Both EIC and SoLID will be systematics limited in most of their covered kinematical ranges. In the left panel of Fig. 3, we show the coverage in $x$ and $Q^2$ of the EIC and SoLID pseudo-data. One can see that the two facilities cover complementary kinematical regions, i.e., the region of large $x$ and relatively low $Q^2$ for SoLID, and a wider region of $x$, reaching the low values associated with sea quarks and gluons, and large values of $Q^2$ for the EIC. Therefore, for the large-$x$ region the data obtained by SoLID will be important for the detailed exploration of the non-perturbative nature of TMD functions. The EIC will contribute a substantial $Q^2$ range in the same kinematical domain, which will allow one to study the effects of QCD evolution of TMD functions as well as to constrain them in a wider $x$ range. In addition, these studies will be important for understanding the influence of higher twist corrections, target and produced hadron mass corrections, and the applicability region of TMD factorization. We also plot in the right panel of Fig. 3 the quadrature of the expected statistical and systematic errors of the EIC and SoLID pseudo-data for $A_{UT}^{\sin(\phi_h+\phi_S)}$. One can see that on average the SoLID pseudo-data will be more precise at larger $x$ due to its higher luminosity.
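For reference, combining statistical and systematic errors in quadrature amounts to the one-line computation below. The function name and the sample numbers are illustrative only; they are not taken from the EIC or SoLID pseudo-data.

```python
import math

def total_error(stat, syst):
    """Combine statistical and systematic uncertainties in quadrature."""
    return math.hypot(stat, syst)

# Illustrative numbers only: a systematics-limited point and a balanced one.
print(total_error(0.001, 0.010))  # dominated by the systematic error
print(total_error(0.003, 0.004))  # comparable contributions
```

When a measurement is systematics limited, the quadrature sum is essentially the systematic error, so additional luminosity alone yields little further gain in precision.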
In Fig. 4, we present $g_T^{[x_{\min}]}$ as a function of $x_{\min}$, where $g_T^{[x_{\min}]}$ is the following truncated integral:
$$g_T^{[x_{\min}]} \equiv \int_{x_{\min}}^{1} dx \left[ h_1^u(x) - h_1^d(x) \right].$$
We want to study the impact on this quantity only from new data in the region $x > x_{\min}$ and eliminate the influence from data with $x < x_{\min}$, which could cause an artificial decrease in uncertainties outside the measured region (cf. the discussion about parametrization bias in connection to Fig. 2). Therefore, $g_T^{[x_{\min}]}$ for JAM20+EIC and JAM20+SoLID is calculated from fits that only include pseudo-data with $x > x_{\min}$. We see that the error ratio $\Delta/\Delta_{\rm JAM20}$ increases significantly as one moves towards the edge of the measured region of $x$ ($\sim 0.5$-$0.6$). As seen in Figs. 2, 3, the EIC still provides coverage around $x \sim 0.5$ with reduced errors from the current JAM20 analysis that at low $Q^2$ are similar to SoLID. Consequently, the EIC is competitive with SoLID for constraining the contribution to $g_T$ from this region. However, at the very edge of the $x$ phase space ($x \sim 0.6$), where the applicability of the QCD factorization implemented in this letter is yet to be explored, SoLID maintains a reduction in the errors compared to JAM20, whereas the EIC shows no improvement. From Fig. 4, we also see that the current JAM20 result only constrains the tensor charge down to $x \sim 0.1$, which accounts for about 75% of the total $g_T$. Thus, one clearly needs the small-$x$ data at the EIC to fully and precisely determine $g_T$, as Fig. 4 highlights. One also notices that $g_T^{[x_{\min}]}$ begins to saturate around $x \sim 0.01$, suggesting that very little tensor charge exists at small $x$. This observation is consistent with the calculation in Ref. [64] of the small-$x$ asymptotic behavior of the valence transversity TMD PDF. However, we note that EIC data in the low $x$ region will be needed for the study of the sea quark transversity functions.
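The saturation of the truncated integral at small $x_{\min}$ can be illustrated with a toy model. The sketch below assumes valence-only transversity functions of the form $N x^a (1-x)^b$, as in the parametrization described earlier, and integrates $h_1^u - h_1^d$ from $x_{\min}$ to 1 with a simple midpoint rule. The parameter values in `UP` and `DOWN` are arbitrary placeholders, not JAM20 fit results.

```python
def h1(x, norm, a, b):
    """Toy transversity PDF of the form N * x^a * (1 - x)^b."""
    return norm * x ** a * (1.0 - x) ** b

def truncated_integral(x_min, norm, a, b, steps=20000):
    """Midpoint-rule integral of h1 from x_min to 1."""
    width = (1.0 - x_min) / steps
    total = 0.0
    for i in range(steps):
        total += h1(x_min + (i + 0.5) * width, norm, a, b)
    return total * width

def g_t_truncated(x_min, up, down):
    """Truncated isovector charge: integral of (h1_u - h1_d) over [x_min, 1]."""
    return truncated_integral(x_min, *up) - truncated_integral(x_min, *down)

# Hypothetical (N, a, b) values chosen only to give a plausible shape.
UP = (14.0, 1.0, 3.0)
DOWN = (-3.0, 1.0, 3.0)

full = g_t_truncated(0.0, UP, DOWN)
for x_min in (0.3, 0.1, 0.01, 0.001):
    fraction = g_t_truncated(x_min, UP, DOWN) / full
    print(f"x_min = {x_min:<6} fraction of full g_T = {fraction:.3f}")
```

With a positive small-$x$ exponent $a$, the integrand vanishes as $x \to 0$ and the truncated integral flattens well before $x_{\min}$ reaches the smallest values, which is the qualitative behavior described in the text. A sea quark contribution or a different small-$x$ exponent would change this picture.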
Figure 4: $g_T^{[x_{\min}]}$ vs. $x_{\min}$ for the JAM20 global analysis [13] (red points) as well as a re-fit that includes Collins effect pion production pseudo-data (proton and $^3$He together) with $x > x_{\min}$ from SoLID (green points) and from the EIC (blue points). The plot also contains two recent lattice QCD calculations [18,21]. Note that these lattice data points are for the full $g_T$ integral (i.e., $x_{\min} = 0$) and have been offset for clarity. Also shown is the ratio $\Delta/\Delta_{\rm JAM20}$ of the uncertainty in $g_T^{[x_{\min}]}$ for the re-fit that includes pseudo-data from SoLID (green squares) and for the one that includes pseudo-data from the EIC (blue circles) to that of the original JAM20 fit [13]. That is, $\Delta$ in the numerator is either $\Delta_{\rm JAM20+SoLID}$ (for the case of the green squares) or $\Delta_{\rm JAM20+EIC}$ (for the case of the blue circles). All results are at $Q^2 = 4$ GeV$^2$ with error bars at 1-$\sigma$ CL.

To further compare the EIC and SoLID results, as well as the combined impact from both experiments, in [...] greater decrease. For $h_1^d(x)$ we find at larger $x$ that SoLID, due to its high luminosity and excellent capabilities with a $^3$He target, achieves a greater reduction in the relative uncertainty than the EIC. Since the size of $h_1^u(x)$ is greater than $h_1^d(x)$, the relative uncertainty for $h_1^{u-d}(x)$ shows a similar behavior as that for $h_1^u(x)$. The combined fit including both EIC and SoLID pseudo-data causes a further decrease in the relative uncertainties for transversity in most kinematical regions.
The Collins FF, since it couples to transversity in the $A_{UT}^{\sin(\phi_h+\phi_S)}$ asymmetry, also experiences a decrease in its relative uncertainties for favored and unfavored fragmentation. As previously mentioned, the significant decrease from the EIC for $0.2 < z < 0.6$ will allow for a check of the universality of the Collins FF between SIDIS, electron-positron annihilation (with forthcoming measurements from Belle-II [71]), and proton-proton collisions [82-88]. SoLID also gives a slight improvement at intermediate $z$ for the Collins function first moment from the one extracted in JAM20, with the sharp rise in the relative error around $z = 0.3$ due to the fact that SoLID put a cut of $z > 0.3$ on the pseudo-data used for this analysis. The combined analysis of EIC+SoLID is basically identical to the EIC only result. We note generally in Fig. 5 that the rapid increase in the relative uncertainties as one moves towards the edges in $x$ or $z$ is indicative of entering an unmeasured region. The fact that the relative errors are still reduced compared to JAM20 is a consequence of unavoidable parametrization bias, where the impact from regions where new, precise (pseudo-)data are available propagates into kinematics where there is no data.
In Fig. 6, we see a comparison between SoLID and the EIC for $\delta u$, $\delta d$, and the full $g_T$, as well as for the combined fit that included both EIC and SoLID pseudo-data. We can conclude that SoLID data by itself will also allow for phenomenological extractions of the tensor charges to have similar precision as current lattice results, with relative errors of $\lesssim 7\%$: $\delta u = 0.68(3)$, $\delta d = -0.123(8)$, $g_T = 0.80(3)$. The JAM20+EIC+SoLID results give the most precise extractions possible of the tensor charges, more precise than current lattice calculations, with all relative uncertainties now $\lesssim 3\%$: $\delta u = 0.688(11)$, $\delta d = -0.123(3)$, and $g_T = 0.811(13)$. Figure 6 demonstrates the importance of multiple experimental measurements in a wide kinematical region. The global QCD fits performed on the (pseudo-)data demonstrate that quantities such as the tensor charge and the precision of the extraction depend on many factors: the precision of the data, the kinematical range of the data, and the flexibility of the model. While the precision of the extraction can be very high, one needs to ensure that the accuracy of the results is also very good. By accuracy we mean the distance from the true value of the measured quantity to the extracted one. One can see from Fig. 6 that with the generated pseudo-data, our global QCD analysis results in a very precise extraction of the tensor charges for both EIC and SoLID measurements. However, the 68% CL regions for the individual flavor charges do not overlap. Thus, the precision of the extracted tensor charges may not correspond to the same high accuracy of the result once there are measurements (actual data) from multiple facilities. The reason is an incomplete kinematical region of the experiments and the unavoidable parametrization bias of our extraction. The parametrization bias may be tamed partly by utilizing more flexible parametrizations, such as neural nets.
The kinematical coverage of the experiments, on the other hand, is defined by the experimental setup, and it is difficult (if not impossible) to have one experiment cover the whole kinematical region needed for the most accurate extraction. In addition, using data from only one experiment may bias the extractions, as the systematic errors are quite difficult to account for in an unbiased way. Therefore, multiple experimental measurements covering the largest possible kinematical region are needed to achieve a precise and simultaneously accurate extraction of the tensor charge. SoLID will offer needed complementary measurements to the EIC in order to test that a consistent picture emerges across multiple experiments on the extracted value of the tensor charge. Only when a bulk of experiments give consistent central values for quantities of interest, like the tensor charge, can one claim to have accurate results.

Conclusion
In this letter, we have studied the impact on the tensor charge from EIC pseudo-data of the SIDIS Collins effect using the results of the JAM20 global analysis of SSAs [13]. Both transversely polarized proton and $^3$He beams are considered across multiple CM energies for charged pions in the final state. We find that the EIC will drastically reduce the uncertainty in both the individual flavor tensor charges $\delta u$, $\delta d$ as well as their isovector combination $g_T$. The $^3$He data is especially crucial for a precise determination of the down quark transversity TMD PDF and for up and down flavor separation. Consequently, the EIC, from the combined data in measurements at five different energy settings with transversely polarized proton and $^3$He beams, will allow for phenomenological extractions of the tensor charges to be as precise as the current lattice QCD calculations. This will ultimately show whether a tension exists between experimental and lattice data. In addition, we performed a similar study on SoLID pseudo-data of the SIDIS Collins effect to be measured in a complementary kinematical region to the EIC and found that the proposed experiment at Jefferson Lab will also significantly decrease the uncertainty in the tensor charge. The combined fit that included both EIC and SoLID pseudo-data provides the best constraint on transversity and the tensor charges, with the results for the latter more precise than current lattice calculations. We emphasize that a precise measurement cannot always guarantee a very accurate extraction of the distributions, and multiple experiments, such as EIC and SoLID, should be performed in a wide kinematical region in order to minimize bias and expose any potential tensions between data sets. In order to minimize the bias from the global QCD fit procedure, one may ultimately combine the data from different ways of accessing transversity, such as SIDIS single hadron and the di-hadron measurements.
Given that the tensor charge is a fundamental charge of the nucleon and connected to searches for BSM physics [14,16,17], future precision measurements from the EIC and Jefferson Lab sensitive to transversity are of utmost importance and necessary to see if a consistent picture emerges for the value of the tensor charge of the nucleon.