1 Introduction

The transverse momentum dependent (TMD) distributions are universal functions that describe the interactions of partons in a hadron. The TMD distributions naturally appear within the TMD factorization theorem for the differential cross section of double-inclusive hard processes. A lot of effort has been made to achieve a comprehensive picture of TMD factorization (for the latest works see [1,2,3,4,5,6,7,8]). In this work we perform a detailed comparison of the experimental measurements with the theory expectations based on our studies of higher-order perturbative expansions and power corrections for unpolarized TMDPDFs made in Refs. [9,10,11,12].

Among many different spin (in)dependent TMD distributions, the unpolarized TMD parton distribution functions (TMDPDFs) play a central role. From the practical point of view, their precise knowledge is required to extract further TMD distributions and perform other precision measurements. The ideal process to study the unpolarized TMDPDFs is the unpolarized vector boson production. The data on the \(q_T\)-dependent cross-section for the Drell–Yan process are collected by many experiments, including the precise measurements done by Tevatron and LHC. The theoretical descriptions of Drell–Yan data were made by many groups using different forms of TMD factorization (see e.g. [8, 13,14,15,16,17,18,19,20,21,22]).

This work presents a number of differences with respect to the previous literature. The collection of the improvements forms a completely new point of view in the TMD phenomenology. The main difference of the present work with respect to the more standard ones (here we consider as the most spread out, and de facto standard, analyses those based on the codes ResBos [15, 23] and DYqT/DYRes [17, 18, 21]) are as follows: (i) We extract the parameters related to individual TMDPDFs, which are suitable for phenomenological description of other TMD-related processes. (ii) We consistently include the perturbative ingredients, such as coefficient functions and anomalous dimensions, at the next-to-next-to-leading order (NNLO), introducing and using the \(\zeta \)-prescription to solve the problem of perturbative convergence at large-b (where b is the transverse distance). (iii) The TMDPDF parameterization is based on and is consistent with the theory expectation on the TMD behavior with b. To our knowledge this is the first attempt to include in a fit both high and low energy data at NNLO precision. The extraction of TMDs takes into account (for the first time to our knowledge) also LHC data. All this represents for us a clear improvement with respect to the more classical analyses.

In a modern view, a TMD distribution is a cumbersome function of many factors, which mix up perturbative and non-perturbative information. In this context, the issue of the separation of perturbative and non-perturbative physics requires a fine analysis and it is open to different solutions. The \(\zeta \)-prescription proposed in this work, is an attempt to consider the perturbative input to a TMD distribution as it is, without artificial regulators. The \(\zeta \)-prescription is founded on the fact that the TMD factorization introduces two factorization scales, one for the collinear and one for the soft exchanges. These scales are usually fixed to the same point, while in the \(\zeta \)-prescription they are chosen to eliminate the problematic double-log contributions. In other words, the \(\zeta \)-prescription is based on the freedom to select the normalization and factorization scales, which is guaranteed by the structure of the perturbative theory. The \(\zeta \)-prescription is essentially different from other used schemes. In particular, it does not strictly solve the problem of the large logarithmic contributions at large-b. It only decreases the power of the logarithmic contributions. However, the x-dependence of the remaining logarithmic terms has a form which prevents the blow up of the perturbative series, which is not accidental, but the result of the charge conservation. In this way, the \(\zeta \)-prescription postpones the large logarithm problem to the very far domain of b-space, where other factors suppress a TMD distribution. The practical implementation of the \(\zeta \)-prescription shows that it is efficacious and it allows a very accurate and sound description of the data.

The description of the non-perturbative parts of TMD distributions is the most interesting task. It is highly non-trivial because the definition of the non-perturbative part is strongly affected by schemes and prescriptions used in the perturbative implementation. In this respect a full NNLO can be clarifying. As an example, we recall that the non-perturbative behavior of the TMDPDFs is often assumed to have a Gaussian shape (see e.g. discussions in [15, 22, 24, 25]). Although the Gaussian ansatz is widely used, it comes into conflict with the usual picture of long-distance strong interaction fueled by light-meson exchanges. The typically expected behavior at long distances is exponential, which is confirmed also by model calculations [26]. However, the Gaussian shape is often introduced together with the \(b^*\)-prescription [27]. Notwithstanding many positive features, the \(b^*\)-prescription has a serious issue: it introduces undesired b-even power corrections. In turn, these power correction introduced by \(b^*\) can easily simulate the Gaussian behavior (see also discussion in [28]). Once the \(b^*\)-prescription is removed the Gaussian ansatz for the TMD shape is no more essential, according to what we find.

An additional remarkable point of the present study is the wide range of energies covered by the data that we have analyzed. The lowest energy measurements included in the fits have \((Q,\sqrt{s})=(4,19.4)\;\text {GeV}\) (E288 experiment [29]), while the most energetic have \((Q,\sqrt{s})=(116-150,8\cdot 10^3)\; \text {GeV}\) (ATLAS collaboration [30]). Typically, the low- and high-energy data are considered separately. The main reason for a separate scan is the assumed physical picture of strong interactions, which describes different energies. The description of the high-energy data requires a precise perturbative input and it is expected to be less sensitive to the fine non-perturbative dynamics. The situation is the opposite for the low-energy measurements. Our experience shows that the inclusion of data of different energies is not only possible within the TMD formalism, but it is also desired because it cuts away inappropriate models very sharply. We find also that the precision achieved by LHC is already sensitive enough to the non-perturbative structure of TMDs. We show that low and high energy data are sensitive to different regions of b-space, and consequently to different non-perturbative regimes of the TMDs: high energy data are better described by a Gaussian non-perturbative correction, while low energy data prefer an exponential type of non-perturbative models. The code (arTemiDe) that we have prepared allows to test all these hypotheses, and can be adaptded also to test different non-perturbative inputs for TMDs.

In order to extract the non-perturbative core of the TMDs, in the present study we choose a neutral tactic. We have scanned many possibilities such as a Gaussian and exponential behavior, with/without inclusion of power corrections, and so on. We have also studied the non-perturbative correction to the evolution kernel. During the examination of models we have prioritized the following criteria:

  1. (i)

    Stability  The TMD factorization is valid at small-\(q_T\) (the dilepton transverse momentum) up to a certain limit. Therefore, an acceptable model should produce a stable and good description within the allowed \(q_T\)-range. In other words, the value of \(\chi ^2\) should be sufficiently close to one and the central values of the parameters should be stable independently of the number of included data points (as far as the points belong to the allowed range).

  2. (ii)

    Convergence  The agreement with data should improve with the increase of the perturbative order. Given the current state of the art of the theory, we can define four successive perturbative orders, which is enough to test the perturbative convergence. Also, the value of the phenomenological non-perturbative constants that one extracts should converge to some central value.

  3. (iii)

    \(\chi ^2\) minimization  Naturally, among the models with similar behavior we select the model with the minimal \(\chi ^2\). We have found that it is difficult to find a model (with one or two parameters), which fulfills the demands (i) and (ii), and that at the same time provides a good \(\chi ^2\) value on the whole set of data points (although it is relatively easy to achieve this, selecting a particular experiment). The models that we test consider a kind of minimal set of parameters which can be enlarged in future studies, refining the fitting hypotheses.

In the present fit, we have included the measurements of E288 at low-energies, Z-boson production at CDF, D0, ATLAS, CMS and LHCb, and Drell–Yan measurements from ATLAS. To our knowledge, this is the largest set of Drell–Yan data points ever simultaneously considered in a fit within the TMD formalism. We find also that the LHC data below the Z-boson peak and at small \(q_T\) are very important for current/future TMD studies. In the article we present the most successful models that we have found, and discuss some popular models.

In order to numerically evaluate the theoretical expressions, we have produced the package arTeMiDe. arTeMiDe has a flexible module structure and can be used at any level of TMD theory description, from the evaluation of a single TMDPDF or evolution factor to an evaluation of differential cross-section. The arTeMiDe code is available at [31] and can be used to check our statements or test a possible future/alternative ansatz (for instance [14, 32]). In arTeMiDe we have collected all recent achievements of TMD theory, including NNLO matching coefficient function, and N\(^3\)LO TMD anomalous dimensions. In the current version, arTeMiDe evaluates only unpolarized TMDPDFs and related cross-sections, however, we plan to extend it further.

The body of the article is divided as in the following. In Sect. 2 we review the theoretical construction of the Drell-Yan cross section and summarize the theoretical knowledge on unpolarized TMDPDFs. In this section, we also describe all the theoretical improvements which are original for this work. The main original point, namely \(\zeta \)-prescription is presented in Sect. 2.4 and “Appendix B”. The phenomenological studies are presented in Sect. 3. This section includes also a dedicated discussion of the shape of the non-perturbative part of the TMD. The allowed range of validity of the TMD factorization is explored in Sect. 3.4, the presentation of theoretical uncertainties is given in Sect. 3.5. The results of the final fit are presented in Sect. 3.7. A final discussion and conclusions can be found in Sect. 4.

2 Theoretical framework

We consider the Drell–Yan reaction \(h_1+h_2\rightarrow G(\rightarrow ll')+X\), where G is the electroweak neutral gauge boson, \(\gamma ^*\) or Z. The incoming hadrons have momenta \(p_1\) and \(p_2\) with \((p_1+p_2)^2=s\). The gauge boson decays to the lepton pair with momenta \(k_1\) and \(k_2\). The momentum of the gauge boson or equivalently the invariant mass of lepton pair is \(Q^2=q^2=(k_1+k_2)^2\). The differential cross-section for the Drell–Yan process can be written in the form [33, 34]

$$\begin{aligned} d\sigma =\frac{d^4q}{2s}\sum _{G,G'=\gamma ,Z}L_{GG'}^{\mu \nu }W_{\mu \nu }^{GG'}\varDelta _G(q)\varDelta _{G'}(q), \end{aligned}$$
(1)

where 1 / 2s is the flux factor, \(\varDelta _G\) is the (Feynman) propagator for the gauge boson G. The hadron and lepton tensors are respectively

$$\begin{aligned} W^{GG'}_{\mu \nu }= & {} \int \frac{d^4 z}{(2\pi )^4} e^{-iqz} \nonumber \\&\times \, \langle h_1(p_1)h_2(p_2)|J^G_\mu (z)J^{G'}_\nu (0)|h_1(p_1)h_2(p_2)\rangle , \nonumber \\ \end{aligned}$$
(2)
$$\begin{aligned} L_{\mu \nu }^{GG'}= & {} \int \frac{d^3k_1}{(2\pi )^3 2E_1}\frac{d^3k_2}{(2\pi )^3 2E_2} (2\pi )^4 \delta ^4(k_1+k_2-q) \nonumber \\&\times \, \langle l_1(k_1) l_2(k_2)|J_{\nu }^G(0)|0\rangle \langle 0|J_{\mu }^{G'}(0)|l_1(k_1) l_2(k_2)\rangle ,\nonumber \\ \end{aligned}$$
(3)

where \(J_\mu ^G\) is the electroweak current.

The point of our interest is the \(q_T\) dependence of the cross-section, where \(q_T\) is the transverse component of the produced gauge boson in the center-of-mass frame. More precisely, we are interested in the regime \(q_T\ll Q\), where the TMD factorization formalism can be applied. Within the TMD factorization, one obtains the following expression for the unpolarized hadron tensor (see e.g. [35])

$$\begin{aligned} W_{\mu \nu }^{GG'}= & {} \frac{-g_{T\mu \nu }}{\pi N_c}|C_V(q,\mu )|^2\sum _{f,f'}z_{ff'}^{GG'} \int \frac{d^2\varvec{b}}{4\pi }e^{i(\varvec{q} \varvec{b})} \nonumber \\&\times \, F_{f\leftarrow h_1}(x_1,\varvec{b};\mu ,\zeta _1)F_{f'\leftarrow h_2}(x_2,\varvec{b};\mu ,\zeta _2)+Y_{\mu \nu },\nonumber \\ \end{aligned}$$
(4)

where \(g_T\) is the transverse part of the metric tensor and the summation runs over the active quark flavors. The variable \(\mu \) is the hard factorization scale. The variables \(\zeta _{1,2}\) are the scales of soft-gluons factorization, and they fulfill the relation \(\zeta _1\zeta _2\simeq Q^4\). In the following, we consider the symmetric point \(\zeta _1=\zeta _2=\zeta =Q^2\). The variables \(x_{1,2}\) are the longitudinal parts of parton momenta

$$\begin{aligned} x_1= & {} \frac{\sqrt{Q^2+q_T^2}}{\sqrt{s}}e^y\simeq \frac{Q}{\sqrt{s}}e^y, \nonumber \\ x_2= & {} \frac{\sqrt{Q^2+q_T^2}}{\sqrt{s}}e^{-y}\simeq \frac{Q}{\sqrt{s}}e^{-y}. \end{aligned}$$
(5)

The factors \(z_{ff'}^{GG'}\) are the electro-weak charges and they are given explicitly in Sect. 2.1. The factor \(C_V\) is the matching coefficient of the QCD neutral current to the same current expressed in terms of collinear quark fields. The explicit expressions for \(C_V\) can be found in [36,37,38], and are also given in “Appendix A”. The functions \(F_{f\leftarrow h}\) are the unpolarized TMDPDFs for quark f in the hadron h. They are universal non-perturbative functions and the main objects of our study. The details of their definition and their parametrization are given in Sect. 2.3. Finally, the term Y denotes the power corrections to the TMD factorization theorem (to be distinguished from the power corrections to the TMD operator product expansion). The Y-term is of the order \(q_T/Q\) and composed of TMD distributions of the higher dynamical twist. In our study, we restrict ourself to the limit of low \(q_T\) such that the Y-term can be dropped.

Evaluating the lepton tensor, and combining together all factors one obtains the cross-section for the unpolarized Drell–Yan process at leading order of the TMD factorization, in the form [1, 2, 6, 39,40,41]

$$\begin{aligned} \frac{d\sigma }{dQ^2dyd(q_T^2)}= & {} \frac{4\pi }{3N_c}\frac{\mathcal {P}}{sQ^2} \sum _{GG'}z_{ll'}^{GG'}(q)\sum _{ff'}z_{ff'}^{GG'} |C_V(q,\mu )|^2 \nonumber \\&\times \int \frac{d^2\varvec{b}}{4\pi }e^{i(\varvec{b}\varvec{q})} F_{f\leftarrow h_1}(x_1,\varvec{b};\mu ,\zeta ) \nonumber \\&\times F_{f'\leftarrow h_2}(x_2,\varvec{b};\mu ,\zeta )+Y, \end{aligned}$$
(6)

where y is the rapidity of the produced gauge boson. The factor \({\mathcal {P}}\) is a part of the lepton tensor and contains information on the fiducial cuts. It is discussed in details in Sect. 2.6. In the rest of this section a more detailed description of the particular components is presented.

2.1 Expressions for cross-section for different produced bosons

In the case of neutral vector bosons production, the sum over G and \(G'\) in Eq. (6) has three terms

$$\begin{aligned} \frac{d\sigma }{dQ^2dyd(q_T^2)}= & {} \frac{d\sigma ^{\gamma \gamma }}{dQ^2dyd(q_T^2)}+\frac{d\sigma ^{ZZ}}{dQ^2dyd(q_T^2)} \nonumber \\&+ \frac{d\sigma ^{\gamma Z}}{dQ^2dyd(q_T^2)}, \end{aligned}$$
(7)

which correspond to \(\gamma ^*\)-production, Z-production and interference of \(\gamma ^*\)-Z production amplitudes. These three terms of the cross-sections differ from each other only due to the factors \(z_{ff'}^{GG'}\) in Eq. (6), which are

$$\begin{aligned}&z^{\gamma \gamma }_{ll'}z^{\gamma \gamma }_{ff'}=\delta _{{\bar{f}}f'}\alpha ^2_{\text {em}}(Q)e_f^2, \nonumber \\&z^{ZZ}_{ll'}z^{ZZ}_{ff'}=\frac{\delta _{{\bar{f}}f'}\alpha ^2_{\text {em}}(Q)Q^4}{(Q^2-M_Z^2)^2+\varGamma _Z^2M_Z^2}\frac{1-4s_W^2+8s_W^4}{8s_W^2c_W^2} \nonumber \\&\qquad \qquad \qquad \times \frac{1-4|e_f|s_W^2+8e_f^2s_W^4}{8s_W^2c_W^2} \nonumber \\&z^{Z\gamma }_{ll'}z^{Z\gamma }_{ff'}+z^{\gamma Z}_{ll'}z^{\gamma Z}_{ff'}=\frac{\delta _{{\bar{f}}f'}\alpha ^2_{\text {em}}(Q)2Q^2(Q^2-M_Z^2)}{(Q^2-M_Z^2)^2+\varGamma _Z^2M_Z^2}\frac{1-4s_W^2}{4s_Wc_W} \nonumber \\&\qquad \qquad \qquad \times \frac{|e_f|(1-4|e_f|s_W^2)}{4s_Wc_W}, \end{aligned}$$
(8)

where \(M_Z\) and \(\varGamma _Z\) are the mass and the width of the Z-boson, \(s_W\) and \(c_W\) are sine and cosine of the Weinberg angle. We use the following of values [42]

$$\begin{aligned} M_Z=91.2~\text {GeV},\quad \varGamma _Z=2.5~\text {GeV},\quad s^2_W=0.2313. \end{aligned}$$
(9)

In many studies (see e.g.[15, 19, 20, 22, 43]) the contribution of \(\gamma ^*\) to the cross-section is neglected in the vicinity of the Z-peak, i.e. the zero-width approximation is used. Here, instead, we include the \(\gamma ^*\) and interference terms in the evaluation of the the cross-section. The inclusion of these terms is important for LHC (in particular ATLAS experiment), where the measurement precision often exceeds the theory precision.

2.2 TMD parton distributions: evolution

The quark unpolarized TMDPDFs are given by the matrix element [1, 2, 11]

$$\begin{aligned}&F_{q\leftarrow h}(x,\varvec{b};\zeta ,\mu )\nonumber \\&\quad = \frac{Z_q(\zeta ,\mu ) {\mathcal {R}}_q(\zeta ,\mu )}{2} \sum _X\int \frac{d\xi ^-}{2\pi }e^{-ix p^+\xi ^-}\nonumber \\&\qquad \times \left\langle h| \left\{ T\left[ {\bar{q}}_i \,{\tilde{W}}_n^T\right] _{a}\left( \frac{\xi }{2} \right) |X\right\rangle \gamma ^+_{ij}\left\langle X|{\bar{T}}\left[ {\tilde{W}}_n^{T\dagger }q_j\right] _{a}\left( \frac{-\xi }{2} \right) \right\} | h\right\rangle , \end{aligned}$$
(10)

where n is the light-cone vector along the large component of the hadron momentum, \(\xi =\{0^+,\xi ^-,\varvec{b}\}\), Z and \({\mathcal {R}}\) are the ultraviolet and rapidity divergence renormalization factors. The Wilson lines \(W_n\) pointing along the direction n to the infinity. For the detailed definition of all constituents in this expression we refer to [11].

The peculiar feature of the TMD operator is the presence of two types of divergences and, as a consequence, two renormalization factors Z and \({\mathcal {R}}\). Firstly, we have ultraviolet divergences, which have their collinear counterpart in the coefficient function \(C_V\). These divergences are the result of collinear factorization and give rise to the logarithms of the factorization scale \(\mu \). Secondly, we have rapidity divergences, which arise in the factorization of the soft-gluon exchanges between partons. The singular soft-gluons exchanges can be collected into the soft factor, which in turn, can be written as a product of rapidity renormalization factors \({\mathcal {R}}\), see e.g. [10, 11, 44]. This procedure introduces the rapidity factorization scale \(\zeta \).

The dependence of TMDPDF on the factorization scales \(\mu \) and \(\zeta \) is given by the pair of evolution equations

$$\begin{aligned}&\mu ^2 \frac{d}{d\mu ^2} F_{f\leftarrow h}(x,\varvec{b};\mu ,\zeta )=\frac{1}{2}\gamma ^f_F(\mu ,\zeta )F_{f\leftarrow h}(x,\varvec{b};\mu ,\zeta ), \end{aligned}$$
(11)
$$\begin{aligned}&\zeta \frac{d}{d\zeta } F_{f\leftarrow h}(x,\varvec{b};\mu ,\zeta )=-{\mathcal {D}}^f(\mu ,\varvec{b})F_{f\leftarrow h}(x,\varvec{b};\mu ,\zeta ). \end{aligned}$$
(12)

The TMD anomalous dimensions \(\gamma \) and \({\mathcal {D}}\) are known up to order \(a_s^3\) (see [45] for \(\gamma _V\), and [44, 46, 47] for \({\mathcal {D}}\)). They satisfy the consistency condition (Cauchy–Riemann condition), which guaranties the existence of the common solution for equations (11) and (12),

$$\begin{aligned} \zeta \frac{d}{d\zeta }\frac{\gamma ^f_F(\mu ,\zeta )}{2}=\mu ^2\frac{d}{d\mu ^2}(-{\mathcal {D}}^f(\mu ,\varvec{b}))=-\frac{\varGamma ^f(\mu )}{2}, \end{aligned}$$
(13)

where \(\varGamma ^f\) is the cusp anomalous dimension. This equation fixes the logarithmic part of the anomalous dimensions. So, the anomalous dimension \(\gamma \) is linear in logarithm at all orders, while the rapidity anomalous dimension \({\mathcal {D}}\) has all powers of logarithms,

$$\begin{aligned} \gamma _F^f=\varGamma ^f {\mathbf {l}}_\zeta -\gamma _V^f,\quad {\mathcal {D}}^f=\sum _{n=1}^\infty a_s^n\sum _{k=0}^n {\mathbf {L}}_\mu ^k d_f^{(n,k)}. \end{aligned}$$
(14)

Here and in the following, we use the following notation for logarithms

$$\begin{aligned} {\mathbf {L}}_X=\ln \left( \frac{\varvec{b}^2 X}{4 e^{-2\gamma _E}}\right) ,\quad {\mathbf {l}}_X=\ln \frac{\mu ^2}{X}. \end{aligned}$$
(15)

The explicit expressions for the anomalous dimensions up to third-loop order can be found e.g. in the “Appendix” of [11, 44].

The initial values of the factorization scales are dictated by the kinematics of the considered process. In particular, the scales \(\zeta _{1,2}\) are related to the momentum of hard partons as

$$\begin{aligned} \zeta _1\zeta _2=(2p_1^+p_2^-)^2=(Q^2+q_T^2)^2\simeq Q^4. \end{aligned}$$
(16)

In the following, we use the symmetric normalization point, \(\zeta _1=\zeta _2=\zeta =Q^2\). The \(\mu \)-dependence cancels between the parts of factorization formula, namely between hard coefficient function \(|C_V|^2\) and the TMDPDFs. The natural choice of \(\mu \) is such that logarithms appearing in \(|C_V|^2\) are minimized, i.e. \(\mu =Q\). Therefore, TMDPDFs enter in the cross-section in Eq. (6) at the hard point \((\mu _f,\zeta _f)=(Q,Q^2)\).

Fig. 1
figure 1

(left) The evolution plane \((\mu ,\zeta )\) and paths for the evolution integrals from \((\mu _f,\zeta _f)\) to \((\mu _i,\zeta _i)\). Gray lines are equi-evolution lines \(\zeta _\mu \) at different b. Paths 1 and 2 reprent the solutions in Eqs. (19) and (20), corespondingly. These solutions are equivalent to the evolution to the point \((\mu _f,\zeta _{\mu _f})\), which is shown by path 3, because there is no evolution along the blue segment (at \(b=0.7\) \(\hbox {GeV}^{-1}\)). (right) The plot of \(\zeta _\mu \) at \(b=1\) \(\hbox {GeV}^{-1}\) for different orders

A typical construction of a model for a TMD distribution requires its evolution to a different set of scales. The evolution from \((\mu _f,\zeta _f)\) to \((\mu _i,\zeta _i)\) takes the form

$$\begin{aligned} F_{f\leftarrow h}(x,\varvec{b};\mu _f,\zeta _f)= & {} R^f[\varvec{b};(\mu _f,\zeta _f)\rightarrow (\mu _i,\zeta _i)] \nonumber \\&\times F_{f\leftarrow h}(x,\varvec{b};\mu _i,\zeta _i), \end{aligned}$$
(17)

where

$$\begin{aligned}&R^f[\varvec{b};(\mu _f,\zeta _f)\rightarrow (\mu _i,\zeta _i)] \nonumber \\&\quad =\exp \left[ \int _P \left( \gamma ^f_F(\mu ,\zeta )\frac{d\mu }{\mu } -{\mathcal {D}}^f(\mu ,\varvec{b})\frac{d\zeta }{\zeta }\right) \right] . \end{aligned}$$
(18)

Here, the \(\int _P\) denotes the integration along the path P in the \((\mu ,\zeta )\)-plane from the point \((\mu _f,\zeta _f)\) to the point \((\mu _i,\zeta _i)\). The integration can be done on an arbitrary path P, and the solution is independent of it, thanks to the Cauchy–Riemann condition Eq. (13). At a finite perturbative order, the condition Eq. (13) is violated by the next perturbative order. As a consequence the expression for the evolution factor R is dependent on the path of integration. The two simplest choices of integration paths are the combinations of straight segments as

$$\begin{aligned}&\text {path 1}: (\mu _f,\zeta _f)\rightarrow (\mu _i,\zeta _f)\rightarrow (\mu _i,\zeta _i),\\&\text {path 2}: (\mu _f,\zeta _f)\rightarrow (\mu _f,\zeta _i)\rightarrow (\mu _i,\zeta _i). \end{aligned}$$

These paths are depicted in Fig. 1. The factor R evaluated along these paths reads

$$\begin{aligned}&R^f[\varvec{b};(\mu _f,\zeta _f)\xrightarrow {1} (\mu _i,\zeta _i)]\nonumber \\&\quad =\exp \left[ \int _{\mu _i}^{\mu _f}\frac{d\mu }{\mu }\gamma ^f_F(\mu ,\zeta _f)-{\mathcal {D}}^f(\mu _i,\varvec{b})\ln \left( \frac{\zeta _f}{\zeta _i}\right) \right] , \nonumber \\ \end{aligned}$$
(19)
$$\begin{aligned}&R^f[\varvec{b};(\mu _f,\zeta _f)\xrightarrow {2} (\mu _i,\zeta _i)]\nonumber \\&\quad =\exp \left[ \int _{\mu _i}^{\mu _f}\frac{d\mu }{\mu }\gamma ^f_F(\mu ,\zeta _i)-{\mathcal {D}}^f(\mu _f,\varvec{b})\ln \left( \frac{\zeta _f}{\zeta _i}\right) \right] .\nonumber \\ \end{aligned}$$
(20)

The numerical difference between these two expressions represents the value of the uncertainty at a given perturbative order.

The expressions for the evolution factor R given in Eqs. (19) and (20), contain the rapidity anomalous dimension \({\mathcal {D}}(\mu ,\varvec{b})\). The latter contains potentially large values of \({\mathbf {L}}_\mu \), which should be resummed with the help of Eq. (13). Additionally, the rapidity anomalous dimension can acquire power corrections from the higher orders in the power expansion of the factorization theorem [48]. These power corrections can be also observed in the renormalon structure described in [12]. The non-perturbative correction takes the form of a series of even powers of the transverse distance. Therefore, the practical expression for the rapidity anomalous \({\mathcal {D}}\) is

$$\begin{aligned} {\mathcal {D}}^f(\mu ,\varvec{b})=\int _{\mu _0}^\mu \frac{d\mu '}{\mu '}\varGamma ^f+{\mathcal {D}}_{\text {pert}}^f(\mu _0,\varvec{b})+g_K \varvec{b}^2, \end{aligned}$$
(21)

where \(g_K\) is an unknown parameter. Here, \({\mathcal {D}}_{\text {pert}}^f\) is the perturbative expression for \({\mathcal {D}}\). Correspondingly, the value \(\mu _0\) should be chosen such that \({\mathbf {L}}_{\mu _0}\) is minimal in the perturbative region. Substituting this expression to the evolution factor, we obtain

$$\begin{aligned}&R^f[\varvec{b};(\mu _f,\zeta _f)\rightarrow (\mu _i,\zeta _i);\mu _0]\nonumber \\&\quad = \exp \left[ \int _{\mu _i}^{\mu _f}\frac{d\mu }{\mu }\gamma ^f_F(\mu ,\zeta _f)-\int _{\mu _0}^{\mu _i}\frac{d\mu }{\mu }\varGamma ^f(\mu )\ln \left( \frac{\zeta _f}{\zeta _i}\right) \right] \nonumber \\&\qquad \times \left( \frac{\zeta _f}{\zeta _i}\right) ^{-{\mathcal {D}}^f_{\text {perp}}(\mu _0,\varvec{b})-g_K \varvec{b}^2}. \end{aligned}$$
(22)

In this form, the evolution factor R does not depend on the path of evolution, as can be checked explicitly. The perturbative uncertainty which previously has been given by the variation of evolution path, now is represented by the dependence on the parameter \(\mu _0\). Thus, using Eq. (22) the uncertainties of the perturbative calculation can be measured by varying the scale \(\mu _0\). In the following, we use the evolution factor as in Eq. (22).

Fig. 2
figure 2

Schematic picture of the regions in b-space of the TMDPDF and the corresponding/needed theoretical treatment

2.3 TMD parton distributions: b-space behavior

The TMDPDF is a genuine non-perturbative function, which is to be fitted by a certain ansatz, which covers the whole domain in b-space. Different intervals of b-space describe different regimes of strong interactions. In Fig. 2 we show schematically the parts of b-space which need a specific treatment for each TMDPDF. In order to construct an optimal and physically meaningful fitting ansatz, the behavior in every part of the b-space should be reproduced. In this section, we collect the main information on the b-dependence of TMDPDFs, as it is understood according to the current state of art.

The starting point of our description of a TMD distribution is the small-b operator product expansion (OPE), which results in the series

$$\begin{aligned}&F_{q\leftarrow h}(x,\varvec{b};\mu ,\zeta )\nonumber \\&\quad =\sum _{n=0}^\infty \left( \frac{\varvec{b}^2}{B^2}\right) ^n\sum _f \left( C^{(n)}_{q\leftarrow f}(\varvec{b};\mu ,\zeta )\otimes f^{(n)}_{f\leftarrow h}(\mu )\right) (x),\nonumber \\ \end{aligned}$$
(23)

where \(f^{(n)}\) are PDFs of a \(2(n+1)\)-twist, \(C^{(n)}\) are coefficient functions of OPE and the symbol \(\otimes \) represents the convolution in momentum fractions of partons. The parameter B is an unknown non-perturbative parameter which represents an intrinsic hadron scale.

Region 1  In the range \(b\ll B\), the TMDPDF is dominated by the \(n=0\) term of OPE, Eq. (23). The leading term is represented by the usual matching onto twist-2 PDFs and reads

$$\begin{aligned} b&\ll B: \; F_{q\leftarrow h}(x,\varvec{b};\mu ,\zeta ) \nonumber \\&= \sum _f \int _x^1\frac{dz}{z} C_{q\leftarrow f}(z,{\mathbf {L}}_\mu ;\mu ,\zeta ) f_{f\leftarrow h}\left( \frac{x}{z},\mu \right) , \end{aligned}$$
(24)

where C is known up to two-loop order [11, 49].

There is a subregion \(b\ll 1/Q\), which should be considered specially. While the TMD distribution is completely perturbative within this region, the contributions of this region to the cross-section strongly overlaps with the Y-term, Eq. (4), which is formally \({\mathcal {O}}(1/(bQ))\). The behavior of TMD distributions within this tiny range together with the Y-term dictates the asymptotics of the cross-section at large \(q_T\). As a consequence, it has a significant influence on the value of the total cross-section. In our current evaluation we restrict ourself to the range of small-\(q_T\) (for a dedicated study of the applicability of this approximation in practice, see Sect. 3.4). Therefore, we drop the Y-term and do not need any special treatment of \(b\ll Q^{-1}\) region.

Region 2  In the range \(b\lesssim B\) the OPE is still valid. However, one has to include the higher order terms in addition to the leading one. Very little is known about power suppressed terms of the small-b OPE. Our recent study of the renormalon singularities [12] suggests several hints that can be used to model this region:

  1. (i)

    The OPE contains only even powers of b. Moreover, the coefficient function of n’th order has a prefactor \(x^n\). In other word, the natural scale of OPE is \(x\varvec{b}^2/B^2\) rather then just \(\varvec{b}^2/B^2\).

  2. (ii)

    The higher order OPE contributions induced by renormalons, can be summed together to some effective non-perturbative function under the convolution integral.

Therefore, in this region the TMDPDF can be approximated by the form

$$\begin{aligned} b\sim & {} B :F_{q\leftarrow h}(x,\varvec{b};\mu ,\zeta ) \nonumber \\= & {} \sum _f \int _x^1\frac{dz}{z} G_{q\leftarrow f}\left( z,\frac{z\varvec{b}^2}{B^2},{\mathbf {L}}_\mu ;\mu ,\zeta \right) \nonumber \\&\times f_{f\leftarrow h}\left( \frac{x}{z},\mu \right) , \end{aligned}$$
(25)

where the leading term of the power series in b / B of G is given by C. As the power n grows, the sub-leading terms of OPE switch on, which is schematically presented by gray lines in Fig. 2. The particular contributions at higher n are not so important in the continuous TMD picture. However,

  1. (iii)

    The \(n=1\) contribution to OPE can be estimated by the leading renormalon contribution of order \(\sim x\varvec{b}^2\) [12]. It has the form

    $$\begin{aligned} C^{\text {ren}}_{q\leftarrow q}(x,\varvec{b};\mu ,\zeta )= & {} 2{\bar{x}}+\frac{2x}{(1-x)_+} \nonumber \\&-\,\delta ({\bar{x}})\left( {\mathbf {L}}_\varLambda -{\mathbf {L}}_{\sqrt{\zeta }}+\frac{2}{3}\right) , \end{aligned}$$
    (26)

    where \(\varLambda =\varLambda _{QCD}\) is the position of the Landau pole.

Region 3  At \(b\gg B\) the small-b OPE cannot be considered as a source of information, and the TMD is completely non-perturbative. Luckily, this region is suppressed by the evolution factor. As a consequence, the cross-section is not very sensitive to the fine structure of TMD distribution in this region, but the general behavior is important. We have tested several asymptotical forms of the TMDPDF, including Gaussian, exponential and power-like and found that the best agreement with the experimental data is achieved with exponential behavior. This observation is in agreement with the general physical intuition, that at high distances the strong forces are dominated by meson exchange, while the Gaussian and power-like asymptotics can not be produced in any simple way.

We should mention that the size of the parameter B, as well as, the order of convergence of the small-b OPE, which influences the size of the intermediate region 2, are not known. Our estimations of these characteristic sizes are presented in Sect. 4.

2.4 Definition of scaling parameters

The small-b matching is the starting point for the construction of the majority of phenomenological ansatzes for TMD distributions. It can be considered as an additional collinear factorization, which is performed at some convenient set of scales \((\mu _i,\zeta _i)\). The difference of \((\mu _i,\zeta _i)\) from the initial (defined by process kinematic) scales of TMD distribution is compensated by the evolution factor in Eq. (17). As usual, the all-order expression is independent of \((\mu _i,\zeta _i)\), but in practice, these scales are to be chosen such that the coefficient function \(C_{f\leftarrow f'}\) has good perturbative convergence. This procedure is alike the choice of hard-factorization scale, with one essential difference: the parameter b, which accompanies \(\mu _i\) and \(\zeta _i\) in the logarithms, has no fixed value. It varies from zero to infinity within the Fourier integral.

The choice of scales \((\mu _i,\zeta _i)\) is one of the central point of the TMD phenomenology. To make the discussion clearer, let us recall the expression for the coefficient function at NLO. It reads

$$\begin{aligned}&C_{q\leftarrow q}(x,{\mathbf {L}}_\mu ;\mu ,\zeta )\nonumber \\&\quad =\delta ({\bar{x}})+ a_s(\mu )C_F\left[ -2{\mathbf {L}}_\mu \left( \frac{2}{(1-x)_+}-1-x\right) \right. \nonumber \\&\qquad +\left. 2{\bar{x}}+ \delta ({\bar{x}})\left( -{\mathbf {L}}_\mu ^2+ 2{\mathbf {L}}_\mu {\mathbf {l}}_\zeta -\frac{\pi ^2}{6}\right) \right] , \end{aligned}$$
(27)

where the notation for the logarithms is defined in Eq. (15). Ideally, the scales \(\mu \) and \(\zeta \) should be chosen such that no large perturbative contributions appear in the coefficient function. Clearly, it cannot be done at arbitrary b due to the presence of \(\mu \) in the coupling constant and in \({\mathbf {L}}_\mu \). However, such a strict choice is not required. The only requirement for scales is to keep the perturbative ansatz stable, i.e. to prevent its blowing up. There are several solutions of this problem. The most famous is the \(b^*\)-prescription [27]. Within the \(b^*\)-prescription the logarithms \({\mathbf {L}}_\mu \) are absent, and this fact allows the control of the perturbative series in the full region of b. However, the \(b^*\)-prescription introduces artificial power corrections to the small-b OPE, which washes out any theoretical intuition. Another popular scheme [50, 51] is based on the re-expression of Hankel-integral as an integral in the complex b-plane. In this way, the logarithms \({\mathbf {L}}_\mu \) can be minimized by \(\mu \sim b^{-1}\) and the Landau pole at large-b is by-passed in the complex plane. The drawback of this scheme is the necessity of the analytical continuation into the complex b-plane, and the restriction to NNLO (since the analytical solution for running coupling at \(\hbox {N}^3\hbox {LO}\) is unknown).

In this work we use another scheme which we call \(\zeta \)-prescription. It is a novel one (to our best knowledge), and it is described in the following.

The \(\zeta \)-prescription uses the fact that the TMD operator and hence its small-b OPE depends on two scales \(\mu \) and \(\zeta \), which are entirely independent. This simple fact has been overlooked so far. Indeed, the first typical step is to fix \(\zeta =C_0^2/\varvec{b}^2\), or \(\zeta =\mu ^2\) [1, 12, 52]. It reduces the problem to a single variable problem, which looks simpler, but finally, it does not provide a simple solution for the appearance of large logarithms in the OPE.

The initial point of the \(\zeta \)-prescription is the observation that not all logarithms in the coefficient function are dangerous. So, the terms \({\mathbf {L}}^2_\mu \) and \({\mathbf {L}}_\mu {\mathbf {l}}_\zeta \) in Eq. (27) are problematic, while the logarithm in the first term is not. There are several reasons for it. First, the double logarithm contributions violate the normal perturbative counting and at large-b grows faster than the single logarithms. Second, the first term of Eq. (27) comes together with the DGLAP kernel, and thus, it preserves the area (say, the integral over x) of the TMDPDF, due to the conservation of the electromagnetic charge. We remind that logarithms accompanying the DGLAP kernel are related to PDF evolution, while the rest of logarithms are related to the TMD evolution. For this reason, the main problem of convergence is represented by the logarithms that are related to the TMD evolution. The logarithms related to the PDF evolution come with a particular x-dependent function. The probabilistic interpretation of PDF ensures their minimal contribution in the very large domain of b. Practically, this fact has been already demonstrated although not entirely realized in the fit [20]. In the realization of Ref. [20], the DGLAP logarithms were left unregulated and they do not influence the convergence of the fit.

The logarithms related to the TMD evolution can be eliminated completely by a particular choice of \(\zeta =\zeta _\mu \). Along the curve \(\zeta _\mu \), the TMD distributions are independent of \(\mu \). In other words, the curve \(\zeta _\mu \) is an equi-evolution curve in the plane \((\mu ,\zeta )\). Such a curve satisfies the equation

$$\begin{aligned} \mu ^2 \frac{d F(x,\varvec{b};\mu ,\zeta _\mu )}{d\mu ^2}=0. \end{aligned}$$
(28)

Using the definition of anomalous dimensions in Eq. (11) we rewrite this equation as

$$\begin{aligned} {\mathcal {D}}({\mathbf {L}}_\mu )f'({\mathbf {L}}_\mu )+\frac{\varGamma }{2}f({\mathbf {L}}_\mu )-{\mathcal {D}}({\mathbf {L}}_\mu )-\frac{\gamma _V}{2}=0, \end{aligned}$$
(29)

where \(f({\mathbf {L}}_\mu )={\mathbf {l}}_{\zeta _\mu }\). The perturbative solution is discussed and presented in the “Appendix B.1”. The curve \(\zeta _\mu \) is different for quark and for gluon TMDs, and it is expressed in terms of the TMD anomalous dimensions Eq. (62). In our analysis, we need only the quark case. Up to NNLO it reads

$$\begin{aligned} {\mathbf {l}}_{\zeta _\mu }= & {} \frac{{\mathbf {L}}_\mu }{2}-\frac{3}{2}+a_s\left[ \frac{11 C_A-4 T_F N_f}{36}{\mathbf {L}}_\mu ^2\right. \nonumber \\&\left. +\, C_F\left( -\frac{3}{4}+\pi ^2-12\zeta _3\right) \right. \nonumber \\&\left. +\,C_A\left( \frac{649}{108}-\frac{17\pi ^2}{12}+\frac{19}{2}\zeta _3\right) \right. \nonumber \\&\left. +\, T_FN_f\left( -\frac{53}{27}+\frac{\pi ^2}{3}\right) \right] +{\mathcal {O}}(a_s^2). \end{aligned}$$
(30)

Note, that in Eq. (30) we have set the boundary condition such that no terms singular at \({\mathbf {L}}_\mu \rightarrow 0\) appear in \({\mathbf {l}}_\zeta \) (see “Appendix B.1”, for details). Also, in the current work we drop the power contributions to the rapidity anomalous dimension \({\mathcal {D}}\). The influence of these decisions should be investigated later. One can check that the leading term of \(\zeta _\mu \) (i.e. \({\mathbf {l}}_\zeta ={\mathbf {L}}_\mu /2\)) cancels leading powers of logarithms at all orders in perturbation theory (i.e. all terms \(a_s^n {\mathbf {L}}_\mu ^{2n}\)). Then, including the next correction (\(a_s\beta _0{\mathbf {L}}_\mu ^2/12 \)) cancels subleading powers of logarithms at all orders of the perturbation theory (i.e. all terms \(a_s^n {\mathbf {L}}_\mu ^{2n-1}\)) , and so on.

Substituting the leading term of the solution in Eq. (30) to the quark small-b coefficient function, we obtain

$$\begin{aligned}&C_{q\leftarrow q}(x,{\mathbf {L}}_\mu ;\mu ,\zeta _\mu )\nonumber \\&\quad =\delta ({\bar{x}}) \nonumber \\&\qquad +\, a_s(\mu )C_F\left[ -2{\mathbf {L}}_\mu \left( \frac{2}{(1-x)_+}-1-x\right) \right. \nonumber \\&\qquad +\,\left. 2{\bar{x}}+ \delta ({\bar{x}})\left( -3{\mathbf {L}}_\mu -\frac{\pi ^2}{6}\right) \right] . \end{aligned}$$
(31)

This coefficient function is stable for any value of \({\mathbf {L}}_\mu \), which can be seen by considering its integral

$$\begin{aligned} \int _0^1 dx C_{q\leftarrow q}(x,{\mathbf {L}}_\mu ;\mu ,\zeta _\mu )=1+a_s(\mu )C_F\left( 1-\frac{\pi ^2}{6}\right) ,\nonumber \\ \end{aligned}$$
(32)

which is independent of \({\mathbf {L}}_\mu \).

The general expression for the coefficient of arbitrary flavour at NNLO has the form

$$\begin{aligned}&C_{f\leftarrow f'}(x,\varvec{b};\mu ,\zeta _\mu )\nonumber \\&\quad =\delta _{ff'}\delta ({\bar{x}}) \nonumber \\&\qquad +\,a_s\left( -{\mathbf {L}}_\mu P^{(1)}_{f\leftarrow f'}+C^{(1,0)}_{f\leftarrow f'}\right) \nonumber \\&\qquad +\,a_s^2 \left[ {\mathbf {L}}_\mu ^2\frac{P_{f\leftarrow k}^{(1)}\otimes P_{k\leftarrow f'}^{(1)}-\beta _0 P^{(1)}_{f\leftarrow f'}}{2}\right. \nonumber \\&\qquad \left. -\,{\mathbf {L}}_\mu \left( P^{(2)}_{f\leftarrow f'}+C^{(1,0)}_{f\leftarrow k}\otimes P^{(1)}_{k\leftarrow f'}-\beta _0 C_{f\leftarrow f'}^{(1,0)}\right) \right. \nonumber \\&\qquad \left. +\, \frac{d^{(2,0)}_f\gamma _V^{f(1)}}{\varGamma _0^f}\delta ({\bar{x}})+C^{(2,0)}_{f\leftarrow f'}\right] +{\mathcal {O}}(a_s^3), \end{aligned}$$
(33)

where \(C^{(n,0)}\) is the finite part of the coefficient function at n’th perturbative order, and \(P(x)=\sum a_s^n P^{(n)}\) is the DGLAP kernel. The detailed derivation of Eq. (33) is presented in the “Appendix B.2”. Eq. (33) has the form of the usual coefficient function for an object without external evolution (e.g. coefficient function for DIS). In other words, it is straightforward to check that

$$\begin{aligned} \mu ^2\frac{d}{d\mu ^2}C_{f\leftarrow f'}(x,\varvec{b};\mu ,\zeta _\mu )\otimes f_{f'\leftarrow h}(x,\mu )=0, \end{aligned}$$
(34)

by direct differentiation of Eq. (33). The integral of this function over x is independent of \({\mathbf {L}}_\mu \) due to the charge conservation, and thus at least the area of TMDPDF is stable at large b.

A further positive point of the \(\zeta \)-prescription is that the scale \(\mu \) remains unconstrained. Often, the scale \(\mu \) is selected such that it behaves as \(\sim 1/b\) at \(b\rightarrow 0\). Such a choice minimizes the small-b logarithms in small-b OPE and in the evolution exponent. At large-b the scale \(\mu \) should be frozen to some fixed value (of the order of a few GeV’s), in order to avoid the Landau pole. We use the simplest function which satisfies these criteria

$$\begin{aligned} \mu =\mu _b=\frac{C_0}{b}+2\; {\mathrm{GeV}}. \end{aligned}$$
(35)

There are several practical motiviations for the choice of the 2 GeV asymptotic (at \(b\rightarrow \infty \)) scale. To start with, the fixed scale 2 GeV is a standard scale of PDF extractions. The data that we analyze start with a dilepton invariant mass of 4 GeV, so that we want to fix the starting scale below this energy. On the other side we do not want to implement a perturbative expansion below 1 GeV, where the convergence of the theory is not ensured. A discussion about the theoretical error induced by this choice in the interval 1–4 GeV is posponed to Eq. (37).

Finally, we should also select the value for the parameter \(\mu _0\) that enters in the expression for the evolution factor, Eq. (22). To keep our discussion simple, we set \(\mu _0=\mu _b\).

2.5 Theoretical uncertainties and perturbative ordering

Table 1 The perturbative orders studied in the fit. For each order we indicate the power of \(a_s\) of each piece that enters in the TMDs. Note, that the order of \(a_s\) and PDF set are related, since the values of \(a_s\) are taken from the PDF set

In the construction of the cross section, one finds several sources of perturbative uncertainties. The size of these uncertanties can be estimated by the variation of associated scales. We list here the ones that we have considered in the present work.

  • Uncertainty associated with the perturbative matching of rapidity anomalous dimension  This uncertainty arises from the dependence (at the fixed perturbative order) on \(\mu _0\), which should be compensated between the Sudakov factor and the boundary term \({\mathcal {D}}(\mu _0)\) in the TMD evolution factor Eq. (22). This uncertainty can be tested by changing \(\mu _0\rightarrow c_1\mu _0\) and varying \(c_1\in [0.5,2]\).

  • Uncertainty associated with the hard factorization scale This uncertainty arises from the dependence (at the fixed perturbative order) on the scale \(\mu _f(\sim Q)\) which is to be compensated between the hard coefficient function \(|C_V|^2\) and the TMD evolution factor. This uncertainty can be tested by changing \(\mu _f\rightarrow c_2\mu _f\) and varying \(c_2\in [0.5,2]\).

  • Uncertainty associated with the TMD evolution factor   This uncertainty arises from the dependence (at the fixed perturbative order) on the initial scale of TMD evolution \(\mu _i\), which is to be compensated between the evolution integral and the \(\mu \)-dependence of \(\zeta _i\) in Eq. (22). This uncertainty can be tested by changing \(\mu _i\rightarrow c_3\mu _i\) and varying \(c_3\in [0.5,2]\).

  • Uncertainty associated with the small-b matching   This uncertainty arises from the dependence (at the fixed perturbative order) on the scale of the small-b matching \(\mu _{\text {OPE}}\) which is to be compensated between the small-b coefficient function \(C_{f\leftarrow f'}\) and evolution of PDF. This uncertainty can be tested by changing \(\mu _{\text {OPE}}\rightarrow c_4\mu _{\text {OPE}}\) and varying \(c_4\in [0.5,2]\).

We remark that our definition of perturbative uncertainties \(c_{1,2}\) is commonly used in the literature (as far as it can be compared among different schemes of calculation), see e.g. [21, 43]. Usually the uncertainties \(c_{3,4}\) are not distinguished and they are commonly varied simultaneously i.e. in the literature one finds discussions of errors for the case \(c_4=c_3\). To our best knowledge, the distinction of the matching and evolution uncertainties is made here for the first time.

In this way, the general expression for the cross-section in Eq. (6) with our choice of scales reads

$$\begin{aligned}&\frac{d\sigma }{dQ^2dyd(q_T^2)}\nonumber \\&\quad = \frac{4\pi }{3N_c}\frac{{\mathcal {P}}}{sQ^2} \sum _{GG'}z_{ll'}^{GG'}(q)\nonumber \\&\qquad \times \,\sum _{ff'}z_{ff'}^{GG'} \int \frac{d^2\varvec{b}}{4\pi }e^{i(\varvec{b}\varvec{q})}|C_V(Q,c_2Q)|^2\nonumber \\&\qquad \times \, \Big \{R^f[\varvec{b};(c_2Q,Q^2)\rightarrow (c_3\mu _i,\zeta _{c_3\mu _i});c_1\mu _i]\Big \}^2 \nonumber \\&\qquad \times \, F_{f\leftarrow h_1}(x,\varvec{b};c_4\mu _{\text {OPE}},\zeta _{c_4\mu _{\text {OPE}}})F_{f'\leftarrow h_2}\nonumber \\&\qquad \times \,(x,\varvec{b};c_4\mu _{\text {OPE}},\zeta _{c_4\mu _{\text {OPE}}}), \end{aligned}$$
(36)

where the evolution factor R is given in Eq. (22) and the explicit expression for the \(\zeta _\mu \) is given in Eq. (30). The low-normalization point \(\mu _i\) and the scale of small-b operator product expansion \(\mu _{\text {OPE}}\) are fixed at the same point as in Eq. (35)

$$\begin{aligned} \mu _i=\mu _{\text {OPE}}=\frac{C_0}{b}+2\;{\mathrm{GeV}}. \end{aligned}$$
(37)

In the limit \(b\rightarrow \infty \) the scale \(\mu _i\) reaches the fixed point of 2 GeV, cfr. Eq. (35). The error induced by this choice of the asymptotic energy scale is evaluated together with the error induced by the scale \(\mu _i\). In particular, one observes that the variation of \(c_{1,3,4}\) in formula (36) allows to test the impact of the variation of the fixed scale of 2 GeV in the whole range 1–4 GeV, as discussed around Eq. (35).

The perturbative orders of each cross section constituent are to be combined consistently. Having at our disposal the NNLO expressions for coefficient function and \(\hbox {N}^3\hbox {LO}\) expressions for anomalous dimensions, we can define four successive self-contained sets of ordering. This is reported in Table 1. In our definition of orders we use the following logic: (i) The order of the \(a_s\)-running should be the same as the order of PDF set, since their extraction are correlated. (ii) The order of \({\mathcal {D}}\) should be the same as the order of \(\varGamma \) since they enter the evolution kernel R with the same counting of logarithms (i.e. \(a_s^n \ln ^{n+1}\mu \)), and one-order higher then the order of \(\gamma _V\), since it has counting \(a_s^n \ln ^n\mu \). (iii) The order of small-b matching coefficient should be the same as the order of evolution of a PDF, because large logarithms of b are to be compensated by the PDF evolution. (iv) The order of \(\zeta _\mu \) should be such that no logarithms appear in the coefficient function, and the general logarithm counting coincides with the counting of the evolution factor. In Table 1 the order of the \(\zeta _\mu \) is defined as following: NLL is \({\mathbf {l}}_\zeta ={\mathbf {L}}_\mu /2\), NLO has in addition finite part at order \(a_s^0\) (i.e. two first terms of Eq. (30)), NNLL has in addition logarithmic part at order \(a_s^1\) (i.e. the first line of Eq. (30)), and NNLO is given by whole expression Eq. (30). The \({\mathbf {l}}_\zeta \) cases NLL and NNLL are somewhat intermediate cases. In fact, while one achieves a cancellation of logs of the same order in the evolution kernel and the coefficient, one finds that the counting in the coefficient is consistent with the \(a_s L_\mu ^2\sim a_s^0\). A similar counting was introduced in [53]. We postpone a full study of this counting within \(\zeta \)-prescription to a future work.

To label the orders we use the nomenclature where the part with ’LO suffix designates the order of coefficient functions, and the part with ’LL suffix designates the order of the evolution factor in the \(a_s \ln \mu \sim 1\) scheme. So, our highest order is NNLL/NNLO, which at the moment the highest available combination of the perturbative series. The order NLL/LO appears to be barely inconsistent, because it requires the LO PDF evolution to match the trivial coefficient function. Therefore, we exclude the NLL/LO from our phenomenological studies.

2.6 Implementation of lepton cuts

In modern experiments, the cross-section is often evaluated using fiducial cuts on the dilepton momenta. That is, the lepton tensor in Eq. (3) should be evaluated taking into account the experimental cut phase-space. At leading order the lepton tensor takes the form

$$\begin{aligned} (-g_T^{\mu \nu })L_{\mu \nu }^{GG'}= & {} 32z^{GG'}_{ll'}\int \frac{d^3k_1}{2E_1}\frac{d^3k_2}{2E_2}((k_1\cdot k_2)\nonumber \\&+\, (\varvec{k}_1\cdot \varvec{k}_2))\theta (k_{1,2}\in \text {cuts})\delta ^4(k_1+k_2-q),\nonumber \\ \end{aligned}$$
(38)

where \(\theta \)-function restricts the lepton momenta to the allowed region.

Table 2 The characteristics of the data measured at E288 experiment

In the limit \(Q\rightarrow \infty \) and no restriction on the lepton pair phase space we obtain

$$\begin{aligned} \lim _{Q\rightarrow \infty }(-g_T^{\mu \nu })L_{\mu \nu }^{GG'}= & {} \frac{16\pi }{3}z^{GG'}_{ll'}Q^2. \end{aligned}$$
(39)

Substituting this expression to the cross-section we obtain the standard formula to the Drell-Yan cross-section within TMD factorization [1, 2, 6, 39,40,41]. In order to include the corrections due to a finite Q and experimental cuts let us introduce a factor \({\mathcal {P}}\), i.e.

$$\begin{aligned} (-g_T^{\mu \nu })L_{\mu \nu }^{GG'}= & {} \frac{16\pi }{3}z^{GG'}_{ll'}Q^2{\mathcal {P}}, \end{aligned}$$
(40)

which is consistent with the cross section expression presented in Eq. (6). The function \({\mathcal {P}}\) in the absence of cuts reads

$$\begin{aligned} {\mathcal {P}}(\text {no cuts})=1+\frac{q_T^2}{2Q^2}. \end{aligned}$$
(41)

In the presence of cuts the expression for \({\mathcal {P}}\) is involved. For example, at \(q_T=0\) and \(y=0\) it reads

$$\begin{aligned}&{\mathcal {P}}_{q_T=0,y=0}(|k_{1,2}| {>}p_T;|\eta _{1,2}|{<} \eta ) \nonumber \\&\quad = \left\{ \begin{array}{ll} 0, &{}\quad Q<2p_T \\ \left( 1-\frac{p_T^2}{Q^2}\right) \sqrt{1-\frac{4p_T^2}{Q^2}},&{}\quad 2p_T<Q<2p_T \cosh \eta \\ \left( 1-\frac{1}{4\cosh ^2\eta }\right) \, \eta , &{}\quad 2p_T\cosh \eta <Q. \end{array} \right. \nonumber \\ \end{aligned}$$
(42)

Generally, \({\mathcal {P}}\) cannot be evaluated analytically, but it is rather easy to evaluate numerically.

3 Comparison with experiment

3.1 Review of experimental data

In this section we present the experimental data that have been included in our fit. We have splitted the data into two large data sets with respect to a generic energy scaling. They include the measurements from the following experiments:

  • Low-energy data set

    • E288: Drell-Yan process, at \(4<Q<14\) GeV.

  • High energy data set:

    • CDF/D0: Z-boson production at \(\sqrt{s}=1.8,~1.96\) TeV.

    • ATLAS/CMS/LHCb: Z-boson production at \(\sqrt{s}=7,8,13\) TeV.

    • ATLAS: Vector boson production outside the Z-peak (\(46<Q<66\) and \(116<Q<150\) GeV) at \(\sqrt{s}=8\) TeV.

In the present study, we have not included the data of other experiments, such as E605, or R209. In a previous work [20] it was observed that the E605 data suffer from internal inconsistencies and because of their reduced number they do not alter sensibly the results of the fit. The data points from R209 are even less, they have enormous uncertainties (they can be extracted only from a plot) and result to be even less significative. One observes also that the data of LHC below the Z-boson threshold have cinematical features similar to the ones of R209 and have a much bigger precision (see Fig. 13). Because of this reason we exclude these data from the present fit.

In the following, we present each included measurement in more detail.

E288  The E288 experiment [29] presents a large number of low energy points which is nearly equal the total number of points of high energy experiments. For convenience we have splitted this data set into three subsets with different center of mass energy s. The characteristics of the measurements are shown in Table 2. Concerning these data we can comment the following:

  • We exclude the data points in the range \(9<Q<11\) GeV, because these data are dominated by the \(\Upsilon \)-resonance (\(M_\Upsilon \simeq 9.5\) GeV). The description of \(\Upsilon \)-resonance production is beyond the scope of current TMD factorization approach.

  • The E288 experiment is made on a copper target. To simulate the effects of copper nuclei we replace the proton PDFs by the following combinations

    $$\begin{aligned} u_{Cu}(x)= & {} \frac{Z u(x)+N d(x)}{A}, \nonumber \\ d_{Cu}(x)= & {} \frac{Z d(x)+N u(x)}{A}, \nonumber \\ s_{Cu}(x)= & {} s(x), \end{aligned}$$
    (43)

    where \(Z=29\), \(A=63\) and \(N=A-Z=34\), are charge, atomic number and the number of neutrons in copper correspondingly.

  • The absolute normalization of the E288 \(p_T\)-cross-section is unknown. Typically, one includes an additional normalization factor \(N_{E288}\), as a parameter of the fit, see e.g. [13, 15, 19, 20]. There is no agreement on this factor values, it varies from \(\sim 0.8\) [13, 19, 20] to \(\sim 1.2\) [15]. In our analysis we fix \({\mathcal {N}}_{E288}=0.8\).

    The theoretical uncertainties for low energy experiments are large, of the order ±  10% at the best (see Sect. 3.5). As a consequence, the value of the cross-section is very sensitive to the choice of the PDF set and the overall normalization factor. For example, we have checked that the E288 data can be fitted also with \(N_{E288}=0.9\) with the same (or better) value of \(\chi ^2\) by an additional variation of \(\mu _b\). However, we consider this as a bad practice and restrict ourself to \({\mathcal {N}}_{E288}=0.8\), as the most conventional solution.

  • The data are splitted into different bins with different dilepton invariant mass. For each bin we evaluate the cross-section Eq. (6) as

    $$\begin{aligned} E\frac{d \sigma }{dq^3}=\int _{Q_{\text {min}}}^{Q_{\text {max}}}dQ\, \frac{2Q}{\pi } \frac{d\sigma }{dQ^2 dy d(q_T)^2}, \end{aligned}$$
    (44)

    where \(Q_{\max ,\min }\) are the boundary of the Q-bin.

Table 3 The characteristics of the data measured at CDF and D0 collaborations at run 1
Table 4 The characteristics of the data measured at CDF and D0 collaborations at run 2
Table 5 The characteristics of the Z-boson production data measured by ATLAS collaborations

CDF and D0. The data on the Z-boson production measured by CDF and D0 collaborations at Tevatron Run 1 and Run 2 [55,56,57,58,59] have been used nearly in every fit of unpolarized TMDPDFs. They are summarized in Tables 3, 4. Concerning these data we can comment the following:

  • There is a known tension between the values of total cross-section at run I of CDF and D0. Here we restrict ourself to the fit of the shape of the cross-section and normalize the theoretical points on the bin-by-bin integrals in the allowed range of \(q_T\). I.e. we multiply the theoretical cross-section by the factor

    $$\begin{aligned} {\mathcal {N}}=\frac{\sum _{\begin{array}{c} \text {included}\\ {\text {bins}} \end{array}} \varDelta q_T \frac{d\sigma _{\text {exp.}}}{dq_T}}{\sum _{\begin{array}{c} \text {included}\\ {\text {bins}} \end{array}} \varDelta q_T \frac{d\sigma _{\text {th.}}}{dq_T}}, \end{aligned}$$
    (45)

    where \(\varDelta q_T\) is the size of \(q_T\)-bins. As we show in Sect. 3.6 the obtained normalization factors are very close to one (at NNLO), and the values of partial cross-sections are in agreement with the experimental ones within error-bars. In the Tables 3, 4, we also present the values of the total cross-sections evaluated by DYNNLO code [17, 54]. In this calculation of the total-cross-section, we have used the same inputs as in the TMD fits, i.e. the PDF are taken from MMHT2014 set [60].

  • The experimental values for cross-section points are obtained by integrating over all values of y, integrating over measure range of Q and averaging in \(q_T\). Consequently, we have used the following expression for the cross-section

    $$\begin{aligned} \frac{d\sigma }{dq_T}= & {} \frac{1}{\varDelta q_T}\int _{q_{T,\text {low}}}^{q_{T,\text {high}}}2 q'_T dq'_T \int _{-y_0}^{y_0}dy \nonumber \\&\times \int _{M_{ll,\min }}^{M_{ll,\max }} 2 Q dQ~\frac{d\sigma }{dy d({q'_T}^2)dQ^2}, \end{aligned}$$
    (46)

    where \(y_0=\frac{1}{2}\ln (s/Q^2)\), \(q_{T,\text {low}}\) and \(q_{T,\text {high}}\) are boundaries of \(q_T\)-bin, and \(\varDelta q_T\) is the size of the \(q_T\)-bin.

ATLAS  The data by ATLAS collaboration in [30, 62] cover a broad range of dilepton invariant masses for the Drell–Yan process with small experimental error-bands. So, this set provide the biggest constraints on TMD extraction coming from high energy data points. The characteristics of the measurements are resumed in Tables 5, 6 and here we comment the following:

  • The data from the ATLAS detector at 8 TeV run are presented in several sets [30], which corresponds to different treatment of final-state photon radiation. We have considered the “dressed” set of the data.

  • The values of cross-section have been calculated using the expression in Eq. (46), where \(y_0=2.4\), as it is presented in the Tables 5, 6.

  • There is a known tension between the theoretical calculation of the integrated cross-section and the measured one, see e.g. [30, 61]. Moreover the available theoretical cross-section for vector boson production is not precise enough for the present study. Therefore, we normalize the calculated cross-sections by a factor, as explained in more detail in the text around Eq. (51). In Sect. 3.6, we compare the obtained values of normalization to the total cross-section. We have found that the values of obtained normalization are practically independent of the non-perturbative input of the TMD model, and at NNLL/NNLO correctly reproduce (within the error-bars) the measured total cross-section.

  • All data sets from LHC are presented within fiducial cross-sections. Therefore, we have implemented the cut leptonic tensor as it is discussed in Sect. 2.6.

Table 6 The characteristics of the data for the vector boson production off the Z-peak measured by ATLAS collaborations
Table 7 The characteristics of the Z-boson production data measured by CMS collaborations
Table 8 The characteristics of the Z-boson production data measured by LHCb collaborations

CMS and LHCb  The CMS and LHCb collaborations provide data around the Z-boson peak in [63,64,65,66,67], see Tables 78. The treatment of these data is similar to the case of ATLAS data:

  • The values of cross-section have been calculated using the expression in Eq. (46), where the limits for y-integration \(y_0\) are taken in accordance to the Tables 78.

  • Just as in the case of ATLAS data we have normalized the calculated cross-sections by the factor provided in Eq. (45) discussed in Sect. 3.6. We have found a good agreement between the theoretical and experimental values for total cross-section for LHCb data.

  • All data sets from LHC are fiducial cross-sections. Therefore, we have implemented the cut leptonic tensor as it is discussed in Sect. 2.6.

Finally, we have considered only points which allow a consistent TMD treatment. I.e. the points with the value of \(q_T<\delta _T Q\), where \(\delta _T\) is sufficiently small. In the literature we have not found any special study on this limiting ratio. So, we present our study in Sect. 3.4, and conclude that TMD factorization range is \(q_T/Q<0.2\).

3.2 arTeMiDe

In order to evaluate the cross-sections we have prepared the program package arTeMiDe. The arTeMiDe package is a collection of FORTRAN modules that evaluates individual terms of the TMD factorization formalism, such as TMD evolution factors, TMDPDFs, and combines them into the differential cross-sections. arTeMiDe forms a flexible package for TMDPDF phenomenology based on the \(\zeta \)-prescription, as described in this article. It is publicly available at the web-page [31].

arTeMiDe version 1.1 evaluates the quark and gluon unpolarized TMDPDFs (although in the discussed fit the gluon TMDPDFs are not necessary) for any given function \(f_{NP}\), at any composition of perturbative orders from LO to NNLO, with or without renormalon-induced power corrections. For the current study, the input PDFs are taken from the MMHT2014 PDF set [60].

The most time-consuming part of the numerical evaluation of the TMDPDFs, is the convolution integral in Eq. (48), which is especially expensive at NNLL/NNLO. Within the arTeMiDe package the convolution integral is optimized using an approximate expression for NNLO coefficient functions. The approximate form of the NNLO coefficient function is (note, that NLO and renormalon coefficient functions can be presented in this form without approximation)

$$\begin{aligned} C({\mathbf {L}}_{\mu },x)= & {} A_1 \delta ({\bar{x}})+A_2\left( \frac{1}{1-x}\right) _++A_3\left( \frac{\ln {\bar{x}}}{1-x}\right) _+\nonumber \\&+\, A_4 \ln {\bar{x}}+A_5 \ln ^2 {\bar{x}}++A_6 \ln ^3 {\bar{x}} \nonumber \\&+\, B_1\ln x+B_2 \ln ^2 x+B_3 \ln ^3 x+B_4\frac{1}{x}+B_5\frac{\ln x}{x} \nonumber \\&+\,c_1+c_2 x+c_3 x^2 +c_4 x^3+c_5 \ln {\bar{x}} \ln x \nonumber \\&+ \,c_6 \ln {\bar{x}} \ln ^2 x, \end{aligned}$$
(47)

where coefficients A, B and c are functions of \({\mathbf {L}}_\mu \). Such an approximate form is widely used in NNLO+ phenomenology of PDFs, see e.g. [68]. Here, the coefficients A and B represent the singular at \(x\rightarrow 1\) and \(x\rightarrow 0\) terms, and are evaluated exactly. The coefficients c represent the smooth part of the coefficient function, which is reconstructed by appropriate values of \(c_i\) with better then \(1\%\) accuracy. The values of constants A, B and c are presented in the “Appendix B.3”. In the convolution integral the main numerical contribution comes from the singular terms proportional to A and B, which are exact. The relative difference between the convolution with exact coefficient function and approximate expression in Eq. (47) is of order \(10^{-6}\). This numerical error is compatible with the numerical error of integration procedure and far inside the theoretical error-bands.

The evaluation of the Hankel-type integral over b is one of the main source of numerical errors. Typically, in order to obtain sufficient precision one should include a large number of points into the integral, which is very costly especially at NNLL/NNLO. arTeMiDe evaluates this integral with the Ogata quadratures [69]. The Ogata quadrature is a double exponential quadrature, whose nodes are the zeros of the Bessel function. It provides a fast and precise evaluation of Hankel-type integrals with the minimal number of integrand calls.

The fitting procedure has been performed by minimizing the \(\chi ^2\)-function. The minimization of the \(\chi ^2\) distribution has been done using the MINUIT package from the CERN library [70, 71]. The estimation of the statistical uncertainties for non-perturbative parameters is made with the MINOS procedure, performing the variation of parameters in the range \(\chi ^2\pm \varDelta \chi ^2\), with \(\varDelta \chi ^2\) corresponding to the 68% confidence level (i.e. \(\varDelta \chi ^2\simeq \{1.03,2.32,3.55\}\) for 1–3 fitting parameters, correspondingly.) The sources of theoretical uncertainties have been pointed in Sect. 2.5, and parameterized by the constants \(c_{1,2,3,4}\). The variation of these constants in the region (0.5, 2) produces the error-bands. The discussion on the individual contributions of theoretical uncertainties associated with different scales is given in Sect. 3.5.

3.3 Models for non-perturbative part of TMDPDFs

The non-perturbative part of the TMDPDF in general needs some ansatz, the parameters of which are to be extracted from data. In our study we have tested different ansatzes of the following general form

$$\begin{aligned} F_{q\leftarrow h}(x,\varvec{b})= & {} \int _x^1 \frac{dz}{z} \sum _{f}C_{q\leftarrow f}\left( z,\varvec{b};\mu ,\zeta _\mu \right) f_{f\leftarrow h}\left( \frac{x}{z},\mu \right) \nonumber \\&\times f_{NP}\left( z,\varvec{b}\right) , \end{aligned}$$
(48)

where \(f_{f\leftarrow h}\) is the PDF of the parton with the flavour f. The non-perturbative information of the TMDPDF, which is unreachable from the PDFs, is contained in \(f_{NP}\). In order to match the perturbative regime, the function \(f_{NP}\) should approach 1 for \(b\rightarrow 0\). Instead, the behavior of \(f_{NP}\) for \(b\rightarrow \infty \) is not so well established, which requires a test of different models. In the current study, we restrict ourself to flavor independent \(f_{NP}\), i.e. \(f_{NP}\) is common for TMDPDFs of different flavours. The flavour-dependence of TMDPDFs enters only through PDFs and coefficient functions, i.e. it is completely determined.

The large-b behavior of TMD distributions is the key point of TMD parametrization and extraction. There is no common agreement on this behavior. Clearly, such an agreement cannot be achieved in general, since the b-shape of a TMD distribution is strongly dependent on the large-b prescription. For example, the Gaussian behavior is typically observed in the models based on \(b^*\)-prescription. Moreover, the classical fits by ResBos package [15] disfavor other non-perturbative behaviors, such an exponential one (for more recent discussion, see [24]). Also the Gaussian shape is used in DYRes code [21] (together with \(b^*\)-prescription) and in DYqT code [18] (together with the minimal prescription). Contrary, the fit made in Ref. [20], which does not employ the \(b^*\)-prescription, uses an exponential shape of \(f_{NP}\) and also obtains an agreement with data. We point out that the use of LHC data for TMD extraction is made here for the first time (to our knowledge). Given the precision of LHC data, the consistency and/or goodness of all previous hypotheses has to be rediscussed.

In order to decide the best shape of \(f_{NP}\) within \(\zeta \)-prescription, we have considered several subsets of the data. It appears important to include simultaneously both high-energy and low-energy data because they are sensitive to different parts of the b-space spectrum. We have found that the most optimal data subset is given by the E288 data and the ATLAS Z-boson production data, see Tables 2, 5. In this subset, the very small error-bands of ATLAS data are compensated by a large number of points in E288 data, and as a result, we have a certain equilibrium between low and high-energy inputs.

Using the E288/ATLAS subset we have performed multiple fits using several different functional forms of \(f_{NP}\). Probably, the most informative preliminary test is the comparison of the pure Gaussian and exponential behavior for separate/joint low and high energy data points. In Table 9 we demonstrate results of fit with some simple single-parameter models. According to this table, although the quality of the fit is still not optimal, the high-energy data clearly favor the Gaussian shape of \(f_{NP}\), while the low-energy data favor the exponential behavior of \(f_{NP}\). This difference is simply explained if we recall that at higher energies (and thus at generally higher \(q_T\)) the Fourier integral in Eq. (4) is saturated by small values of b. At lower energies (and thus at generally smaller \(q_T\)) the Fourier integral in Eq. (4) is affected by a wider interval of values of b. Therefore, the results presented in the Table 9, suggest that \(f_{NP}\) should be Gaussian at small-b and exponential at large-b. This is in complete agreement with the theory expectations discussed in Sect. (2.3). The expected \(f_{NP}\) should be a function with a Taylor series expansion (around \(b=0\)) of even powers of b, with an exponential decay at \(b\rightarrow \infty \). A simple representative of such functions is \(\cosh ^{-1}(\lambda b)\). The test of this \(f_{NP}\) is given in the last columns of the Table 9 which clearly shows that this function alone, although it works much better than a Gaussian or an exponential, is not able to describe both low and high energy data, and thus we need extra parameters.

Table 9 The values of \(\chi ^2/d.o.f\) for different single-parameter non-perturbative functions \(f_{NP}\), minimized on different data sets. The \(\chi ^2/d.o.f\) values correspond to \(\delta _T=0.2\) and NNLL/NNLO

The preliminary tests with simple one-parameter dependence for the \(f_{NP}\) shape can be summerized by the following:

  1. (i)

    The high and low energy data should be considered altogether, because they test different intervals of the b-space spectrum of \(f_{NP}\).

  2. (ii)

    The subset of data points E288 and ATLAS Z-boson, is very selective for the \(f_{NP}\). A good fit of this subset guaranties the good fit for the whole set of data points. Nevertheless, in the following sections, we include all experiments, for consistency.

  3. (iii)

    Both theoretically and phenomenologically, we argue that \(f_{NP}\) should be a function of even powers of b with an exponential asymptotic behavior at \(b\rightarrow \infty \). Using a minimal set of two parameters (and the evolution parameter \(g_K\)) we find that one can easily fit the data with a \(\chi ^2/d.o.f\sim 1.2\)–1.3. The addition of more parameters (say for the control of \(b^4\) correction and/or flavor dependence) has the possibility to increase the quality of the fit. However, in this work, we do not consider extra parameters, since the current quality of the fit is already typical and reasonable for the modern TMD extraction (compare e.g. with [22]).

  4. (iv)

    One needs at least two parameters (one to control \(\sim b^2\) behavior at \(b\rightarrow 0\) and another to control the asymptotics) to fit simultaneously low and high-energy data. However, the multiplication by polynomials (e.g. \(f_{NP}\sim (1+\lambda b^2)/\cosh (b)\)) does not work well, which suggests that the asymptotic terms \(\sim b^2 e^{-b}\) are disfavored.

Based on this experience we have formulated some simple ansatzes for \(f_{NP}\).

  • Model 1  This ansatz uses the fact that the simplest even-b function with exponent asymptotics is the hyperbolic cosine. The model reads

    $$\begin{aligned} f_{NP}(b)=\frac{\cosh \left( \left( \frac{\lambda _2}{\lambda _1}-\frac{\lambda _1}{2}\right) b\right) }{\cosh \left( \left( \frac{\lambda _2}{\lambda _1}+\frac{\lambda _1}{2}\right) b\right) }, \end{aligned}$$
    (49)

    where \(\lambda _1[\text {GeV}]>0\) and \(\lambda _2[\text {GeV}^2]>0\) are free parameters. The advantage of this model is its simplicity and independence of the Bjorken variable. The model 1 has a quadratic (Gaussian) behavior at small-b \(f_{NP}\sim e^{-\lambda _2 b^2}\) and exponential behavior at large-b \(f_{NP}\sim e^{-\lambda _1 b}\).

  • Model 2  The model 2 reads

    $$\begin{aligned} f_{NP}(z,b)=\exp \left( \frac{-\lambda _2 z b^2}{\sqrt{1+z^2 b^2\frac{\lambda _2^2}{\lambda _1^2}}}\right) , \end{aligned}$$
    (50)

    where \(\lambda _1[\text {GeV}]>0\) and \(\lambda _2[\text {GeV}^2]>0\) are free parameters. In this model we attempt to incorporate the theoretical expectations on the z-dependence of \(f_{NP}\). So, the model 2 has a \(zb^2\)-behavior at small-b \(f_{NP}\sim e^{-\lambda _2 zb^2}\) and exponential behavior at large-b \(f_{NP}\sim e^{-\lambda _1 b}\).

Both models have two parameters, which we include in the parameterization such that the parameter \(\lambda _1[\text {GeV}]\) dictates the asymptotical behavior at large b. and the parameter \(\lambda _2[\text {GeV}^2]\) gives the quadratic term. A priory, the parameter \(\lambda _1\) should be of order of \(m_\pi \sim 0.14\;\text {GeV}\), since it is the only natural scale of strong forces at large distances. The parameter \(\lambda _2[\text {GeV}^2]\) roughly corresponds to the size of the leading power correction to small-b OPE, see Sect. 2.3. We can associate \(\lambda _2\) with the scale B as \(\lambda _2\sim B^{-2}\). In Ref. [12] we have estimated the size of this parameter in the large-\(\beta _0\) approximation as \(\lambda _2\sim 0.075\;\text {GeV}^2\).

Additionally, to the parameters \(\lambda _{1,2}\) we have studied the parameter \(g_K[\text {GeV}^2]>0\), which parametrizes the non-perturbative contribution to the rapidity evolution kernel \({\mathcal {D}}\) (see Eq. (21)). The importance of this parameter is not clear from the literature. In Ref. [12] we have estimated its size in the large-\(\beta _0\) approximation as \(0.01\pm 0.03\;\text {GeV}^2\), i.e. consistent with zero. Also, the fit of [20] shows a negligible influence of this parameter on the final results. Therefore, in the following we consider both possibilities \(g_K=0\) and \(g_K\ne 0\). In Sect. 3.7, we demonstrate that the parameter \(g_K\) is important at lower perturbative order, but its influence is negligible at NNLL/NNLO.

3.4 The domain of TMD factorization

Table 10 The number of points with \(q_T<\delta _T Q\) for each data set. In the majority of fits we use \(\delta _T=0.2\), see explanation in the text
Fig. 3
figure 3

The \(\delta _T\) dependence of the value of \(\chi ^2/d.o.f.\) for some one-parameter models. The value of the parameter coming from the fit is also shown together with systematic uncertainties

The TMD factorization is restricted to the small-\(q_T\) range. The size of the allowed \(q_T\)-region is a priory unknown. We have not found any phenomenological studies on this point but only some statement on the strong dependence of the fit on the \(q_T\)-window. A specific study on TeVatron Z-boson production data in Ref. [53] shows that the Y-term contribution is extremely marginal for \(q_T<30\) GeV.

In order to make a qualitative study, we introduce the parameter \(\delta _T\) and we consider all data points with \(q_T<\delta _T Q\). The amount of data points which are allowed by such a restriction are shown in the Table 10. In order to estimate the maximum value of \(\delta _T\) we perform a series of fits with increasing values of \(\delta _T\). Ideally, the \(\chi ^2/d.o.f.\) and the fitting parameters should be stable within and unstable outside of the allowed \(\delta _T\) interval. In this way, considering the dependence on \(\delta _T\) one should find an interval of \(\delta _T\) for which the fit is not sensitive to the Y-term. This point indicates the region of TMD-factorization, and should not depend of the perturbative order.

We have performed such a test for high-energy data set with different one-parameter forms of \(f_{NP}\). We have especially used the one parameter models to guarantee the absence of fine-tuning of the cross-section. For this reason we also exclude the E288 data, because it is impossible to describe high- and low-energy data with a single non-perturbative parameter. The result of the fits practically agrees for all tested models and orders. In Fig. 3, we present some typical outcome of the fits.

In plots 3 one can see that all models reproduce the data very-well at very small \(\delta _T\), which is expected since the TMD factorization is only valid at \(q_T\ll Q\). Then the value of \(\chi ^2\) slightly grows but keeps less then one until \(\delta _T=0.2\) and after this threshold it jumps to higher values. The next jump is seen at \(\delta _T=0.25\). After \(\delta _T=0.25\) the value of \(\chi ^2\) increases rapidly. We interpret this fact saying that at \(\delta _T=0.2\) we become sensitive to Y-term, and at \(\delta _T=0.25\) the Y-term starts to dominate the cross-section, i.e. we leave the domain of TMD factorization. We have found that the presented plots rather strongly depend on the set of pertubative scales. For some choice of these scales, one can obtain an ideally flat plateau of \(\chi ^2\) for \(\delta _T\leqslant 0.2\). However, the values of the two important thresholds, namely, \(\delta _T=0.2\) (where deviation form TMD factorization appears) and \(\delta _T=0.25\) (where the TMD factorization is completely broken), are stable with perturbative scales.

As a result of these tests, in the following we use the data points with \(q_T\lesssim 0.2\; Q\), or say \(\delta _T=0.2\). The choice of \(\delta _T\) that we make is consistent with [53]. This range includes 163 high-energy and 146 low-energy data points (in total 309 data points). Comparing this number of points with the literature, we observe that, it is the largest set of points for Drell-Yan/Z-boson production used up to present in a simultaneous fit of TMDPDF (to our knowledge), which also has the largest considered range of energies from \((Q,\sqrt{s})=(4,19.4)\;\text {GeV}\) (from the E288 experiment) to \((Q,\sqrt{s})=(150,8000)\;\text {GeV}\) (from the ATLAS experiment).

3.5 Scale variations and theoretical uncertainties

Fig. 4
figure 4

Theoretical error-bands and experimental data points for CDF-Run 2 experiment. The theoretical error is estimated changing \(c_{1,2,3,4}\) in the range (0.5, 2) at each perturbative order. The nonpertubative input is provided by model 2. The sub-panels show the relative size of error-band for theory and experiment                                      

Fig. 5
figure 5

Theoretical error-bands and experimental data points for LHCb (13 TeV) experiment at 13–14 GeV. The theoretical error is estimated changing \(c_{1,2,3,4}\) in the range (0.5, 2) at each perturbative order. The nonpertubative input is provided by model 2. The sub-panels show the relative size of error-band for theory and experiment

The theoretical uncertainties of the perturbative inputs are tested by varying the perturbative scales around their central values, as it is discussed in Sect. 2.4. The distribution of uncertainties through orders for a typical high energy experiment is shown in Figs. 4, 5, and for a typical low-energy experiments in Figs. 6, 7. The complete set of plots for every included experiment can be found in [31].

The uncertainty associated with the TMD evolution factor is parameterized by the \(c_1\)-variation. This uncertainty drops down between NLL/NLO and NNLL/NLO orders, that is together with the increase of the perturbative order for \({\mathcal {D}}\) (see Table 1). The size of the band is correlated with the energy of the process, that is, it is less significant for higher-energy experiments.

The uncertainty associated with the hard scale depends on the \(c_2\)-variation. This band is independent of \(q_T\). This error is the main one at NLL/LO (which we do not present here), but becomes negligible at higher orders.

The uncertainty associated with the low-energy behavior of the evolution factor is parameterized by the \(c_3\)-variation. We have found that it significantly influences the shape of the cross-section and also it is rather large at small-\(q_T\). As expected it is decreases going from NLL/NLO to NNLL/NNLO. At NNLL/NNLO it gives the main contribution to the uncertainty band for the cross-section.

The uncertainty associated with the small-b matching of coefficients and PDFs is represented by the \(c_4\)-variation. It is the most interesting error because it checks the convergences of the \(\zeta \)-prescription. The corresponding error-band is larger at \(q_T\rightarrow 0\), which corresponds to the contribution of large \({\mathbf {L}}_\mu \) (we remind that in \(\zeta \)-prescription, \({\mathbf {L}}_\mu \) grows unrestrictedly). The important observation is that the large uncertainty area significantly shrinks between NLL/NLO and NNLL/NNLO, although the NNLL/NNLO contains a higher power of \({\mathbf {L}}_\mu \). This shows a very good behavior of the \(\zeta \)-prescription. In total this error is dominant at NLL/NLO, but becomes smaller (although compatible) to the one coming from the \(c_3\) variation at NNLL/NNLO.

The size of the theoretical error-band is significantly bigger at small-Q, as can be visually checked comparing Figs. 45 to Figs. 6, 7. The uncertainties reduces when one increases the perturbative orders, both in high and low energy cases. However for the low energy case the error remains of order \(\sim 20\%\) or higher even at NNLL/NNLO, which can be problematic for a precise description of these experiments. We additionally stress that at NLL/LO the uncertainties range from 50 to \(100\%\) and higher. This shows that this particular order has no prediction power, and should not be considered any serious for a well based extraction of TMDs. This is the main reason for excluding NLL/LO order from our analysis.

Fig. 6
figure 6

Theoretical error-bands and experimental data points for E288 (200 GeV) experiment at 5–6 GeV. The theoretical error is estimated changing \(c_{1,2,3,4}\) in the range (0.5, 2) at each perturbative order. The nonpertubative input is provided by model 2. The sub-panels show the ratio of deviation to the central line (with \(c_i=1\))

Fig. 7
figure 7

Theoretical error-bands and experimental data points for E288 (400 GeV) experiment at 13–14 GeV. The theoretical error is estimated changing \(c_{1,2,3,4}\) in the range (0.5, 2) at each perturbative order. The nonpertubative input is provided by model 2. The sub-panels show the ratio of deviation to the central line (with \(c_i=1\))

In order to provide a final definition of the theoretical error, we use all scale variations and we take the maximum deviation among them. We have found that our definition of uncertainties is close, as far as one can compare different theoretical expressions, to the common definition used e.g. in [21, 43]. In total, for the high-energy experiments we find that the theoretical uncertainty (at NNLO) is of the order \(2{\text {--}}3\)% at the peak. It grows to \(\sim 5{\text {--}}6\%\) at maximum allowed \(q_T\), and to \(\sim 10\%\) at \(q_T\rightarrow 0\). This value seems to be smaller (but comparable) to the typical values of uncertainties presented ResBos or DYRes. This is a definite positive point of the \(\zeta \)-prescription. Indeed, the main contribution (at high energies) to it comes from the \(c_3\)- and \(c_4\)-error-bands, which are controlled by \(\zeta \)-prescription. The \(c_4\)-band would be significantly larger in the presence of double-logarithms, which are absent due to the \(\zeta \)-prescription.

3.6 Normalization

As the TMD factorization approach describes the shape of the differential cross section only in a limited range of \(q_T\), we need some extra input to normalize the curves. In order to compare with the data, we weight the differential cross-section by the total (or fiducial) cross-section. The values of the theory predictions for total cross-sections can be obtained from the studies of other groups. For example, one can use the DYNNLO code [17, 54]. Its predictions for the total cross-sections are presented in the Tables 3, 4. However, we found that such a strategy is unreliable, because even tiny disagreement in the normalization leads to huge effects in the \(\chi ^2\)-minimization. This is especially important for LHC data sets, which have very small error-bands. Additionally, as we demonstrate later, the DYNNLO predictions are worse than that obtained using our normalization factors.

Therefore, to fit the high energy data set we introduce a normalization factor for each data set. This factor equals the partial integral over \(q_T\) for experimental and theoretical cross-sections, and reads

$$\begin{aligned} {\mathcal {N}}=\frac{\sum _{\begin{array}{c} \text {included}\\ {\text {bins}} \end{array}} \varDelta q_T \frac{d\sigma _{\text {exp.}}}{dq_T}}{\sum _{\begin{array}{c} \text {included}\\ {\text {bins}} \end{array}} \varDelta q_T \frac{d\sigma _{\text {th.}}}{dq_T}}, \end{aligned}$$
(51)

where \(\varDelta q_T\) is the size of the \(q_T\) bin. In this way, we fit only the \(q_T\)-shape of cross-section, which is already very restricting, as we discussed in the previous section.

The values of \({\mathcal {N}}^{-1}\) resulting from the calculations are presented in Table 11. It is clear that the deviation between the theory and experiment decreases with perturbative orders. For the majority of experiments (excluding the Z-boson production measured by ATLAS), we find a good agreement for the absolute value of the differential cross-section obtained from the data points and the TMD factorization. It is important that the values of \({\mathcal {N}}\) are very stable with respect to the change of non-perturbative model and to the scale variation. In particular, we do not present the error-band on the normalization values in the Table 11, because they are smaller then the present precision.

Table 11 The normalization factors for the cross-section for each experiment. The dimensional-less numbers are ratios of partially integrated cross section over \(q_T\) (51) (theory/experiment, i.e. \({\mathcal {N}}^{-1}\)), for the data with the published value of total cross-section. For the data sets with unpublished values of total cross-section, the value of the total cross-section used for normalization is presented. The numbers are given for the model 1. The variation of the scales and models gives the change of numbers in the unrepresented digits. The numbers shown in bold are those which agree with the measured cross-section within the error bars
Table 12 The results of the \(\chi ^2\)-minimization procedure with \(g_K=0\). The values of \(\chi ^2\) are given including the theoretical error-band. The values of extracted parameters are given with statistical error-band (the first pair of numbers) and the theoretical error-band (the second pair of numbers). The visual presentation of this table is given in Fig.9
Table 13 The results of the \(\chi ^2\)-minimization procedure with non-zero \(g_K\). The values of \(\chi ^2\) are given with theoretical error-band. The values of extracted parameters are given with statistical error-band (the first pair of numbers) and the theoretical error-band (the second pair of numbers). The visual presentation of this table is given in Fig. 9

The normalization of the data from E288 experiment is generally unknown. Most probably, the main source of discrepancy comes from the fiducial cuts made for E288 experiment, which cannot be restored nowadays. The small fiducial cuts do not seriously influence the \(q_T\)-shape of the differential cross-section, but can sizably decrease the total normalization. In our analysis, we change the common normalization of all E288 data points as

$$\begin{aligned} {\mathcal {N}}_{E288}=0.8. \end{aligned}$$
(52)

This or close values have been used in different fits, see e.g. [15, 20]. However, we do not seriously ground on it, e.g. we can switch to 0.85 or 0.9 without significant loss in \(\chi ^2\) (however, the value 1 produces serious disagreement with our current input). One should take into account that the theoretical uncertainty at small\(-Q\) is very large, see Figs. 6, 7. It also implies that low-energy cross-sections are very sensitive to the choice of PDF set (in particular, our approximation of Eq. (43) for nuclei PDF could be too crude). We have checked that the E288 data can be also fitted with \(N_{E288}=1\) to the same values (or better) of \(\chi ^2\) by additional variation of \(Q_0\) (similar to the fit made in [24]). Such an ambiguity represents a problem in the analysis of the low-energy data.

3.7 Results of the fits and TMD extraction

In this section, we present the results of the global fit for the complete data sets presented in Sect. 3.1, which allows the extraction of the unpolarized TMDPDF. We have made two independent fits, with \(g_K=0\) and with \(g_K\ne 0\). The results of the \(\chi ^2\) minimization and the values of the extracted parameters are presented in Tables 12, 13. The visual presentation is given in Figs. 89.

Fig. 8
figure 8

The values of parameters for \(f_{NP}\) extracted from the global fit with \(g_K=0\). Red marks represent the extraction with model 1. Blue marks represent the extraction with model 2. The black marks show the values of parameters extracted at \(c_{1,2,3,4}=1\). The thick bands represent the statistical errors of parameter determination. The thin error-bands represent the theoretical error on extracted parameters due to variation of \(c_{1,2,3,4}\in [0.5,2]\). The numerical values of parameters are given in the Table 12

Fig. 9
figure 9

The values of parameters for \(f_{NP}\) extracted from the global fit with \(g_K\ne 0\). Red marks represent the extraction with model 1. Blue marks represent the extraction with model 2. The black marks show the values of parameters extracted at \(c_{1,2,3,4}=1\). The thick bands represent the statistical errors of parameter determination. The thin error-bands represent the theoretical error on extracted parameters due to variation of \(c_{1,2,3,4}\in [0.5,2]\). The numerical values of parameters are given in the Table 13

We have estimated both statistical and theoretical errors on the fit parameters. The statistical errors are related to the uncertainty of the \(\chi ^2\)-minimization and are induced by the experimental error-bands. The statistical errors have been estimated by the MINOS procedure of MINUIT package [71]. The theoretical errors are related to the uncertainty of perturbation series. There is no common procedure for the estimation of the theoretical error. Therefore, we propose the method presented in the following.

The theoretical error is estimated by a set of independent fitting procedures for each variation of the scale constants \(c_{1,2,3,4}\in [0.5,2]\), as discussed in Sect. 2.5. In other words, we set, say, \(c_1=2\) and perform the minimization of \(\chi ^2\). In this way, we obtain a new set of model parameters (and a new value of \(\chi ^2\)). In total, we have 8 independent variations and hence have 8 values of parameters. The final theoretical error-band is given by the maximal positive and minimal deviations from the central value and the results are reported in Table 14. A drawback of this procedure is the variation of a scale can lead to the serious increase in \(\chi ^2\). In other words changing the matching scales affects also the quality of the fit. In general, the size of the band for \(\chi ^2\) value represents the stability of the theoretical model, and they are also reported in Table 14. One can see that the error for \(\chi ^2\) significantly drops with orders.

In Refs. [21, 43] a different procedure has been used for the estimation of theoretical errors which takes into account a combined variation of all constants \(c_i\) in a Monte Carlo analysis. Because of the fact that the error is usually dominated by just one of these constants we expect that the method used here offers a comparable estimate. More work about this issue can be done in the future.

Table 14 Example of parameter extraction with the variation of \(c_{1,2,3,4}\) constants, and evaluation of the theoretical error. Bold numbers in brackets represent the deviation of the parameter from its central value

The values of the parameter \(\lambda _1\), which parametrizes the asymptotics of TMDPDFs, extracted at different orders agree with each other within the error-band, that slightly reduces with the increase of the order. It has a natural size of the order of pion mass, \(\lambda _1\sim 1.3 m_\pi - 2.3 m_\pi \). The values of the parameter \(\lambda _2\), which parameterizes the quadratic correction to the small-b regime, are not so stable although the values at different orders are compatible within the errors. In particular, they have large error-bars at NNLL/NLO order. The behavior of \(g_K\) is the most peculiar. It decreases with the increase of the perturbation order. Moreover, at NNLL (both /NLO and /NNLO) its error-band touches the zero. It can be interpreted as following: the parameter \(g_K\) is very small (or even zero) but within the fit, it tends to compensate the missing higher perturbative orders of evolution exponent. We also observe that all extractions of \(g_K\) agrees with the theoretical estimation \(g_K=0.01\pm 0.03\;\text {GeV}^2\) made in [12]. One can see that both models produce very similar results both for \(\chi ^2\) and the parameters.

Table 15 The values of \(\chi ^2/points\) for individual data sets. The boxes indicate the values of partial \(\chi ^2\) which are responsible for the increment of \(\chi ^2/d.o.f.\) from NLL/NLO to NNLL/NNLO

As expected the theoretical error is reduced with the increase of the perturbative order. In particular, the band on the value of \(\chi ^2\) is significantly smaller at NNLL/NNLO. The distribution of parameter values over perturbative orders presented in Table 14 is typical. The variation of \(c_1\) does not represent the main contribution to the error-band. It implies that the low-energy matching for the rapidity anomalous dimension is not so important (in comparison to other matchings), as typically expected.

The variation of \(c_2\) is almost negligible. Here, however, we recall that \(c_2\) influences only the common normalization factor, and thus the effect of its variation could be underestimated due to our fitting procedure. The variation of \(c_3\) and \(c_4\) produces the most part of the error-band and the strongest variation of \(\chi ^2\). At \(g_K=0\) these variation are more-or-less equivalent. At \(g_K\ne 0\) there is a clear pattern. In this case, the variation of \(c_3\) gives the main error-band on \(g_K\), while the variation of \(c_4\) gives the main error-band on parameters \(\lambda _{1,2}\). It is very natural since the variation of \(c_3\) tests the low-energy normalization point of the evolution factor, and \(c_4\) tests the uncertainties of perturbation determination of the TMDPDF.

In Table 15 we present the distribution of values for \(\chi ^2\) among experiments. One can see that the most stringent constraints come from the Z-boson production data of ATLAS and D0 run2. This is due to the small experimental uncertainty of these data points. At the low-energy, the main tension is presented by the 4–5 GeV bins, while the rest are distributed more-or-less homogeneously. It probably indicates the influence of generic factorization violating terms. The plots of the theoretical curves (at NNLL/NNLO for model 1) and the data points for individual experiments are shown in Figs. 10, 11, 12, 13 and 14. The plots for different models and at other orders can be found in [31].

Fig. 10
figure 10

The comparison of the data for Z-boson production collected at Tevatron experiments (run 1 and run 2) to the fit of model 2 at NNLL/NNLO. Red data points are those which included in the fit (i.e. with \(\delta _T=0.2\)). Gray data points are those which are not include in the fit (i.e. \(\delta _T>0.2\)). The blue band is the theoretical uncertainty obtained from the variation of scales

Fig. 11
figure 11

The comparison of the data for Z-boson production collected at ATLAS and CMS experiments to the fit of model 2 at NNLL/NNLO. Red data points are those which included in the fit (i.e. with \(\delta _T=0.2\)). Gray data points are those which are not include in the fit (i.e. \(\delta _T>0.2\)). The blue band is the theoretical uncertainty obtained from the variation of scales

Fig. 12
figure 12

The comparison of the data for Z-boson production collected at LHCb experiment to the fit of model 2 at NNLL/NNLO. Red data points are those which included in the fit (i.e. with \(\delta _T=0.2\)). Gray data points are those which are not include in the fit (i.e. \(\delta _T>0.2\)). The blue band is the theoretical uncertainty obtained from the variation of scales

Fig. 13
figure 13

The comparison of the data for Drell–Yan reaction collected at ATLAS to the fit of model 2 at NNLL/NNLO. Red data points are those which included in the fit (i.e. with \(\delta _T<0.2\)). Gray data points are those which are not include in the fit (i.e. \(\delta _T>0.25\)). The blue band is the theoretical uncertainty obtained from the variation of scales

4 Conclusion

The unpolarized Drell-Yan process at small-\(q_T\) offers the simplest application of the TMD factorization formalism, and as such it has been studied by many groups. In this work, we have revised the main points of the practical implementation of TMD factorization, and reveal some new aspects of the TMD phenomenology. Altogether it allows us to critically reanalyze the available Drell-Yan data and to extract consistently the unpolarized TMDPDFs, within some approximation. The primary aim of our analysis is to answer some general questions for the TMD approach such as: Up to which \(q_T\) the TMD factorization works? What is the best asymptotical behavior of a TMD distributions? How convergent is TMD formalism at higher orders of perturbative expansion? The answers to these questions are naturally affected by the used prescriptions for the practical implementation of the TMD formalism. Even so, these important issues of TMD phenomenology are undiscussed in the literature or discussed very superficially. Implementing consistently the TMD factorization formalism, we are able to fit a large set of Drell-Yan data points which ranges from low (\(Q=4\;\text {GeV}\)) to high (\(Q=116\)\(150\;\text {GeV}\)) dilepton invariant masses on a wide interval of center of mass energies and using a limited set of parameters (two for the non-perturbative part of TMDPDF and one for the non-perturbative part of the TMD evolution).

In this work, we have formulated and used the \(\zeta \)-prescription, which is one of the main new theoretical contributions of this article. The \(\zeta \)-prescription consists of a particular choice of the rapidity evolution scale \(\zeta =\zeta _\mu \), which depends on \(\mu \), b and the parton flavor (quark or gluon). This choice corresponds to the equi-evolution line in the space of TMD scales, and thus a TMD distribution is \(\mu \)-independent along this line. As a consequence, all logarithms related to the TMD evolution, which are essentially double logarithms, are eliminated from the small-b OPE. It significantly improves the perturbative convergence and the radius of convergences for the small-b OPE. The value of \(\zeta _\mu \) is dictated by the differential equation (29), which can be solved order-by-order in perturbation theory. We stress that the \(\zeta \)-prescription does not strictly solve the problem of large-b logarithms, which are still present in the matching coefficients. However, these logarithms are not related anymore to the TMD scales. Moreover, these logarithms are accompanied by the x-dependent coefficients which preserve the integral over x in accordance with the probability interpretation of PDFs. Note, that the \(\zeta \)-prescription is universal for all TMDs of the leading dynamical twist, due to the universality of TMD ultraviolet and rapidity renormalization factors. There are multiple possibilities to apply \(\zeta \)-prescription, see some discussion in “Appendix B.1”. In this work, we have used the simplest one, which can be certainly improved. A further study of the \(\zeta \)-prescription will be done elsewhere.

Within our implementation of TMD factorization and TMD distributions, which has a generic form, we have three independent perturbative series: one for hard matching, one for rapidity evolution, and one for small-b matching. To defend the approach we provide the estimation of the perturbative uncertainty by variation of associated scales by factor 2, see Sect. 3.5. We have considered several successive perturbative orders (see Table 1), and demonstrate that the theory uncertainties and the agreement with the data improve with the increase of the perturbative order. The agreement of the theory with the experiment resulting in our fit is a consequence of the \(\zeta \)-prescription to a large extent. The lowest possible combination of perturbative order, namely NLL/LO, produces very large theoretical error-bands and thus has been excluded from the present study.

Our analysis shows that data are very selective about the non-perturbative part of the TMDs and only well-behaved models can accommodate the fit. The best models for the non-perturbative part of TMDPDF that we have found are formulated in Sect. 3.3. They have a common non-perturbative structure, namely

$$\begin{aligned} F(x,\varvec{b})\simeq \int _x^1 \frac{dz}{z}C(z,{\mathbf {L}}_\mu )f_{NP}(z,\varvec{b})f(x/z,\mu ), \end{aligned}$$
(53)

where f is the PDF, C is the small-b matching coefficient and \(f_{NP}\) is a non-perturbative input. We have found that the best agreement with data is given when the function \(f_{NP}\) behaves as

$$\begin{aligned}&\text {at small }b:\quad f_{NP}\simeq e^{-\lambda _2 \varvec{b}^2}, \nonumber \\&\text {at large }b:\quad f_{NP}\simeq e^{-\lambda _1 |\varvec{b}|}. \end{aligned}$$
(54)

We have considered two ansatzes which respect Eq. (54), see Sect. 3.3 and Eqs. (49) and (50). The models have different behavior in the intermediate \(\varvec{b}\) region, in particular, model 2 has z dependence. Nonetheless, the models produce nearly identical values of \(\chi ^2\) and of parameters \(\lambda _{1,2}\). It implies that the parameters \(\lambda _{1,2}\) that largely determine the shape of TMDPDF have a precise physical meaning. The values of parameters are reasonable \(\lambda _1\sim 1.5\; m_\pi \) and \(\lambda _2\sim 0.5\) \(\hbox {GeV}^2\). We also study the influence of the parameter \(g_K\), that parameterizes the non-perturbative part of TMD evolution. We have found that this parameter is significant at lowest order (in our case NLL/NLO) and less significant at higher orders. Moreover, at NNLL/NNLO the value of \(g_K\) is compatible with zero within the error-bars. We supplement our extraction with the estimation of the theoretical and statistical errorbars.

Fig. 14
figure 14

The comparison of the data for Drell–Yan reaction collected at E288 experiment to the fit of model 2 at NNLL/NNLO. Red data points are those which included in the fit (i.e. with \(\delta _T=0.2\)). Gray data points are those which are not include in the fit (i.e. \(\delta _T>0.2\)). The blue band is the theoretical uncertainty obtained from the variation of scales

The theoretical uncertainty on the extracted parameters is shown in Figs. 8, 9. Several improvements of the current approach can certainly help the reduction of errors and allow us to understand them better. In fact one can cite the inclusion of factorization breaking corrections (Y-terms), and also QED/isospin breaking corrections for LHC data. These issues will be a matter of discussion of future works.

Another aspect that we point out, is the practical limitation of TMD factorization. To make the discussion quantitative we introduce the parameter \(\delta _T\), which is the highest allowed ratio \(q_T/Q\) accounted in the fit. Clearly, at very low \(\delta _T\) the TMD formalism should perfectly work, e.g. provide small values of \(\chi ^2\)-distribution. Our expectation is that within the domain of the TMD-factorization the value of \(\chi ^2/d.o.f.\) is largely constant and starts to grow outside of this domain. Indeed, for the best models, the observed picture agrees with the expectation. In this way, we have shown that TMD factorization as it is, i.e. in the absence of Y-term, is capable of describing the data with \(q_T\lesssim 0.2\ Q\), i.e. \(\delta _T=0.2\). With some risk, one can prolong it to \(\delta _T=0.25\). After \(\delta _T=0.25\) the TMD factorization loses any agreement with the experiment. This analysis is unique, or at least we do not know any analog in the literature.

The fit and the plots of the data has been done with the help of arTeMiDe, version 1.1, available at [31]. This is a code package for the numerical evaluation of TMD distributions and related cross-sections. It has a flexible structure and allows to consider an arbitrary combination of perturbative orders up to NNLO for coefficient functions and \(\hbox {N}^3\hbox {LO}\) for evolution factors. In the current version, it evaluates only unpolarized TMDPDFs, but we expect to update it for polarized cases and TMD fragmentation functions, as well as, to include the Y-term, in the future.