Dipole subtraction at next-to-leading order in nonrelativistic-QCD factorization

We describe an implementation of a subtraction scheme in the nonrelativistic-QCD treatment of heavy-quarkonium production at next-to-leading-order in the strong-coupling constant, covering $S$- and $P$-wave bound states. It is based on the dipole subtraction in the massless version by Catani and Seymour and its extension to massive quarks by Phaf and Weinzierl. Important additions include the treatment of heavy-quark bound states, in particular due to the more complicated infrared-divergence structure in the case of $P$-wave states.


Introduction
In next-to-leading-order (NLO) perturbative calculations in quantum field theory, the phase space integrations of real corrections generally produce infrared (IR) divergences, which have to be regularized. The standard choice for this is dimensional regularization, where the integrations are done in D = 4 − 2ǫ space-time dimensions, so that the IR divergences show up as poles in ǫ, ready to be canceled by other contributions. The problem is that the squared matrix elements are, apart from the simplest examples, so complicated that they have to be integrated numerically, in four dimensions. To combine both ingredients, the analytic singularity cancellation in D dimensions and the numerical phase space integration in four dimensions, two basic types of calculational schemes have been devised: slicing schemes and subtraction schemes.
In phase space slicing schemes, the real-correction phase space is split into two parts, with the separation lines enclosing the IR-singular regions at close distances. Since, in the vicinity of the IR divergences, both the squared matrix elements and the phase space factorize into simple expressions, the analytic integration in D dimensions is feasible, while the part outside the enclosed region is free from singularities, ready for numerical integration. Both contributions depend on the specific choice of phase space cut, but the sum of both contributions is independent of it. Most calculations of inclusive heavyquarkonium production and decay within the factorization formalism [1] of nonrelativistic QCD (NRQCD) [2] have been implemented with a two-cutoff phase space slicing scheme as outlined in Ref. [3]. In particular, this includes our previous calculations [4]. There are, however, two principal disadvantages of the phase space slicing scheme: First, one cannot avoid a residual numerical dependence of the result on the slicing parameters and, second, the numerical integration over the finite real-correction phase space part has to be done to very high precision because there is a strong cancellation between the two phase space parts.
On the other hand, in subtraction schemes, certain simple subtraction terms with the same divergences as the real corrections are subtracted from the latter, enabling a numerical integration. The subtraction terms are then separately integrated analytically in D dimensions, and the results are added back. To our knowledge, the only NLO calculations of inclusive quarkonium production so far performed in this way are those of Ref. [5] in the color singlet model, based on Catani-Seymour dipole subtraction for massless quarks [6]. Since only color singlet S-wave states were involved, the subtraction terms of Ref. [6] were sufficient.
In this paper, we describe an implementation of a subtraction scheme for inclusive quarkonium hadroproduction within NRQCD, which can handle all intermediate S-and P -wave color singlet and color octet states. In addition to the massless Catani-Seymour scheme [6], our implementation is built upon its extension to massive particles by Phaf and Weinzierl [7]. However, we have to take special care of the structures of the amplitudes when projected onto heavy-quark bound states. In particular, new kinds of subtraction terms have to be introduced in the case of P -wave state production.
The outline of this paper is as follows: In section 2, we describe the structure of the appearing amplitudes projected onto the different Fock states and their soft and collinear limits. The divergence cancellation is explained in section 3. The subtraction scheme used is in detail presented in section 4. Details about the implementation of phase space cuts as well as numerical tests of our extended dipole subtraction approach follow in section 5. Section 6 contains a brief summary. In Appendix A, we collect the expressions through order O(ǫ 0 ) for the integrated Catani-Seymour and Phaf-Weinzierl dipoles needed in our study, in a form that already includes mass factorization counterterms.
2 Cross sections and their limits

Cross sections in NRQCD factorization
In the framework of QCD and NRQCD factorization, the cross section for the inclusive hadroproduction of heavy quarkonium H is given by with the partonic cross sections dσ(ab → QQ[n] + X) = 1 N col (n)N pol (n) Here, a and b are the colliding QCD partons with four-momenta p 1 and p 2 . f a/A (x a ) is the parton distribution function (PDF) to find parton a with a longitudinal-momentum fraction x a inside the colliding hadron A. X collectively denotes the partons that are produced besides the quarkonium H, and F sym (X) are its quantum mechanical symmetry factors for identical particles in the final state. Q is bottom for bottomonium production and charm for a charmonium production. n is the QQ Fock state, for our purposes 3 S 1 , 1 S 0 , 1 P 1 , or 3 P J in a color singlet or color octet state. The color state is marked by upper indices 1 or 8 in square brackets, like for example in the color octet 3 P [8] 1 state. N col (n) = 1 if n is a color singlet state and C 2 A − 1 = 8 if it is a color octet state, and N pol is the Ddimensional number of polarization degrees of freedom of state n. We recall that C F = 4/3 and C A = 3 are color factors of the QCD gauge group SU(3). In making the N col and N pol factors explicit, we follow Ref. [9]. O H [n] is the corresponding nonperturbative NRQCD long-distance matrix element (LDME). n col (a) and n pol (a) are the number of colors and the D-dimensional number of polarizations of parton a. dPS is the Lorentz-invariant phase space element. As a convention used throughout this paper, the bra vector is a matrix element, and in squaring the matrix element a summation of the degrees of freedom of all external particles is always understood implicitly. This convention is adopted from Catani, Seymour [6], Phaf,and Weinzierl [7], who do, however, include the n col factors in the amplitude vectors, albeit not the n pol factors. In our choice of normalization, all averaging factors are explicit. Another thing to note is that the summation of external degrees of freedom includes the spin and orbital-angular-momentum quantum numbers m s and m l of the QQ[n] state, even if the polarization vectors stand outside the amplitude vectors. Hereby, in the case of a n = 3 P [1/8] J state, this summation is always restricted to the subspace with definite J.
In our study, we are interested in observables where quarkonium H has nonvanishing transverse momentum p T . Therefore, the partonic Born cross sections and their virtual corrections already correspond to 2 → 2 processes kinematically, namely while, for the real corrections, we are led to consider the 2 → 3 kinematics processes where g is a gluon and q a light quark or antiquark (specifically u, d, s, u, d, s for charmonium and, additionally, c and c for bottomonium), q its antiparticle, and q ′ another light quark or antiquark different from q and q. As already stated above, the four-momenta of the incoming partons are p 1 and p 2 . The four-momenta of the outgoing QCD partons are p 3 and, for the real corrections, also p 4 . The four-momenta of the heavy quark and antiquark that form the QQ[n] state are parameterized by p 0 2 + q and p 0 2 − q, so that p 0 is the four-momentum of the QQ[n] state and 2q the relative four-momentum of the two constituent heavy quarks. We assume that the mass of the QQ[n] state is twice the heavy-quark mass m Q , p 2 0 = 4m 2 Q , while we take the other partons to be massless. The amplitudes |ab → QQ[n] + X are evaluated from the usual QCD amplitudes with amputated Q and Q spinors |A as where 2T e are color projectors with e being the color index of the cc color octet state. Π 0 and Π α 1 are the spin projectors [9],

Soft limits
Let us consider a generic Born amplitude, Then, the expression for the scattering amplitude with an additional gluon j with momentum p j attached to the line of an outgoing parton i is in the limit of p j being soft given by the eikonal approximation, Here, g s is the QCD gauge coupling, and T i acts on |A by inserting at the appropriate place T c if parton i is an outgoing quark or incoming antiquark, −T c if parton i is an incoming antiquark or outgoing quark, and if abc if parton i is a gluon.
Let us now consider a generic real-correction amplitude in the limit where a certain gluon j with momentum p j is soft, with S 1 (n; p j ) = g 2 The S 2 and S 3 terms, which only appear in squared amplitudes of P -wave states, are specific for our study.

Collinear limits
Let us first consider the limit where an incoming parton with momentum p i is collinear to the outgoing parton with momentum p j . In this limit the divergent contributions stem from the diagrams with i → (ij) + j splitting, If we define p ⊥ to be the transverse momentum of p (ij) in the rest frame of the incoming partons, then we can define the fraction x of the longitudinal momentum of i taken away by (ij) as Then, the squared matrix element factorizes in the collinear limit as where the indices s and s ′ or µ and ν are the spin or polarization indices of particle i, andP i,(ij) (x) are the spin-dependent D-dimensional Dokshitzer-Gribov-Lipatov-Altarelli-Parisi (DGLAP) [10] splitting functions, which, up to order O(ǫ), are given bŷ Here, (ij) and j are labeled q if the corresponding particle is a quark or antiquark and g if it is a gluon. We insert unity noticing that T (ij) = − 4 k=0 k =i,j T k and so obtain where we note that T 2 (ij) = C F if (ij) is a quark or antiquark, and C A if (ij) is a gluon. If the two outgoing partons 3 and 4 are collinear, the dominant contributions stem from diagrams where there is a (34) → 3 + 4 splitting, If we define p ⊥ to be the transverse momentum of p 3 in the rest frame of the incoming partons, then we can define the fraction z of the longitudinal momentum of (34) taken away by 3 as The squared matrix element then factorizes as where the indices s and s ′ or µ and ν withinP (34), 3 are the open spin or Lorentz indices of particle (34) in the Born amplitude.

Cancellation of IR divergences
The IR divergences associated with the soft and collinear limits discussed in sections 2.2 and 2.3 are to a large extent canceled by contributions of the virtual corrections, as shown in Fig. 1. A part of the initial-state collinear divergences is, however, absorbed in the PDFs, while the S 3 contributions to the soft divergences are canceled by LDME renormalization contributions. These two additional ingredients are described in this section.

PDF redefinition and PDF evolution
As for the initial-state collinear divergences of an i → (ij) + j splitting, a part of it is absorbed by an MS redefinition of the PDF f (ij)/A (y), which then becomes dependent on the factorization scale µ f , where µ r is the renormalization scale and P + i,(ij) (x) is one of the regularized DGLAP splitting functions, with n f being the number of active quark flavors, for us 3 for charmonium production and 4 for bottomonium production. Next, we solve Eq. (49) for f (ij)/A (y). Using these f (ij)/A (y) functions in the general formula (1), a mass factorization counterterm, arises, where parton (ij) carries the fraction x of the splitting parton's momentum. In Fig. 1, this contribution is indicated as the box labeled PDF redefinition. The DGLAP equations governing the evolution of the scale-dependent PDFs follow seamlessly after differentiating Eq. (49) with respect to µ f .

LDME renormalization and LDME evolution
In a similar fashion, we have to treat the contributions from the LDME renormalization. A NLO calculation of the 3 S [8] 1 LDME using NRQCD Feynman rules and an expansion in 1 m Q after the loop integration yields that it is related to the 3 P

[1]
J and 3 P This bare operator is both ultraviolet and IR divergent. We remove the ultraviolet singularity by introducing an MS-renormalized LDME, which depends on the NRQCD factorization scale µ Λ . Solving Eqs. (55) and (56) for 1 ] LO and using this in the general formula (1), we obtain the contribution Similarly, we obtain These contributions cancel the S 3 contributions of the soft divergences and are labeled LDME renormalization in Fig. 1. However, we transform them further to cast them into a form that will be more useful for our purposes. Noticing that we can rewrite Eqs. (57) and (58) as Recalling our convention regarding the summation of the polarization degrees of freedom, we observe that N pol so that we can write with From the terms in Eqs. (59) and (60), we obtain a corresponding expression, with Finally, we derive the formula for the running of the LDME O H [ 3 S [8] 1 ] (µ Λ ) with µ Λ . Differentiating Eq. (56) with respect to µ Λ , we obtain a renormalization group equation, with the solution 4 Dipole subtraction for quarkonium production

General setup
In a preliminary version, not yet taking into account kinematic cuts, we write the partonic NLO corrections as Here, dPS 2 is the two-particle phase space element, and dPS 3 is the three-particle phase space element, which factorizes in some way, as either dPS 3 = dPS 2 dPS dipole or dPS 3 = dPS 2 dx dPS dipole , where dPS dipole are certain dipole phase space elements and dx matches its counterpart within dσ MFC as defined in Eq. (54). The subtraction terms dσ subtr are defined in terms of some kinematic variables in the parameterization of dPS dipole and certain 2 → 2 kinematics momenta {p i } appearing in dPS 2 , which are in turn in some way mapped onto the 2 → 3 kinematics momenta {p i }. The idea is that dσ subtr matches dσ real in all singular limits. Therefore, the first bracket on the right-hand side of Eq. (72) is free of divergences and can be integrated numerically in four dimensions. On the other hand, dσ subtr is simple enough that it can be analytically integrated in D dimensions over dPS dipole . The IR poles of dσ dipole then become explicit as ǫ −1 and ǫ −2 poles and cancel the singularities of dσ virtual + dσ MFC + dσ op. ren. , so that the second bracket on the right-hand side of Eq. (72) is also finite and can be integrated numerically over dPS 2 or dPS 2 dx in four dimensions, too. So the task is to construct appropriate expressions for dσ subtr and dPS dipole with the corresponding momentum mappings.

Subtraction term
From Eqs. (31), (43), and (48), we observe that the sum of all softly and collinearly divergent terms can be brought into a form that can be approximated in all singular p i p k Definition and Integration Applied Mapping Here, Eq. (75) and section 4.
Here, Eq. (76) and section 4.5.3 MapPW5.2(p j ) Table 1: List of occurring V terms with given momentum assignments; of where their definitions and analytic expressions upon integration over the dipole phase spaces may be found in the Catani-Seymour (CS) [6] and Phaf-Weinzierl (PW) [7] papers and here; and of momentum mappings, according to the naming scheme of section 4.3, to be applied to the numerical integrations of the respective dipole terms over dPS 3 .
limits by the subtraction term with |abn, In the respective limits, the initial-state collinear singularities are reproduced by the first line, the final-state collinear singularities by the second line, the S 1 soft divergences by the corresponding soft limits of the first and second lines together, and the S 2 and S 3 soft divergences in the case of P -wave states by the last two lines. In the regions away from the soft and collinear limits, there are no additional singularities. We call each of the terms in the sums a dipole. In the Born amplitudes, particles i and j are replaced by one particle (ij), which is a gluon, a light quark or the QQ[n] state depending on the collinear or soft limits to be approximated. Where there is no divergent collinear or soft limit to be approximated, the contribution is just zero. We note that, in the soft limits, particles i and (ij) are the same and that, in the soft and final-state collinear limits, x = 1. We further define m( 3 P . Table 1 lists where to find the explicit expressions for V ini,S 1 ij,k and V fin,S 1 ij,k in the Catani-Seymour [6] and Phaf-Weinzierl [7] papers. The factors in Eq. (74) are adjusted so that V ini,S 1 ij,k equals V ij k or V ij,k and V fin,S 1 ij,k equals V ij,k or V k ij in their notations. The particle (ij) is called an emitter, the particle k a spectator, and the indices s and s ′ or µ and ν within V ij,k are the spin or polarization indices of particle (ij) in the Born amplitude. The V β S 2 ,ij and V αβ S 3 ,j terms, which are new, are given by so as to approximate Eqs. (32) and (33). A pictorial summary of all dipole terms appearing in our study is given in Fig. 2.
For all dipoles that do not involve the quarkonium momentum p 0 , we use the mapping that follows from Catani-Seymour chapters 5.2 and 5.3, and also 5.6 with n = p 3 + p 4 . With p a being an initial-state momentum, this mapping implies that It satisfies the conditions for all the limits p 3 or p 4 soft, p 3 collinear to p 4 , and p a collinear to either p 3 or p 4 , and we refer to this mapping as MapCS(p a ).
For those dipoles that involve the quarkonium momentum p 0 , an initial-state momentum p a and a massless final-state momentum p f , we use the mapping of Phaf-Weinzierl chapters 6.1 and 6.2. It satisfies the conditions for the limits p f soft and p f collinear to p a , and we refer to it as MapPW6(p a , p f ).
If we have a dipole term involving the quarkonium momentum p 0 plus two final-state momenta p f and p g , being p 3 and p 4 or vice versa, but we are only concerned with the limit p f soft, we use the mapping of Phaf-Weinzierl chapter 5.2, namelỹ which we call MapPW5.2(p f ).
The case involving the final-state momenta p 0 , p f , and p g , but with p 0 being the spectator, is more complicated, since here, in addition to the condition for p f soft, also those for p g soft and for p f and p g collinear need to be fulfilled. The momentum mapping appropriate here is the one of Phaf-Weinzierl chapter 5.1 is .
We note that the dipole terms in the Catani-Seymour and Phaf-Weinzierl papers were constructed such that the spin correlation terms of the splitting gluons vanish when contracted with the splitting gluon's tilde momentum. This property is used in the analytic integrations as a simplification, but it assumes that the momentum mapping of the corresponding chapter is used. A momentum mapping alternative to Eq. (81) is given, for example, in Eq. (5.9) of Ref. [8], which has the advantage of being symmetric in p f and p g . But that mapping does not fulfill the contraction property of the dipole terms in Phaf-Weinzierl chapter 5.1, which we use.

Phase space factorization
The phase space factorization dPS 3 = dPS 2 [dx]dPS dipole , with dPS dipole depending only on the external momenta involved in the respective dipole terms, is crucial to facilitate their analytic integrations over dPS dipole . In the case of dipoles for final-state particles only, we have dPS 3 = dPS 2 dPS dipole , and, in the case of dipoles involving an initial-state parton with momentum p a , the factorization is dPS 3 = dPS 2 dx dPS dipole , where x fulfills p a = xp a . The dipole factorization and the analytic integration can be found in the respective papers where the dipoles are given. The result of the analytic integration then only depends on the momenta {p i } and x. For the reader's convenience, we copy here the phase space parameterization of Phaf-Weinzierl chapters 5 and 6, only slightly adjusting the notation, since they will be the basis for our analytic integration of the V β S 2 ,ij and V αβ S 3 ,j terms in sections 4.5.1-4.5.3. The phase space parameterization used in Phaf-Weinzierl chapter 5, involving the quarkonium momentum p 0 and two final-state momenta p f and p g being p 3 and p 4 or vice versa, is with whereũ 0 , u, and v are those of Eq. (83) ands = (p 0 +p 3 ) 2 , which here equals (p 0 +p 3 +p 4 ) 2 , so thatũ 0 = (s − 4m 2 Q )/s. The phase space parameterization used in Phaf-Weinzierl chapter 6, involving the quarkonium momentum p 0 , an initial-state momentum p a being p 1 or p 2 and a final-state momentum p f being p 3 or p 4 , is with where x is that of Eq. (79), 4.5 Integration of dipoles over dipole phase space 4.5.1 Integration of V S2,ij terms: Initial-state case To solve the dipole phase space integral of V β S 2 ,ij given in Eq. (75) for an initial-state parton i, we use in the following the momentum mapping in Eq. (79) and the parameterization of the dipole phase space in Eq. (87) with p a = p i and p f = p j in both equations. Since the integration result can only depend on the momentap i andp 0 , we start by decomposing Although the component proportional top β 0 will vanish upon contraction with ǫ β (m l ) in Eq. (74), we still have to consider it here, since the integral itself does have this component. We determine C 1 by multiplying Eq. (89) withp iβ andp 0β and solving the resulting system of linear equations and so obtain Next, we apply the mapping in Eq. (79), express all appearing scalar products in terms ofψ i ,χ i , x, 1 − x, w, and 1 −χ i x, and so obtain (91) We now use the expression in Eq. (87) for the dipole phase space in Eq. (89), perform the w integration, and expand the result in ǫ using The result through terms of order O(ǫ 0 ) is then

Integration of V S2,ij terms: Final-state case
To solve the dipole phase space integral of V β S 2 ,ij given in Eq. (75) for a final-state parton i, we use in the following the momentum mapping in Eq. (80) and the parameterization of the dipole phase space in Eq. (85) with p f = p j and p g = p i in both equations. The integration result can then only depend on the momentap 3 andp 0 , and we decompose Although the component proportional top β 0 will vanish upon contraction with ǫ β (m l ) in Eq. (74), we still have to consider it here, since the integral itself does have this component. We determine C 3 by multiplying Eq. (94) in turn withp 3β andp 0β and solving the resulting system of linear equations and so obtain Next, we apply the mapping in Eq. (80), express all appearing scalar products in terms ofs,ũ 0 , u, 1 − u, v, and 1 −ũ 0 u, and so obtain Using the expression in Eq. (85) for the dipole phase space in Eq. (94), we can now do the integrations by identifying hypergeometric functions, which we then expand in ǫ using the program package HypExp [11]. Our result through order O(ǫ 0 ) is

Integration of V S3,j terms and incorporation of LDME renormalization counterterms
To solve the dipole phase space integral of V αβ S 3 ,j , we again use the momentum mapping in Eq. (80) and the parameterization of the dipole phase space in Eq. (85) with p f = p j and p g = p i . Since the integration result can only depend on the momentap 3 andp 0 , we decompose Although the components proportional top α 0 andp β 0 will vanish upon contraction with ǫ * α (m l )ǫ β (m l ) in Eq. (74), we still have to consider them here, since the integral itself does have these components. We determine C 5 and C 6 by multiplying Eq. (98) with g αβ , p 3αp3β ,p 0αp0β , andp 0αp3β and solving the resulting system of linear equations and so obtain Next, we apply the mapping in Eq. (80) and express all appearing scalar products in terms ofs,ũ 0 , u, 1 − u, v, and 1 −ũ 0 u and so obtain Using the expression in Eq. (85) for the dipole phase space in Eq. (98), we can now do the integrations by identifying hypergeometric functions, which we then expand in ǫ using HypExp [11]. Our result through order O(ǫ 0 ) is Let us now consider this result together with Eqs. (64)- (67) and (73). For each partonic 2 → 3 subprocess a + b → cc[n] + X, there is one (are two) contributions of V αβ S 3 ,j if X contains one (two) outgoing gluons and n is a P -wave state. The divergence of each of these contributions equals − |n, op.ren. 2 with the same partons a and b. Noticing that F sym (X) in the dipole subtraction term is 1 ( 1 2 ) if there is one (are two) outgoing gluon(s), but always 1 in the LDME renormalization contribution, we observe that the divergence in Eq. (103) is exactly canceled by the contributions from LDME renormalization. Thus, in our implementation, it is simplest to include the effects of the LDME renormalization by just using instead of Eq. (103) the expression which is then finite.

4.5.4
Integration of V S1,ini ij,k and V S1,fin ij,k

terms and incorporation of mass factorization counterterm
There is one subtlety related to the dipole terms of V S 1 ,ini ij,k in the initial-state collinear limits p i → xp i . In the second bracket of Eq. (72), there is then an apparent mismatch because dσ subtr involves parton i with momentum p i , while dσ virtual and dσ MFC involve initial-state parton (ij) with momentum xp i instead. Thus, special care has to be exercised regarding the differing color and polarization averaging and flux factors. In order to facilitate the singularity cancellation, it is, therefore, convenient to rewrite the contribution of the V S 1 ,ini ij,k terms in dσ subtr when appearing in the second bracket of Eq. (72) as F sym (X)n col (i)n pol (i) n col (a)n pol (a)n col (b)n pol (b)n col ((ij))n pol ((ij)) n, Born|V ini,S 1 ij,k with the terms = −1 and noticing that the effect of double contributions due to j = 3, 4 is balanced by the symmetry factor F sym (X) = 1 2 for two gluons in the final state, we observe that we can incorporate the effect of the mass factorization counterterm completely by using the expressions in Appendix A.
5 Implementation and numerical tests

Implementation of phase space cuts
The master formula (72) describes a total cross section. The observables we aim to calculate are, however, cross sections with specific kinematic cuts, for example, on the transverse momentum p T or the rapidity y of the quarkonium. To this end, we definẽ where p A and p B are the momenta of the incoming hadrons. For all momentum mappings, we then havep T → p T andỹ → y in all singular limits. We can thus refine Eq. (72) to include the kinematic constraints. For example, the cross section with a phase space cut p T > p T,min is calculated as In the first line of Eq. (111), we integrate over the complete three-particle phase space and implement the θ functions explicitly. The θ functions then cut out different regions of the three-particle phase space, depending on the momentum mappings used in each dipole term. This worsens the convergence of the numerical Monte-Carlo integration, but the θ functions coincide close to all singular regions, so that the cancellations of the divergent terms take place. We note that the strong-coupling constant in our implementation is usually evaluated at a renormalization scale that is chosen to depend on kinematic variables of the produced quarkonium, e.g., α s (p 2 T ). We then have to substitute α s (p 2 T ) in dσ subtr . As for the contributions in the third line of Eq. (111), the analytic integration of the  1 ] + gg (dDg). The coding is as in Fig. 2. subtraction term over the dipole phase space dPS dipole is not affected by the additionally imposed phase space cuts, sincep T only depends on the momenta {p i }.
Equation (111) allows for the evaluation of binned cross section distributions, e.g., in p T and/or y, which can be directly compared with experimental data. Refining the binning of such histograms yields approximations to smooth cross section distributions. To evaluate the latter exactly, however, one needs to replace the θ functions in Eq. (111) by δ functions, which renders the implementation of the cancellation of divergences quite nontrivial. We leave the elaboration of this for future work.

Numerical tests
We now numerically verify the implementation of the individual unintegrated dipole terms. The subtraction term must match all the real-correction squared matrix elements in the respective limits. Three dipoles are always needed to reproduce a collinear limit, many dipoles to reproduce a soft limit. As an illustration, we generate certain phase space points close to the singularities and evaluate there both the real-correction squared matrix elements and the corresponding dipole terms. Our results are presented in Tables 2 and  3. From there we observe that the squared matrix elements of the real corrections are indeed nicely matched by the corresponding subtraction terms constructed as described above for all the partonic subprocesses, Fock states, and kinematic limits considered.
To obtain meaningful numerical checks of the implementation of the integrated dipole terms, also beyond self-consistency, it is indispensable to compare with results obtained using phase space slicing. This is even more so the case for checks of the implementation of dipole subtraction in calculations of physical observables of quarkonium production. Extensive such tests have successfully been performed, for all our integrated dipole terms  1 ]+dg (gd2cCdg), and dd → cc[ 3 P [1] 2 ; 1 P [8] 1 ] + gg (dD2cCgg) in the limits where p 3 and p 4 are soft. The contributions of the dipoles involving V S 1 , V S 2 , and V S 3 are shown separately. and several phenomenological applications. Presenting them in detail would require to explain the anatomy of the implementation of phase space slicing in NLO NRQCD calculations, which reaches beyond the scope of this paper. We will report on such comparisons in a separate communication [12], in which we will also quantitatively describe how dipole subtraction outperforms phase space slicing with respect to numerical precision and computing time.

Summary
We devised an implementation of a subtraction scheme appropriate for studies of inclusive quarkonium production at NLO in the NRQCD factorization approach, based on the dipole subtraction scheme of Refs. [6,7]. We needed to take special care of the specific structure of the bound-state amplitudes and to include additional subtraction terms in the case of P -wave states. Our implementation passes all intrinsic tests and yields results consistent with our previous phase space slicing implementation, which it outruns both in terms of accuracy and speed.