Efficient numerical evaluation of Feynman integrals

Feynman loop integrals are a key ingredient for the calculation of higher order radiation effects, and are responsible for reliable and accurate theoretical prediction. We improve the efficiency of numerical integration in sector decomposition by implementing a quasi-Monte Carlo method associated with the CUDA/GPU technique. For demonstration we present the results of several Feynman integrals up to two loops in both Euclidean and physical kinematic regions in comparison with those obtained from FIESTA3. It is shown that both planar and non-planar two-loop master integrals in the physical kinematic region can be evaluated in less than half a minute with accuracy, which makes the direct numerical approach viable for precise investigation of higher order effects in multi-loop processes, e.g. the next-to-leading order QCD effect in Higgs pair production via gluon fusion with a finite top quark mass.


Introduction
The scalar particle predicted by the Standard Model, the Higgs boson, has been finally discovered at the CERN Large Hadron Collider (LHC) [1,2]. This milestone has inspired various exciting investigations of the further details of the Higgs boson and related research. Recently the launch of LHC RunII at 13 TeV collision energy brings physics exploration to a new era. With the highest ever center-of-mass energy and luminosity, many scattering processes that potentially answer some fundamental physics questions will be able to reach an accuracy in the percent range or better, while the appropriate analysis on high precision data demands that the uncertainties of theoretical prediction reach the same accuracy. In order to achieve this, higher order QCD effects must be included in theoretical predictions. For instance, state-of-the-art investigations on Higgs inclusive produc-tion have explored next-to-next-to-next-to-leading order (NNNLO) effects for the quark pair annihilation initial state [3] and for the gluon fusion initial state [4]. Meanwhile, next-to-next-to-leading order (NNLO) theoretical predictions have been provided for Higgs pair production [6] and the associated production of Higgs with jet [6][7][8] or vector boson [9][10][11][12]. As the accumulated luminosity at the LHC increases, the investigation of higher order QCD effects will be wanted for more processes, e.g. top quark production and jet cross sections. In the higher order effects, one of the most important ingredients is virtual correction, which always relies on evaluation of Feynman loop integrals.
After decades of effort, various algorithms have been proposed for evaluating Feynman loop integrals, including both analytical and numerical approaches. The analytical approaches can provide explicit expressions for Feynman integrals, and can further reveal significant physical characteristics. However, when complicated processes are encountered, it becomes difficult to obtain analytical solutions, while the numerical approaches can solve more challenging problems in spite of a heavy burden of evaluation time. Sector decomposition, one of the numerical algorithms, was introduced as a systematic approach by Binoth and Heinrich [13,14]. Following a certain choice of decomposition strategies, this algorithm divides the domain of loop integration into sectors. In each individual sector, proper transformation of integration variables is performed to explicitly reveal the ultraviolet (UV) and infrared (IR) singularities. Ultimately the coefficients of a Laurent series in of the Feynman integral can be evaluated numerically. Initially, sector decomposition was implemented for the Feynman integral in the Euclidean kinematic region [13][14][15], where the Cauchy singular integral can be avoided. Later, inspired by Nagy and Soper [16,17], integration contour deformation was proposed [18] as a systematic scheme to extend sector decomposition to the physical kinematic region.
Several programs [19][20][21] have implemented the sector decomposition algorithm for the numerical evaluation of Feynman loop integrals. Normally they use Monte Carlo (MC) integration methods, which have been widely used in high energy physics research. For instance, Vegas [22], an adapative Monte Carlo method, can achieve an integration convergence rate of O(1/ √ n). In this paper, we implement the quasi-Monte Carlo (QMC) method for the numerical evaluation of the integrals in sector decomposition. As a better choice, QMC can have a convergence rate close to O(n −1 ) for differentiable integrands. Furthermore, we adopt the technique of CUDA/GPU to improve the performance of numerical evaluation. This paper is organised as follows. In Section 2 we review the sector decomposition algorithm, then Section 3 gives a brief description of the QMC integration method. In Section 4 we compare the performance of our program with FIESTA [20]. Our conclusions are then presented in the final section.

Sector decomposition
Generically an L-loop Feynman integral has the following representation: where D = 4 − 2 ; q j is the momentum of relevant internal propagator, and is a linear combination of the loop momenta {k l } and external momenta; m j is the mass of the relevant internal propagator; and ν j is the power of the corresponding propagator. After Feynman parameterisation and integration over the loop momenta, one can obtain where N ν = N j=1 ν j , and U and F are polynomials of {x l } and can be straightforwardly derived from the momentum representation, or constructed from the topology of the corresponding Feynman graph [23].
Further treatment of the Feynman integral requires careful consideration since U and F can vanish when some of {x l } approach zero, which may be related to UV or IR divergence. Direct numerical integration is impossible for divergent integrals. A sector decomposition algorithm is designed to systematically extract the divergence, and is briefly described as follows [24].
Firstly, the integration domain is equally split into N sub-domains, which are called primary sectors: Then in the i-th sector we implement the variable transformation, Thereafter x i integration is performed associated with the step function, and now the Feynman integral can be expressed as where Obviously for any given primary sector I l , the domain of integration is an (N − 1)-dimensional unit cube.
Next, following an iterative decomposition strategy [19] or geometric strategy [25,26], each primary sector is finally divided into some subsectors {I la } so that in any subsector polynomials U l and F l can be factorised into the form

033103-2
after proper variable transformation. In each subsector, the new variables and Jacobian generated by the transformation are required to be monomials of original variables, and meanwhile the transformation projects the domain of integration to the (N −1)-dimensional unit cube. In the above expressions, b laj and b laj are non-negative integers. H la and H la are polynomials of {t j } such that H la (0, · · · , 0) = 0 and H la (0, · · · , 0) = 0. Now the primary sector becomes the combination of subsectors, where the powers of t j are collected into α laj +β laj , and D la contains the coefficients from the Jacobian and C la and C la .
In the practical evaluation of Feynman integrals, we will adopt the geometric strategy since it can be guaranteed to succeed and results in the smallest number of subsectors.
After sector decomposition, the singularities in the Feynman integral have been collected into the regulators in the form of t α+β , which can explicitly present the pole of the integral by using a Laurent series or integration by parts (IBP). Without loss of generality, we rewrite the integral with a certain regulator as where f (0, ) is non-zero and finite. Then if α −1, the above integral contains a singularity on the lower bound. By expanding f (t, ) into a Laurent series around t = 0, the singularity can be explicitly extracted as where Thereafter, the integrands can be expanded by small , and the coefficients of the Laurent series in can be evaluated numerically order by order. However, numerical evaluation of r(t) may suffer numerical instability from large number cancellation. An alternative approach to the pole extraction that can avoid this problem is IBP, It can be seen that the power of t increases by one. Therefore, by repeating the above IBP formula enough times, the power of t will not generate singularity, and then the numerical approach can be implemented to evaluate the coefficients of the Laurent series in . However, occasionally the integral contains Cauchy singularities in the physical kinematic region. In this case, the sign of F cannot be guaranteed definite, so the Cauchy singular Feynman integral is only valid under a proper contour according to the conventional iε prescription of the Feynman propagators. Practically such an infinitesimal shifted contour will sabotage the stability of numerical integration. Fortunately an interesting prescription of contour deformation has been proposed [27] where an appropriate choice of λ i can guarantee the sign of Im(F ( z )) is always negative as required conventionally.

Quasi-Monte Carlo
After the implementation of the sector decomposition algorithm reviewed in Section 2, the Feynman integral is expressed as a Laurent series in . The coefficients of the series are composed of convergent integrals, which can be numerically evaluated. A one-dimensional integral can be easily evaluated by a numerical approach such as the trapezoidal rule, while numerical evaluation of multidimensionals integral is usually much more difficult.
For instance, an s-dimensional integral can be written as Given a predefined set of n points { x i | x i ∈ [0, 1] s ; i = 0, · · · , n − 1}, the above integral can be estimated by This method is called the quasi-Monte Carlo method, and the point set is called the quasi-Monte Carlo rule [28]. Conventionally two families of QMC rules attract most interest. One consists of digital nets and digital sequences, while the other is the lattice rule. In this paper we adopt the rank-1 lattice rule (R1LR) defined by [28] x i = i z n , i = 0, · · · , n − 1, 033103-3 where z, known as the generating vector, is an sdimensional integer vector. The integer components of z are all relatively prime to n. The braces around the vector in Eq. (17) means only the fractional part of each component in the vector is taken.
The previous R1LR will result in a biased estimation, since it is fully deterministic. To achieve an unbiased result, we need to introduce appropriate randomisation. For R1LR, we can use the simplest form of randomisation, called shifting, which will yield so-called shifted lattice rule. The QMC algorithm utilising random shifted R1LR is explained as below [28].
2) For each shift, the integral estimation in Eq. (16) is repeated to obtain a set of m integral estimations, {Q n,s,1 (f ), · · · , Q n,s,m (f )}, where 3) Then the average of these m integral estimations is finally taken as an unbiased approximation of the integral I s (f ). 4) Furthermore, an unbiased estimation for the meansquare error ofQ n,s,m (f ) can be obtained by The above algorithm improves the practicability of the R1LR QMC method. Moreover, in the case that the integrand f is a 1-periodic function and all partial derivatives of f exist, the convergence rate can be improved to O(n −1 ) [29,30] if the generating vector is obtained by the component-by-component construction. However, in practice even when the integrand f is a nonperiodic function, one can implement the transformation x i = 3y 2 i − 2y 3 i to obtain a periodic integrand as below: where it can be seen that g( y) = 0 once y reaches boundary of [0, 1] s , as long as f is bounded at the edge. Beside the O(n −1 ) convergence rate, the shifted R1LR QMC method has some intrinsic advantages. In the shifted R1LR QMC method the lattice rule is deterministic and therefore the complexity of random number generation only depends on the number of shifts. By contrast, in the MC method, since the evaluation points are independent random vectors, a large amount of pseudo random number generation consumes much more of the GPU resources. Besides, since the QMC method is a non-adaptive method, it can easily deal with integrals in complex space, which is inevitable for Feynman integrals in the physical kinematic region. However, for adaptive algorithms, e.g. Vegas, it is difficult to define an appropriate rule to handle such integrals.

Numerical results
In this section, we present some numerical results for several Feynman integrals up to two loops for certain choices of kinematic parameter as a demonstration. In the Euclidean kinematic region we show the evaluation of massless scalar double box diagrams, and in the physical kinematic region we take several master integrals from the investigation of Higgs pair production via gluon fusion. The Higgs and top quark masses are chosen as M H = 125 GeV and M t = 172 GeV [31]. By comparing 1) with FIESTA3 [20] using the Vegas algorithm [22,32] as a benchmark of CPU efficiency, we illustrate the improvement in the efficiency of numerical evaluation of Feynman integrals. The sample codes for the Feynman integrals in this section were generated by MIRACLE, which is a general purpose package in preparation.

One-loop Feynman integral in physical kinematic region
The leading order (LO) contribution to Higgs pair production via gluon fusion contains the one-loop box Feynman diagram as shown in Fig. 1. For the evaluation of this diagram, the most complicated master integral is The initial states are on-shell gluons p 2 1 = p 2 2 = 0, and the final states are on-shell Higgs bosons p 2 3 = p 2 4 = M 2 H . The Mandelstam variables are defined as (p 1 + p 2 ) 2 = s and (p 2 − p 3 ) 2 = t. 2) 1) Our program was deployed with NVIDIA Tesla K20 GPU, while FIESTA3 used four cores of Intel Core i7 3770 CPU (3.4GHz).
2) For simplicity, in the following the dimension of scale is set as GeV by default. As shown in Fig. 2, we evaluate 1000 points between s = 70000 and s = 500000, while t = −6000 is fixed. The average time to evaluate one point is 83 ms, which is acceptable for practical calculation of one-loop Feynman integrals. We can find that the threshold effect is explicitly shown at s = 4M 2 t = 118336. Below the threshold Re(I A ) vanishes since at the moment the sign of the F -term in each decomposed subsector is definite. Meanwhile Im(I A ) peaks on the threshold, where we can see an obvious large relative error due to slow convergence of some integrals. Fig. 2 also presents the comparison with results from LoopTools [33]. It can be seen that the relative errors of our results are within 10 −3 , and the relative errors can be smaller than 10 −4 for most of the points.

Two-loop Feynman integral in Euclidean kinematic region
For the demonstration of a two-loop Feynman integral in the Euclidean kinematic region, we choose the Fig. 3. Massless double box diagram with two legs off-shell [14], where all the external momenta are incoming.
massless double box diagram with two legs off-shell [14], as shown in Fig. 3. Explicitly this Feynman integral is written as

033103-5
Because this Feynman integral contains divergences, the results are expressed in form of a Laurent series in , During the numerical evaluation, the Mandelstam variables are set as s = (p 1 +p 2 ) 2 = −2/3, t = (p 2 +p 3 ) 2 = −2/3, and u = (p 1 + p 3 ) 2 = −2/3. In Table 1, we compare our results (marked as QMC) with those from FIESTA3 [20]. It can be seen that all the results are consistent to O(10 −3 ), while our program is much (about 50-200 times) faster than FIESTA3. This implies that our program can provide the results of Feyn-man integrals with much higher accuracy in the Euclidean kinematic region. One of the reasons for the improvement is the QMC algorithm, which has been proved to have better convergence rate compared to conventional MC algorithm [29,30]. Another reason is the implementation of the GPU-accelerated computing. A GPU has a specified parallel architecture consisting of thousands of smaller, more efficient cores designed for handling multiple tasks simultaneously. In contrast, a CPU consists of a few cores optimized for sequential serial processing. The combination of the QMC algorithm and GPU-accelerated computing provide an efficient way to evaluate the Feynman integrals.

Two-loop Feynman integral in physical kinematic region
The next-to-leading order (NLO) contribution to Higgs pair production via gluon fusion contains the twoloop double box Feynman diagram as shown in Fig. 4, which is one of the challenges for an analytical approach with a finite top quark mass. To evaluate this diagram, we will confront the complicated master integral This Feynman integral contains IR divergences, so we express the results in form of a Laurent series in , Here the initial states are on-shell gluons p 2 1 = p 2 2 = 0, and the final states are on-shell Higgs bosons p 2 3 = p 2 4 = M 2 H . The Mandelstam variables are set as s = (p 1 + p 2 ) 2 = 160000 and t = (p 2 − p 3 ) 2 = −6000.

033103-6
As shown in Table 2, after more than 12 hours, the relative error of the finite term from FIESTA3 can only reach 10 −2 . By comparison, in 19 seconds our program can obtain an accuracy of about 10 −3 , which may cost FIESTA3 more than a thousand hours. Moreover, compared with the efficiency improvement obtained in the Euclidean kinematic region in Section 4.2, we can find a much better improvement in the physical kinematic region due to the advantages of the QMC algorithm for complex integrals as explained in Section 3. Besides, it can be seen that the efficiency of our program for twoloop Feynman integrals makes the numerical approach viable for the evaluation of the NLO virtual contribution to Higgs pair production via gluon fusion with a finite top quark mass. Therefore an investigation of the finite top quark mass effect in Higgs pair production can be accomplished within months.

Non-planar two-loop Feynman integral in physical kinematic region
The non-planar two-loop diagram shown in Fig. 5 also contributes to Higgs pair production via gluon fusion at NLO. During the evaluation of this diagram, the most complicated master integral is which contains IR divergences, and can be expressed in form of a Laurent series in , The same as the configuration in Section 4.3, the initial states are on-shell gluons p 2 1 = p 2 2 = 0, and the final states are on-shell Higgs bosons p 2 3 = p 2 4 = M 2 H . The Mandelstam variables are set as s = (p 1 + p 2 ) 2 = 160000 and t = (p 2 − p 3 ) 2 = −6000.
In Table 3, we can see that our program can obtain a result with O(10 −3 ) accuracy in 20 seconds, while the relative error of the FIESTA3 result takes over 15 hours to reach near that order of accuracy. By comparing with the results in previous section, it is obvious that the non-planar double-box master integral has a slower convergence rate. This is consistent with the conventional conclusion that the evaluation of non-planar diagrams is more difficult than planar ones. Nonetheless, it can be seen that efficiency of our program for the evaluation of non-planar master integrals is acceptable for the practical numerical approach, which can provide NLO numerical results for Higgs pair production within months.

Conclusion
We have implemented the shifted R1LR QMC method associated with CUDA/GPU to numerically evaluate Feynman loop integrals by using a sector decomposition algorithm. Some examples are presented to show the promising efficiency of our program on the numerical evaluation of Feynman loop integrals. For a one-loop box Feynman integral, we can obtain an accu-racy of about 10 −4 in tens of milliseconds. For two-loop double box Feynman integrals, the accuracy can reach about 10 −3 in several seconds in the Euclidean kinematic region, while in the physical kinematic region less than half a minute is needed. The efficiency of our program can make the direct numerical approach viable for the precise investigation of some important processes, e.g. Higgs pair production via gluon fusion at NLO with the finite top quark mass effect.