Fast parallel calculation of modified Bessel function of the second kind and its derivatives

Numerical computation of the modified Bessel function of the second kind generally relies on three approaches: series expansion, continued fractions, and asymptotic expansion, combined over the domains where each is appropriate. However, in some regions this combination requires substantial computation time to achieve sufficient accuracy, and its efficiency drops significantly when parallelized. The proposed method instead applies simple numerical integration to the integral representation. We coarsely estimate the integration range beforehand and stabilize the computation time by performing the integration with a fixed number of bins. Experiments demonstrate that the proposed method achieves the same level of accuracy as existing methods in less than half the computation time.


Introduction
The modified Bessel function of the second kind K_v(x) is an important special function used in various fields [1]. In addition to mathematics and physics, it has become increasingly important in statistics and economics. For example, the normal inverse Gaussian distribution [2] has been garnering increasing attention in economics [3], biometrics [4,5], and machine learning [6,7].
Three numerical methods exist for calculating the Bessel function K_v: series expansion-based [8], continued fraction-based [8,9,10,11], and asymptotic expansion-based methods [12,13]. Because each method exhibits different accuracy and computation-time characteristics depending on v and x, implementations generally combine them [14,15,16]. However, in some regions, every method requires an enormous amount of computation time to achieve sufficient accuracy.
Recently, it has become common to use GPUs and multi-core processors for large-scale parallel or vectorized processing, so it is desirable that special functions support parallel computation. When computing function values for many combinations of orders and arguments simultaneously, the total computation time is determined by the slowest combination. Therefore, if computation is slow for a particular combination, overall performance decreases significantly.
In this study, we propose an integration-based method for computing K_v(x) that is suitable for parallelization. Computing directly from the integral representation is simple and powerful [17]. However, a scheme that fixes the bin width and keeps adding terms until convergence cannot handle a wide range of orders v and arguments x: with a fixed bin width, the number of additions until convergence varies significantly with the arguments, making the scheme unsuitable for parallelization.
Therefore, in the proposed method, the range to be integrated is computed beforehand, and the number of bins is fixed instead of the bin width. Even a coarse estimate of the integration range does not significantly affect the accuracy, and the integration itself parallelizes well.
The derivatives of the Bessel function can be computed in the same way. Because no major library provides derivatives of Bessel functions with respect to arbitrary order, providing these implementations is itself beneficial.
We implemented the proposed method using Python and TensorFlow [18]. The obtained results indicate that the proposed method outperforms conventional methods in terms of both accuracy and speed.

Proposed methods
For v ∈ R and x > 0, the modified Bessel function of the second kind can be represented by an integral of the form [1]:

K_v(x) = ∫_0^∞ f_{v,x}(t) dt, where f_{v,x}(t) = exp(−x cosh t) cosh(vt).

Considering the shape of f_{v,x}(t), we examine the increase and decrease of its logarithm

g_{v,x}(t) = log f_{v,x}(t) = −x cosh t + log cosh(vt).

The derivatives of g_{v,x}(t) with respect to t are given by

g'_{v,x}(t) = −x sinh t + v tanh(vt),    g''_{v,x}(t) = −x cosh t + v^2 / cosh^2(vt).

In particular, g_{v,x}(0) = −x, g'_{v,x}(0) = 0, and g'''_{v,x}(t) ≤ 0 for t ≥ 0 always hold. Therefore, when v^2 ≤ x, g_{v,x}(t) monotonically decreases, because g''_{v,x}(0) = v^2 − x ≤ 0 (Fig. 1ab). In contrast, when v^2 > x, g_{v,x}(t) has exactly one peak t_p > 0, because g''_{v,x}(0) > 0 (Fig. 1cd).
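These shape properties can be checked numerically. The sketch below (assuming NumPy; the parameter values are illustrative) evaluates g_{v,x} and g'_{v,x} on a grid and confirms the monotone-decrease and single-peak regimes:

```python
import numpy as np

def g(v, x, t):
    # g_{v,x}(t) = log f_{v,x}(t) = -x cosh(t) + log cosh(v t);
    # log cosh(u) = |u| + log1p(exp(-2|u|)) - log 2 avoids overflow for large v*t
    u = np.abs(v * t)
    return -x * np.cosh(t) + u + np.log1p(np.exp(-2.0 * u)) - np.log(2.0)

def dg(v, x, t):
    # g'_{v,x}(t) = -x sinh(t) + v tanh(v t)
    return -x * np.sinh(t) + v * np.tanh(v * t)

t = np.linspace(1e-6, 5.0, 2000)

# Case v^2 <= x: g' <= 0 everywhere, so g decreases monotonically.
assert np.all(dg(1.0, 2.0, t) <= 0)

# Case v^2 > x: g' is positive first, then negative -- a single interior peak.
s = np.sign(dg(10.0, 2.0, t))
assert s[0] > 0 and s[-1] < 0 and np.sum(np.diff(s) != 0) == 1
```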
When integrating f_{v,x}(t) numerically, we only need to consider the region where g_{v,x}(t) ≥ g_{v,x}(t_p) + log ε, where g_{v,x}(t) takes its maximum value at t_p and ε denotes the machine epsilon. In fact, from the shape of g_{v,x}(t), the region that satisfies this condition is a single continuous range:

[t_0, t_1] = { t ≥ 0 : g_{v,x}(t) ≥ g_{v,x}(t_p) + log ε }.

Find peak t p
To determine the integration range, we first find the location of the maximum. When v^2 ≤ x, t_p = 0 is obvious from the shape of g_{v,x}(t). When v^2 > x, g_{v,x}(t) attains its maximum at some t_p > 0.
To search for t_p > 0, we first determine a range [2^{m−1}, 2^m] that contains the zero point t_p of g'_{v,x}(t) = 0, using the property that g'_{v,x}(t) > 0 for t < t_p and g'_{v,x}(t) < 0 for t > t_p. The details of this procedure (FindRange) are provided in Algorithm 1.
Next, within the obtained range [2^{m−1}, 2^m], we find t_p such that g'_{v,x}(t_p) = 0, using a combination of binary search and the Newton method. The details of this procedure (FindZero) are provided in Algorithm 2.
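The bracketing-and-refinement step can be sketched as follows (a rough illustration, not the paper's exact Algorithms 1 and 2; the starting scale 2^{−30}, the iteration count, and the Newton/bisection switching rule are assumptions):

```python
import numpy as np

def dg(v, x, t):
    # g'_{v,x}(t) = -x sinh(t) + v tanh(v t); positive before t_p, negative after
    return -x * np.sinh(t) + v * np.tanh(v * t)

def ddg(v, x, t):
    # g''_{v,x}(t) = -x cosh(t) + v^2 / cosh^2(v t)
    return -x * np.cosh(t) + v ** 2 / np.cosh(v * t) ** 2

def find_range(v, x):
    # Double the scale until g' changes sign; assumes v^2 > x so that t_p > 0.
    m = -30  # start from a tiny scale so the left end of the bracket is before the peak
    while dg(v, x, 2.0 ** m) > 0:
        m += 1
    return 2.0 ** (m - 1), 2.0 ** m

def find_zero(v, x, lo, hi, iters=50):
    # Newton steps clipped into the shrinking bracket, with bisection as fallback.
    t = 0.5 * (lo + hi)
    for _ in range(iters):
        if dg(v, x, t) > 0:
            lo = t
        else:
            hi = t
        t_newton = t - dg(v, x, t) / ddg(v, x, t)
        t = t_newton if lo < t_newton < hi else 0.5 * (lo + hi)
    return t

v, x = 10.0, 2.0            # v^2 > x, so the peak is at t_p > 0
t_p = find_zero(v, x, *find_range(v, x))
assert abs(dg(v, x, t_p)) < 1e-8
```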

Find the integration range
For t_1, after applying FindRange to determine the range [t_p + 2^{m−1}, t_p + 2^m], we apply FindZero within that range to obtain t_1 such that g_{v,x}(t_1) = g_{v,x}(t_p) + log ε.
In fact, for numerical stability, log-sum-exp is applied to g_{v,x}(t_m): writing t_m for the quadrature nodes in [t_0, t_1] with bin width h = (t_1 − t_0)/N,

log K_v(x) ≈ log h + g_{v,x}(t_p) + log Σ_m exp( g_{v,x}(t_m) − g_{v,x}(t_p) ),

so that the exponentials never overflow, because g_{v,x}(t_m) − g_{v,x}(t_p) ≤ 0.
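An end-to-end sketch of the method (using SciPy's brentq for the two root searches instead of FindRange/FindZero, a midpoint rule, and N = 256 bins; these choices are assumptions, not the paper's exact algorithm) can be validated against scipy.special.kv:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import kv, logsumexp

def log_kv(v, x, n_bins=256):
    """log K_v(x) by fixed-bin midpoint quadrature of the integral representation."""
    def g(t):  # log integrand: -x cosh(t) + log cosh(v t), overflow-safe
        u = np.abs(v * t)
        return -x * np.cosh(t) + u + np.log1p(np.exp(-2.0 * u)) - np.log(2.0)

    def dg(t):  # derivative of g
        return -x * np.sinh(t) + v * np.tanh(v * t)

    # Peak t_p of g: t_p = 0 when v^2 <= x, otherwise the positive root of dg
    if v * v <= x:
        t_p = 0.0
    else:
        hi = 1e-6
        while dg(hi) > 0:
            hi *= 2.0
        t_p = brentq(dg, hi / 2.0, hi)

    # Truncation points where g drops to log(eps) below the peak
    target = g(t_p) + np.log(np.finfo(float).eps)
    d = 1e-3
    while g(t_p + d) > target:
        d *= 2.0
    t1 = brentq(lambda t: g(t) - target, t_p, t_p + d)
    t0 = 0.0 if g(0.0) >= target else brentq(lambda t: g(t) - target, 0.0, t_p)

    # Fixed number of midpoint bins; log-sum-exp for numerical stability
    h = (t1 - t0) / n_bins
    t_m = t0 + (np.arange(n_bins) + 0.5) * h
    return np.log(h) + logsumexp(g(t_m))

# Compare with SciPy's reference implementation for a few (v, x) pairs
for v_, x_ in [(0.5, 1.0), (4.2, 3.0), (10.0, 2.0), (3.0, 50.0)]:
    assert abs(log_kv(v_, x_) - np.log(kv(v_, x_))) < 1e-8
```

Because the integrand is truncated where it falls below ε times its peak, the endpoint derivatives are negligible and the midpoint rule converges very quickly, so a modest fixed bin count already reaches near machine precision.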

Implementation
The entire proposed algorithm based on integration, hereafter denoted as "I," is summarized in Algorithm 3. Our implementation using Python and TensorFlow is available at https://github.com/tk2lab/logbesselk. In Section 3, the proposed method is compared with implementations based on series expansion ("S"; see Appendix A), continued fraction ("C"; see Appendix B), and asymptotic expansion ("A"; see Appendix C). The implementations used for the comparison are also publicly available at the same URL.

Accuracy
We first evaluate the accuracy of the methods based on series expansion (S), continued fraction (C), asymptotic expansion (A), and integration (I). Near v = x, the accuracy of these methods deteriorates [19]. Series expansion is accurate for x < 5, continued fraction for x > 1, and asymptotic expansion for v > 10 or x > 10 (Fig. 2). In contrast, there is no region where integration is inaccurate.

Computational time
Next, we examined the computation time separately for each order v and argument x. The computation time was measured excluding regions with errors of 4 or more. For I, the computation time is generally uniform, whereas for S, C, and A, there are regions of the domain where the computation time becomes large (Fig. 3). Although TensorFlow can compute multiple pairs of v and x simultaneously, the computation time in that case is determined by the worst pair included. Therefore, it is the worst-case computation time within the range assumed in the application, rather than the average, that needs to be small.

Combinations
For each value of v and x, the method with the highest accuracy among S, C, and A was identified. The results demonstrate that S and C are separated by the value of x, whereas A and S+C are sparsely distributed in the same region and can be considered equal in terms of accuracy (Fig. 4a). The criterion for separating S and C was set to x = 1.6 + 0.5 log(v + 1). We also examined the method with the shortest computation time under the condition that the error is less than one. The regions of S, C, and A were clearly separated (Fig. 4b), and the criterion for separating S+C and A was set to v = 25.

Figure 4: a) Among S, C, and A, the region where S is most accurate is depicted in red, the region where C is most accurate in blue, and the region where A is most accurate in green. b) Among S, C, and A, the method with the smallest computation time under the condition that the error is less than 1. c, d) Evaluation of theoretical error (c) and computation time (d) when combining S, C, and A.
When the regions were defined in such a way that the accuracy was high and the computation time was short, sufficient results were obtained for the accuracy (Fig. 4c); however, for the computation time, there were regions near the boundary in which the computation was slow (Fig. 4d).

Advantage of the proposed method
We compared the proposed method with the combination of S, C, and A. In addition to our own implementation of the combination (SCA), we also compared the proposed method with TensorFlow Probability (tfp), an existing implementation of the same combination.
There was no significant difference in accuracy over a wide range of v and x (Fig. 5a). The distribution of errors was also not substantially different among the methods, and the mean error was smallest for the proposed method (Fig. 5b).
Regarding execution time, the proposed method exhibited higher performance over a wide range (Fig. 5c). Its computation time was stable at approximately 5 ms, outperforming the other methods (Fig. 5d). Next, we performed vectorized computations on arrays of various sizes and measured the computation time. Orders v were generated in the range [0, 99] as 10^{2r} − 1, and arguments x in the range [0.1, 100] as 10^{3r−1}, where r denotes a uniform random number in [0, 1). In this case, the proposed method could compute array sizes from 10^1 up to 10^4 with almost no change in computation time (Fig. 6). The increase in computation time above 10^4 can be attributed to reaching the memory limit.
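The sampling scheme used for the benchmark arrays can be written as follows (a sketch; the array size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
size = 1000
r1 = rng.uniform(size=size)
r2 = rng.uniform(size=size)
v = 10.0 ** (2.0 * r1) - 1.0     # orders v = 10^(2r) - 1 in [0, 99)
x = 10.0 ** (3.0 * r2 - 1.0)     # arguments x = 10^(3r-1) in [0.1, 100)

assert v.min() >= 0.0 and v.max() < 99.0
assert x.min() >= 0.1 and x.max() < 100.0
```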
For the SCA and tfp implementations, the computation time increases with array size up to 10^2 and then becomes almost constant (Fig. 6). This can be explained by the probability of including combinations with slow computation speed: as mentioned above, the execution time of a vectorized computation corresponds to the worst case among the included computations, and the computation times of SCA and tfp vary with the combination of v and x (Fig. 5cd).

Derivatives
The derivative of the Bessel function K_v(x) with respect to the argument x is given by

∂K_v(x)/∂x = −( K_{v−1}(x) + K_{v+1}(x) ) / 2.    (13)

The derivatives with respect to x in SCA and tfp are calculated using Eq. 13. However, the derivative with respect to the order v is not provided by most libraries, and, unlike Eq. 13, no approach is known to obtain it indirectly.
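Eq. 13 can be checked against a central finite difference of SciPy's kv (the test point and step size below are arbitrary choices):

```python
import numpy as np
from scipy.special import kv

def dkv_dx(v, x):
    # dK_v/dx = -(K_{v-1}(x) + K_{v+1}(x)) / 2
    return -0.5 * (kv(v - 1.0, x) + kv(v + 1.0, x))

v, x, h = 2.7, 1.5, 1e-6
fd = (kv(v, x + h) - kv(v, x - h)) / (2.0 * h)   # central finite difference
assert abs(dkv_dx(v, x) - fd) < 1e-6 * abs(fd)
```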
The derivatives of the Bessel function K_v(x) with respect to the order v and the argument x are obtained by differentiating f_{v,x}(t) and integrating over t. The mixed derivatives of f_{v,x}(t) are given by

∂^{n+m} f_{v,x}(t) / ∂v^n ∂x^m = (−1)^m f^{(n,m)}_{v,x}(t),

log f^{(n,m)}_{v,x}(t) = n log t + m log cosh t − x cosh t + log cosh(vt)   (n even),
log f^{(n,m)}_{v,x}(t) = n log t + m log cosh t − x cosh t + log sinh(vt)   (n odd),

and these also have a single range that should be integrated, as in the case of f_{v,x}(t). Therefore, the derivatives of K_v(x) can be obtained in the same way as K_v(x) itself. The derivative with respect to x can be calculated with an error of less than one, as for K_v(x) (Fig. 7a). For the derivative with respect to v, the mean error is greater than one, which indicates that the accuracy is lower. This can be attributed to the log t term in the function to be integrated, whose behavior around zero is captured incompletely. Nevertheless, the relative error is approximately 10^{−14}, which is sufficiently useful considering that it can be computed fast.
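For the first derivative with respect to v (n = 1, m = 0), the integrand is t sinh(vt) exp(−x cosh t). A direct quadrature sketch (using scipy.integrate.quad with a fixed truncation point rather than the paper's fixed-bin scheme; the test point is arbitrary) can be validated against a finite difference of kv:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv

def dkv_dv(v, x):
    # dK_v/dv = integral_0^inf t sinh(v t) exp(-x cosh t) dt
    integrand = lambda t: t * np.sinh(v * t) * np.exp(-x * np.cosh(t))
    # Beyond t = 10 the factor exp(-x cosh t) has underflowed to zero here
    val, _ = quad(integrand, 0.0, 10.0)
    return val

v, x, h = 1.3, 2.0, 1e-6
fd = (kv(v + h, x) - kv(v - h, x)) / (2.0 * h)   # central finite difference in v
assert abs(dkv_dv(v, x) - fd) < 1e-6 * abs(fd)
```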

Low precision floating point
Experiments were also conducted with 32-bit low-precision floating-point numbers, and errors of less than 1 were obtained in most regions. The computation time was almost half that of the 64-bit case, and the overall trends were consistent.

Conclusion
We have proposed a computational method suitable for the parallel computation of Bessel functions. The proposed method adds a simple preprocessing step that efficiently estimates the integration range before computing the integral representation directly. Accordingly, it enables fast computation with sufficient accuracy.
This method is also applicable to the derivatives of Bessel functions, and is effective in various fields such as mathematics, physics, and statistics. In the future, it will be beneficial to investigate fast parallel methods for other related special functions.

Appendix C. Asymptotic expansion
The uniform asymptotic expansion of K_v for large order is given by [12]:

K_v(vz) ≈ sqrt( π / (2v) ) · e^{−vη} / (1 + z^2)^{1/4} · Σ_{i=0}^∞ (−1)^i U_i(p) / v^i,    (C.1)

where η = sqrt(1 + z^2) + log( z / (1 + sqrt(1 + z^2)) ) and p = (1 + z^2)^{−1/2}.    (C.2)

Here U_i(p) = Σ_j c_{i,j} p^{i+2j} is a polynomial of degree 3i in p. The coefficients c_{i,j} can be evaluated using a recurrence relationship, starting from U_0(p) = 1:

U_{i+1}(p) = (1/2) p^2 (1 − p^2) U_i'(p) + (1/8) ∫_0^p (1 − 5s^2) U_i(s) ds.
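The expansion can be checked concretely. The sketch below hardcodes the first three U_i (their standard closed forms from the recurrence) instead of generating them, and compares a truncated sum against scipy.special.kv for a large order; the test point and tolerance are arbitrary choices:

```python
import numpy as np
from scipy.special import kv

def kv_asymptotic(v, x, terms=3):
    # Uniform asymptotic expansion of K_v(vz) with z = x / v
    z = x / v
    s = np.sqrt(1.0 + z * z)
    p = 1.0 / s                             # p = (1 + z^2)^(-1/2)
    eta = s + np.log(z / (1.0 + s))         # eta as in (C.2)
    # First few U_i(p), hardcoded from the recurrence
    U = [1.0,
         (3.0 * p - 5.0 * p ** 3) / 24.0,
         (81.0 * p ** 2 - 462.0 * p ** 4 + 385.0 * p ** 6) / 1152.0]
    series = sum((-1.0) ** i * U[i] / v ** i for i in range(terms))
    return np.sqrt(np.pi / (2.0 * v)) * np.exp(-v * eta) / np.sqrt(s) * series

v, x = 50.0, 60.0
approx = kv_asymptotic(v, x)
exact = kv(v, x)
assert abs(approx - exact) < 1e-4 * abs(exact)
```

With three terms the truncation error scales like U_3(p)/v^3, so for v = 50 the relative error is already far below the asserted tolerance.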