Effective computations of modular inverses with the Approximating k-ary GCD Algorithm by Ishmukhametov

Finite field calculations are used in modern cryptographic protocols for generating keys, encrypting and decrypting data, and building electronic digital signatures. Modular inversion, based on the extended Euclidean algorithm, is a necessary part of these calculations. Ishmukhametov developed a new algorithm for calculating the greatest common divisor of natural numbers, called the approximating algorithm, which is a variant of the k-ary GCD Algorithm by J. Sorenson. In this paper we develop an extended version of this algorithm.

One of the drawbacks of the k-ary GCD Algorithm is that it accumulates so-called extraneous factors as it runs, that is, new factors that are not part of the original GCD, so the final value produced by the Algorithm is a multiple of the required value. This implies a need for additional operations to calculate the true GCD of the initial numbers A and B together with the output d of the k-ary Algorithm:

GCD(A, B) = GCD(A, B, d).

In articles [11], [12], the authors use various methods to solve this problem of the appearance of extraneous factors, accelerating the general procedure for calculating the GCD. In [13], a new algorithm was proposed for computing inverse elements modulo a number, based on the approximating algorithm; that article provides theoretical results on the convergence of the extended approximating algorithm and gives examples of modular inversion based on it. In our article we consider in detail the construction of an effective algorithm for solving the Bezout equation, the calculation of inverse elements modulo a given number, and the programming of these algorithms. We start with a theoretical description of the algorithm for solving the equation Au + Bv = 1 for given coprime numbers A and B.

Solving the Bezout equation using the extended Euclidean algorithm

As mentioned earlier, one of the important tasks solved using the extended version of the Euclidean algorithm is solving the Bezout equation:

Au + Bv = d,   (1)

where A, B are given natural numbers, d is their greatest common divisor, and u, v are unknown coefficients called Bezout coefficients.
When the Bezout coefficients are known, one can find inverse elements modulo a number. Indeed, let the numbers A and B be coprime, i.e. their GCD d is equal to 1; then, taking both sides of (1) modulo A, we obtain B^(-1) ≡ v (mod A).

The solution of the Bezout equation according to the scheme of the extended Euclidean algorithm can be shown using a table of 7 columns. Consider this scheme on the example of the pair A = 185, B = 39 (table 1).

 i | A   | B  | A mod B | int(A/B) | u  | v
---+-----+----+---------+----------+----+----
 0 | 185 | 39 |   29    |    4     | -4 | 19
 1 |  39 | 29 |   10    |    1     |  3 | -4
 2 |  29 | 10 |    9    |    2     | -1 |  3
 3 |  10 |  9 |    1    |    1     |  1 | -1
 4 |   9 |  1 |    0    |    9     |  0 |  1

The first column contains the iteration numbers. We put the original numbers A and B in the first line and start the iteration procedure. At each iteration the algorithm calculates the integer quotient int(A/B) and the remainder A mod B. Then we move the values of B and A mod B down and to the left, obtaining the source data for the next iteration.
The procedure is repeated several times until zero appears in the column A mod B. At position B in this line we get the GCD of the numbers A and B. Since A and B are coprime, in our case the GCD is equal to 1. This completes the direct run of the Euclidean Algorithm.
Then we implement the second part of the Algorithm, filling in the last two columns. It begins by writing 0 and 1 in the last line and continues by computing the other values of u and v by the formulas

u_i = v_{i+1},   v_i = u_{i+1} − v_{i+1} · int(A_i/B_i).   (2)

The value of v in the first line gives us the inverse element B^(-1) mod A. The asymptotic complexity of this algorithm is the same as that of the usual Euclidean algorithm, that is, O(log B) iterations. If we compare the execution time of the forward and backward parts of the algorithm, the second part runs faster, because the multiplication operation has linear complexity with respect to the length of the input numbers, while division with remainder has complexity O(L log L) with respect to the length L of the input numbers.
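The two-pass table scheme above can be sketched as follows: a forward run storing the quotients int(A_i/B_i), then a backward run applying formulas (2). This is a minimal illustrative sketch, not the paper's implementation; the function names `bezout` and `mod_inverse` are our own.

```python
def bezout(A, B):
    """Extended Euclidean algorithm in the two-pass table form:
    a forward (direct) run storing the quotients, then a backward run
    applying formulas (2): u_i = v_{i+1}, v_i = u_{i+1} - v_{i+1} * q_i.
    Returns (d, u, v) with A*u + B*v = d = GCD(A, B)."""
    quotients = []
    a, b = A, B
    while b:                       # forward run of the Euclidean scheme
        quotients.append(a // b)
        a, b = b, a % b
    d = a                          # GCD(A, B)
    u, v = 1, 0                    # Bezout pair for the virtual pair (d, 0)
    for q in reversed(quotients):  # backward run, formulas (2)
        u, v = v, u - v * q
    return d, u, v

def mod_inverse(B, A):
    """Inverse of B modulo A, assuming GCD(A, B) = 1."""
    d, u, v = bezout(A, B)
    if d != 1:
        raise ValueError("B is not invertible modulo A")
    return v % A
```

On the example of table 1, `bezout(185, 39)` reproduces the first table line `(1, -4, 19)`, and `mod_inverse(39, 185)` gives 19, since 39 · 19 = 741 ≡ 1 (mod 185).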

Calculation of inverse modulo elements using the k-ary GCD Algorithm
We turn further to the k-ary Algorithm and consider formulas similar to formulas (2). Like the Euclidean algorithm, it runs in iterations.
Let a small positive integer k be chosen. Usually k is taken equal to a power of 2, k = 2^s; this accelerates the overall GCD computation.
At each iteration the procedure receives two numbers A and B as input, finds two small numbers x and y such that the following congruence holds:

Ax + By ≡ 0 (mod k),

then computes the integer C = (Ax + By)/k, checks whether C is even and, if so, divides it by 2 until it becomes odd. Then it forms a new pair (B, C) as the input of the next iteration if B > C, or (C, B) otherwise. It stops when C becomes equal to 0. Consider the last iteration of this algorithm.
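The iteration just described can be sketched in code. This is a simplified illustration under stated assumptions, not the paper's algorithm: A and B are taken odd and positive, and the small pair (x, y) is chosen as y = 1 with x = −B · A^(−1) (mod k) reduced to |x| ≤ k/2 (Sorenson's algorithm instead finds a pair with both |x| and |y| near √k). The final Euclidean pass removes the extraneous factors discussed above.

```python
import math

def kary_gcd(A, B, s=8):
    """Sketch of a k-ary GCD run with k = 2**s, for odd positive A, B.
    The pair (x, y) is taken as y = 1, x = -B * A^{-1} (mod k), which
    satisfies A*x + B*y ≡ 0 (mod k) because A is odd, hence invertible
    modulo a power of 2."""
    k = 1 << s
    a, b = max(A, B), min(A, B)
    while True:
        x = (-b * pow(a, -1, k)) % k   # a odd => invertible mod 2**s
        if x > k // 2:
            x -= k                     # center x so that |x| <= k/2
        c = (a * x + b) // k           # exact: a*x + b ≡ 0 (mod k)
        if c == 0:
            break                      # output is b, up to extraneous factors
        c = abs(c)
        while c % 2 == 0:              # GCD of odd numbers is odd
            c //= 2
        a, b = (b, c) if b > c else (c, b)
    # b is a multiple of GCD(A, B); an extra Euclidean pass removes
    # the extraneous factors: GCD(A, B, b)
    return math.gcd(A, math.gcd(B, b))
```

For example, `kary_gcd(185, 39)` returns 1 and `kary_gcd(105, 33)` returns 3, matching the ordinary GCD.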
Suppose that at step n the equality C_n = 0 is reached. From this equality we directly obtain A_n x_n + B_n y_n = 0, which gives us B_n as the output value of the k-ary algorithm. Denote this value by D_n. We define the values of the parameters u_n and v_n equal to 0 and 1, similarly to the formulas of the extended Euclidean algorithm. This gives the equality A_n u_n + B_n v_n = D_n, and using induction we can assume that this formula holds for every 0 < i ≤ n, i.e. A_i u_i + B_i v_i = D_i.
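The induction step can be made explicit. The following is a reconstruction (not the paper's exact derivation), writing m_i = k · 2^{t_i} for the total factor divided out at step i and treating the non-swapped case A_{i+1} = B_i, B_{i+1} = C_i:

```latex
\begin{align*}
  &A_{i+1} u_{i+1} + B_{i+1} v_{i+1} = D_{i+1},
   \qquad A_{i+1} = B_i,\quad
   B_{i+1} = C_i = \frac{A_i x_i + B_i y_i}{m_i} \\[4pt]
  &\Longrightarrow\quad
   B_i\, u_{i+1} + \frac{A_i x_i + B_i y_i}{m_i}\, v_{i+1} = D_{i+1} \\[4pt]
  &\Longrightarrow\quad
   A_i \underbrace{\bigl(x_i v_{i+1}\bigr)}_{u_i}
   + B_i \underbrace{\bigl(y_i v_{i+1} + m_i u_{i+1}\bigr)}_{v_i}
   = \underbrace{m_i D_{i+1}}_{D_i}.
\end{align*}
```

So multiplying the induction hypothesis by m_i yields the new pair (u_i, v_i) together with a rescaled value D_i; the swapped case is handled symmetrically, with the roles of u_{i+1} and v_{i+1} exchanged.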

Speed estimation of the Extended k-ary Algorithm
Let us estimate the number of operations required to calculate the inverse element according to our scheme in comparison with the scheme of the extended Euclidean algorithm. Denote by N_A and N_E the number of iterations when calculating the GCD according to the scheme of the approximating algorithm and the Euclidean algorithm, respectively. According to [7], N_E ≈ 5 N_A for k = 4096 and input numbers up to 3000 bits long.
At each iteration of the backward run of the Euclidean Algorithm two operations are performed:

u_i = v_{i+1},   v_i = u_{i+1} − v_{i+1} · int(A_i/B_i).

The first operation is a simple assignment; it has linear complexity with respect to the lengths of A and B. The integer division A/B was already performed in the main loop of the Euclidean scheme, so here the quotient is simply extracted from a previously saved array; therefore, at one iteration of the Euclidean scheme, one operation on two long numbers is performed, which has linear complexity O(L) with respect to the length of A. Indeed, the quotient int(A_i/B_i) takes small values and usually fits in one machine word, and the parameters u_i and v_i are bounded by the initial numbers A and B.

We now consider the basic operations at an iteration of the backward run of the Approximating Algorithm. At each iteration, the parameters u_i and v_i are computed according to formulas (5):

u_i = z x_i v_{i+1},   v_i = z y_i v_{i+1} + k r u_{i+1}.   (5)

Here three multiplications of the arguments u and v by machine-word-size numbers x_i and y_i are performed. In other words, the number of operations per iteration is approximately three times larger than at an iteration of the Euclidean scheme. In addition, in the approximating algorithm it is necessary to find the inverse modulo A of the extraneous factor if one appears.
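The forward and backward runs of the extended k-ary scheme can be sketched together. As before, this is an illustrative sketch under simplifying assumptions (A, B odd positive; y = 1 and x = −B · A^(−1) mod k, so the z and y factors of formulas (5) degenerate), not the paper's implementation:

```python
def extended_kary(A, B, s=8):
    """Sketch of the extended k-ary scheme, k = 2**s, A and B odd positive.
    The forward run records (x, t, swapped) per iteration; the backward
    run applies the induction step behind formulas (5),
        u_i = x_i * v_{i+1},
        v_i = y_i * v_{i+1} + (k * 2**t_i) * u_{i+1},
    starting from u_n = 0, v_n = 1.  Returns (D, u, v) with
    A*u + B*v = D, where D is a multiple of GCD(A, B) that may carry
    extraneous factors."""
    k = 1 << s
    a, b = A, B
    steps = []
    while True:
        x = (-b * pow(a, -1, k)) % k     # y = 1; a odd => invertible mod k
        if x > k // 2:
            x -= k                       # center: |x| <= k/2
        c = (a * x + b) // k             # exact division by construction
        if c == 0:
            break
        t = 0
        while c % 2 == 0:                # strip powers of 2, keep the count
            c //= 2
            t += 1
        swapped = abs(b) < abs(c)
        steps.append((x, t, swapped))
        a, b = (c, b) if swapped else (b, c)
    D, u, v = b, 0, 1                    # A_n*u_n + B_n*v_n = B_n = D_n
    for x, t, swapped in reversed(steps):
        m = k << t                       # m = k * 2**t
        if swapped:                      # pair was (C, B): roles exchanged
            u, v = x * u, u + m * v
        else:                            # pair was (B, C)
            u, v = x * v, v + m * u
        D *= m
    return D, u, v
```

Given the output A·u + B·v = D with GCD(D, A) = 1, the inverse B^(−1) mod A is recovered as v · D^(−1) mod A; this is exactly the extra inversion of the extraneous factor mentioned above.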
If we take into account that the number of iterations of the approximating algorithm is approximately 5 times smaller than the number of iterations of the Euclidean scheme, then, overall, the backward run of the approximating algorithm is comparable in speed with the backward run of the Euclidean Algorithm.
If we compare the amount of work at the first (direct) stage of both algorithms, then in both cases the first stage takes longer than the second one. For example, in the Euclidean scheme the main operation of one iteration is the integer division q = int(A/B), which has complexity at best O(L · ln L) with respect to the length of the input numbers. Therefore the calculation of inverse elements in finite fields according to the approximating algorithm scheme will be performed 3 to 5 times faster than according to the classical Euclidean scheme.

Conclusions
In this article we have derived the basic formulas for efficient programming of the Extended Approximating Algorithm. These formulas allow users to perform operations in finite fields several times faster than with the Extended Euclidean Algorithm.