The distribution of solutions to xy = N mod a with an application to factoring integers

We consider the uniform distribution of solutions $(x,y)$ to $xy=N \mod a$, and obtain a bound on the second moment of the number of solutions in squares of side length approximately $a^{1/2}$. We use this to study a new factoring algorithm that factors $N=UV$ provably in $O(N^{1/3+\epsilon})$ time, and discuss the potential for improving the runtime to sub-exponential.


Introduction
Let gcd(a, N) = 1. A classic application of Kloosterman sums shows that the points (x, y) mod a satisfying xy = N mod a become uniformly distributed in the square of side length a as a → ∞. In this paper we investigate an application of this fact to the problem of factoring integers. We give a new method to factor the integer N which beats trial division, and prove that it runs in time O(N^{1/3+ǫ}).
While the complexity of our method is not exciting, considering the existence of several probabilistic sub-exponential factoring algorithms, the runtime here is provable, though it does not compete favourably with the best known provable factoring algorithm, Pollard–Strassen, which runs in time O(N^{1/4+ǫ}). Shanks's class group method runs in time O(N^{1/5+ǫ}) assuming the GRH. Our algorithm is described in Section 2.
Furthermore, proving this runtime requires understanding the finer distribution of solutions to xy = N mod a, and our results in this regard are interesting in their own right. We discuss the problem of uniform distribution in Sections 4 and 5.
Finally, all existing sub-exponential factoring algorithms have grown out of much weaker exponential algorithms, and we hope that the factoring ideas presented here will be improved. In Section 3 we discuss some needed improvements to achieve a better runtime.
We have not implemented the algorithms described in this paper. The purpose of this paper is to present a new approach to factoring integers and analyse its runtime.

Algorithm: hide and seek
Let N be a positive integer that we wish to factor. Say N = UV where U and V are positive integers, not necessarily prime, with 1 < U ≤ V. For simplicity, assume V < 2U, so that V < (2N)^{1/2}. The general case, without this restriction, will be handled at the end of this section.
The idea behind the algorithm is to perform trial division of N by a couple of integers, and to use information about the remainder to determine the factors U and V .
Let a be a positive integer, 1 < a < N. By the division algorithm, write
(1) U = u_1 a + u_0, V = v_1 a + v_0, with 0 ≤ u_0, v_0 < a.
Assume that u_0 is relatively prime to a, and likewise for v_0, since otherwise we easily extract a factor of N by taking gcd(a, N). If, for a given a, we can determine u_0, u_1, v_0, v_1, then we have found U and V.
Consider N = u_0 v_0 mod a. One cannot simply determine u_0 and v_0 from the value of N mod a, because φ(a) pairs of integers (x, y) mod a satisfy xy = N mod a (if x = m u_0 mod a, then y = m^{−1} v_0 mod a, where gcd(m, a) = 1).
We can assume that gcd(a, N) = 1, because otherwise we easily extract a factor of N. We then list all φ(a) pairs of integers (x, y), with 1 ≤ x, y ≤ a, that satisfy xy = N mod a, throwing them into squares of side length a^{1/2}.
We can compute all inverses mod a, and hence all (x, y) = (x, x^{−1}N mod a), in O(a) operations mod a. To compute all inverses, start with m = 2 and repeatedly multiply by m mod a until we arrive at 1, or hit a residue class already encountered (in which case m is not invertible). Then take the smallest residue not yet encountered and repeat the previous step, until all residue classes are exhausted.
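For concreteness, the inverse-table computation can be sketched in Python. This sketch is ours, not the paper's, and it adds one detail to the description above: when a chain of powers of m runs into a residue whose inverse is already known, the inverses along the chain are recovered by walking the chain backwards.

```python
def all_inverses(a):
    # inv[x] will hold x^{-1} mod a, or 0 if x is not invertible mod a.
    inv = [0] * a
    inv[1] = 1
    seen = [False] * a
    seen[0] = seen[1] = True
    for m in range(2, a):
        if seen[m]:
            continue
        # Walk the chain m, m^2, m^3, ... until we hit a known residue.
        chain = [m]
        seen[m] = True
        cur = (m * m) % a
        while not seen[cur]:
            seen[cur] = True
            chain.append(cur)
            cur = (cur * m) % a
        # chain = [m, m^2, ..., m^k] and cur = m^(k+1) is known territory.
        # If cur is invertible (in particular if cur == 1), inverses propagate
        # backwards along the chain; otherwise every element of the chain
        # shares a factor with a and stays marked non-invertible.
        if inv[cur]:
            y = inv[cur]                      # y = cur^{-1} mod a
            for x in reversed(chain):
                y = (y * m) % a               # now y = x^{-1} mod a
                inv[x] = y
    return inv
```

Each residue is visited O(1) times, giving the O(a) operation count claimed above.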
Having produced all solutions for the modulus a, we then repeat the process for the modulus a − 1. For each solution (x_1, y_1) to xy = N mod (a − 1), we determine which a^{1/2} × a^{1/2} square it falls within, and consider all nearby solutions (x_0, y_0) to xy = N mod a from our list of stored solutions (with each coordinate within a^{1/2}, wrapping to the opposite side of the larger a × a square if needed). We set µ_0 = x_0, ν_0 = y_0, µ_1 = x_1 − µ_0, ν_1 = y_1 − ν_0, and check whether (µ_1 a + µ_0)(ν_1 a + ν_0) = N. If so, we have determined a non-trivial factor of N and quit.
How much work does comparing pairs of points (x_0, y_0) and (x_1, y_1) entail? There are φ(a − 1) solutions to xy = N mod (a − 1), and, typically, we expect there to be only a handful of solutions to xy = N mod a whose coordinates are each within a^{1/2} of those of (x_1, y_1). Each such pair of solutions gives us candidate values µ_0, ν_0 and µ_1, ν_1 for u_0, v_0 and u_1, v_1, and we check to see whether they produce N = (µ_1 a + µ_0)(ν_1 a + ν_0). On average, each a^{1/2} × a^{1/2} square contains O(1) points, so the overall time to check all squares and points is roughly predicted to be O(a). In Section 5 we obtain a runtime bound of O(a^{1+ǫ}). This algorithm terminates successfully when the true points (u_0, v_0) and (u_0 + u_1, v_0 + v_1) are found. Since a = O(N^{1/3}), this gives a running time that is provably O(N^{1/3+ǫ}).
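The candidate check performed for each nearby pair can be made concrete. A small illustrative Python fragment (the function name is ours, not the paper's):

```python
def check_pair(N, a, p0, p1):
    """Given a solution p0 = (x0, y0) to xy = N mod a and a solution
    p1 = (x1, y1) to xy = N mod (a-1), test whether they are the hiding
    pair (u0, v0), (u0 + u1, v0 + v1); if so, return the factors (U, V)."""
    x0, y0 = p0
    x1, y1 = p1
    u1, v1 = x1 - x0, y1 - y0        # candidate leading base-a digits
    if u1 < 0 or v1 < 0:
        return None
    U, V = u1 * a + x0, v1 * a + y0  # candidate factors from (1)
    if 1 < U < N and U * V == N:
        return U, V
    return None
```

Since a ≡ 1 mod (a − 1), the true pair satisfies x_1 − x_0 = u_1 and y_1 − y_0 = v_1 (absent wrap-around), which is exactly what the subtraction recovers.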
The idea that lies behind the algorithm suggests the name 'Hide and Seek'. The solutions that we seek (u 0 , v 0 ) and (u 0 + u 1 , v 0 + v 1 ) are hiding amongst many solutions in the large a × a square, but, like children who have hidden next to one another while playing the game Hide and Seek, they have become easier to spot.
Step 1 Set a = ⌈(2N)^{1/3}⌉, so that u_1, v_1 ≤ a^{1/2}.
Step 2 Use the Euclidean algorithm to compute gcd(N, a − δ) for δ = 0, 1. If either gcd is > 1 then we have determined a non-trivial factor of N and quit.
Step 3 Compute and store in an array all φ(a) points of H_{N,a} = {(x, y) : 1 ≤ x, y ≤ a, xy = N mod a}. This can be done using O(a) arithmetic operations mod a, as described above.
Step 4 For 0 ≤ m, n < a^{1/2}, initialize a doubly indexed array, 'Bin'. Each element, Bin[m,n], will contain a list of points and be used to partition H_{N,a}. Each is initially set to empty.
Step 5 Partition the elements of H_{N,a} according to squares of side length a^{1/2} by computing, for each (x, y) ∈ H_{N,a}, the values m = ⌊x/a^{1/2}⌋ and n = ⌊y/a^{1/2}⌋, and appending the point (x, y) to Bin[m,n].
Step 6 For each (x_1, y_1) ∈ H_{N,a−1}:
Step 6a Determine which bin it corresponds to by computing m = ⌊x_1/a^{1/2}⌋ and n = ⌊y_1/a^{1/2}⌋.
Step 6b Loop through the nearby points (x_0, y_0) of H_{N,a} whose coordinates lie, left and downwards, within a^{1/2}. Typically, this entails examining the four bins Bin[m − ǫ_1, n − ǫ_2], where ǫ_1, ǫ_2 ∈ {0, 1}. However, slight care is needed when crossing over an edge of the a × a square: one should wrap to the opposite side of the square. For each such (x_0, y_0), set µ_0 = x_0, ν_0 = y_0, µ_1 = x_1 − µ_0, ν_1 = y_1 − ν_0, and check whether (µ_1 a + µ_0)(ν_1 a + ν_0) = N.
Step 6c If so, we have determined a non-trivial factor of N and quit.
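The steps above can be sketched compactly in Python. This is an illustrative implementation of ours, not the paper's: it uses Python's built-in modular inverse rather than the O(a) inverse table, it takes a just above (2N)^{1/3} per Step 1, and it ignores the wrap-around at the edges of the square.

```python
import math

def solutions(N, a):
    # All (x, y) with 1 <= x, y < a and x*y = N mod a; assumes gcd(a, N) = 1.
    pts = []
    for x in range(1, a):
        if math.gcd(x, a) == 1:
            pts.append((x, (pow(x, -1, a) * N) % a))
    return pts

def hide_and_seek(N):
    # Balanced case U <= V < 2U; a just above (2N)^(1/3) makes u1, v1 <= a^(1/2).
    a = int(round((2 * N) ** (1 / 3))) + 1
    for d in (0, 1):                            # Step 2
        g = math.gcd(N, a - d)
        if g > 1:
            return g, N // g
    s = math.isqrt(a) + 1                       # side of the small squares
    bins = {}                                   # Steps 4-5
    for (x, y) in solutions(N, a):
        bins.setdefault((x // s, y // s), []).append((x, y))
    for (x1, y1) in solutions(N, a - 1):        # Step 6
        m, n = x1 // s, y1 // s
        for e1 in (0, 1):
            for e2 in (0, 1):                   # Step 6b (no wrap-around)
                for (x0, y0) in bins.get((m - e1, n - e2), []):
                    u1, v1 = x1 - x0, y1 - y0
                    if u1 >= 0 and v1 >= 0:
                        U, V = u1 * a + x0, v1 * a + y0
                        if 1 < U < N and U * V == N:
                            return U, V         # Step 6c
    return None
```

Any pair returned is a genuine non-trivial factorization, since the final check verifies U V = N directly.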
The storage requirement of O(N^{1/3}) can be improved to O(N^{1/6}) by generating the solutions (x, y) to xy = N mod (a − δ) lying in one vertical strip of width O(a^{1/2}) at a time (easy to do since we can choose x as we please, which then determines y). In general, we are then no longer free to generate all modular inverses at once, and must compute inverses in intervals of size a^{1/2}, one at a time, at a cost, using the Euclidean algorithm, of O(a^ǫ) per inverse.
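A strip-at-a-time generator might look as follows (an illustrative Python sketch; `pow(x, -1, a)` is Python 3.8+'s built-in extended-Euclidean modular inverse):

```python
import math

def strip_solutions(N, a, x_start, width):
    # Solutions (x, y) to x*y = N mod a with x_start <= x < x_start + width.
    # Only O(width) points are held at once; each inverse costs O(a^eps)
    # via the extended Euclidean algorithm (Python's 3-argument pow).
    pts = []
    for x in range(x_start, x_start + width):
        if math.gcd(x, a) == 1:
            pts.append((x, (pow(x, -1, a) * N) % a))
    return pts
```

Calling this with width ≈ a^{1/2} for successive strips reproduces the full solution list while holding only one strip in memory.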
Instead of working with small squares of side length a^{1/2}, partition the a × a square into rectangles of width w and height h, with wh = N^{1/3}. Writing U = N^α, so that V = N^{1−α}, we would like to select w roughly equal to N^{α−1/3}, and hence h = N^{1/3}/w roughly equal to N^{2/3−α}.
These rough values of w and h are needed to make sure that, using the same notation as before, (u_0, v_0) and (u_0 + u_1, v_0 + v_1) are in the same, or neighbouring, rectangles. More precisely, with a ∼ N^{1/3},
u_1 ≤ U/a ∼ N^{α−1/3} ≤ w, and v_1 ≤ V/a ∼ N^{2/3−α} ≤ h.
Thus the x-coordinates of (u_0, v_0) and (u_0 + u_1, v_0 + v_1) are ≤ w apart and the y-coordinates are ≤ h apart.
Since we do not, a priori, know α, we cannot simply set w and h. Instead, we use an exponentially increasing set of w's, for example starting with w = 2, repeatedly applying the above procedure and doubling w each time, until w > N^{α−1/3} and one successfully factors N.
The area of each rectangle is N^{1/3}, and that of the a × a square is approximately N^{2/3}, so there are O(N^{1/3}) rectangles (at the top and right edges these will typically be truncated), and, on average, each contains O(1) solutions to xy = N mod (a − δ). Running through each rectangle and its immediate neighbours, checking all pairs of points in these rectangles, suggests that O(N^{1/3}) operations are needed for a particular choice of w and h. Since we might have to repeat this a few times, doubling the size of w, the overall running time gets multiplied by O(log N), which is O(N^ǫ).
In Section 5, a running time equal to O(N 1/3+ǫ ) is proven. The steps described in this section are summarized below.
Step 1 Carry out trial division on N up to N^{1/3}. If a non-trivial factor of N is found, quit.
Step 2 Set a = ⌈N^{1/3}⌉.
Step 3 Use the Euclidean algorithm to compute gcd(N, a − δ) for δ = 0, 1. If either gcd is > 1 then we have determined a non-trivial factor of N and quit.
Step 4 Compute and store, in two arrays, all the points of H N,a and H N,a−1 .
Step 5 Set j = 0. While we have not succeeded in finding a non-trivial factor of N:
Step 5a Increment j by 1 and set w = 2^j and h = N^{1/3}/w.
Step 5b For 0 ≤ m < a/w and 0 ≤ n < a/h, initialize a doubly indexed array, 'Bin', whose elements, Bin[m,n], will contain lists of points and be used to partition H N,a . Each bin is initially set to empty.
Step 5c Partition the elements of H N,a according to rectangles of width w and height h by computing, for each (x, y) ∈ H N,a , the values m = ⌊x/w⌋ and n = ⌊y/h⌋, and appending the point (x, y) to Bin[m,n].
Step 5d For each (x_1, y_1) ∈ H_{N,a−1}:
Step 5d1 Determine which bin it corresponds to by computing m = ⌊x_1/w⌋ and n = ⌊y_1/h⌋.
Step 5d2 Loop through the nearby points (x_0, y_0) of H_{N,a} whose coordinates lie, left and downwards, within w and h respectively. Typically, this entails examining the four bins Bin[m − ǫ_1, n − ǫ_2], where ǫ_1, ǫ_2 ∈ {0, 1}. However, slight care is needed when crossing over an edge of the a × a square: one should wrap to the opposite side of the square. For each such (x_0, y_0), set µ_0 = x_0, ν_0 = y_0, µ_1 = x_1 − µ_0, ν_1 = y_1 − ν_0, and check whether (µ_1 a + µ_0)(ν_1 a + ν_0) = N.
Step 5d3 If so, we have determined a non-trivial factor of N and quit.
Step 5e Free up the memory used by 'Bin'.
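The variant, with its doubling loop over w, can be sketched as follows. This Python sketch is ours, not the paper's: it omits the trial-division step and the wrap-around at the edges, and the exact choice of a just above N^{1/3} is our assumption.

```python
import math

def solutions(N, a):
    # All (x, y), 1 <= x, y < a, with x*y = N mod a; assumes gcd(a, N) = 1.
    return [(x, (pow(x, -1, a) * N) % a)
            for x in range(1, a) if math.gcd(x, a) == 1]

def hide_and_seek_general(N):
    # Variant for 1 < U <= V < N; assumes trial division up to N^(1/3) has
    # already ruled out small factors.
    a = int(round(N ** (1 / 3))) + 2
    for d in (0, 1):                            # Step 3
        g = math.gcd(N, a - d)
        if g > 1:
            return g, N // g
    H0, H1 = solutions(N, a), solutions(N, a - 1)   # Step 4
    area = round(N ** (1 / 3))                  # w*h = N^(1/3)
    w = 2
    while w <= a:                               # Step 5: double w until done
        h = max(1, area // w)
        bins = {}
        for (x, y) in H0:                       # Steps 5b-5c
            bins.setdefault((x // w, y // h), []).append((x, y))
        for (x1, y1) in H1:                     # Step 5d
            m, n = x1 // w, y1 // h
            for e1 in (0, 1):
                for e2 in (0, 1):               # Step 5d2 (no wrap-around)
                    for (x0, y0) in bins.get((m - e1, n - e2), []):
                        u1, v1 = x1 - x0, y1 - y0
                        if u1 >= 0 and v1 >= 0:
                            U, V = u1 * a + x0, v1 * a + y0
                            if 1 < U < N and U * V == N:
                                return U, V     # Step 5d3
        w *= 2
    return None
```

The dictionary of bins plays the role of the array 'Bin', and discarding it at the end of each pass corresponds to Step 5e.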

Towards a subexponential bound
The above algorithm exploits the fact that, when a is large and δ is small, the points with coordinates (U, V) mod (a − δ) are close to one another. In fact they lie equally spaced on a line, with common horizontal difference u_1 and vertical difference v_1.
An obvious thing to try is to reduce the size of a. However, as a decreases, u_1 and v_1 increase, so that not only do the points (u_0, v_0) and (u_0 + u_1, v_0 + v_1) move far apart, but the latter point soon falls far outside the square of side length a.
To fix this, one can view (1) as the base-a expansion of U and V. When a is smaller, one can instead use the higher degree base-a expansions
U = u_d a^d + · · · + u_1 a + u_0, V = v_{d′} a^{d′} + · · · + v_1 a + v_0,
with u_d ≠ 0 and v_{d′} ≠ 0. For simplicity in what follows, assume that the degrees of both expansions are equal, say to d. A polynomial of degree d is determined uniquely by d + 1 values. Imitating the approach in Section 2, we evaluate N mod (a − δ) for d + 1 values of δ. A natural choice might be δ = 0, ±1, ±2, . . ., but, to keep our polynomial values positive, we consider non-negative values of δ, and, for good measure, take extra values, δ = 0, 1, 2, . . . , 2d (by extra, we mean δ ≤ 2d rather than δ ≤ d). Now, writing u(t) = u_d t^d + · · · + u_0 and v(t) = v_d t^d + · · · + v_0, we have N = u(a)v(a), and, since a ≡ δ mod (a − δ),
N ≡ u(δ)v(δ) mod (a − δ).
Since 0 ≤ u_j < a, we have
u(δ) < a(1 + δ + · · · + δ^d) =: aλ(d, δ),
and similarly for the v_j's. For each δ one lists all solutions (x, y) to xy = N mod (a − δ) with 0 < x, y < aλ(d, δ).
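The congruence N ≡ u(δ)v(δ) mod (a − δ) is easy to check numerically, since a − δ divides u(a)v(a) − u(δ)v(δ). A small Python sketch (function names are ours):

```python
def base_a_digits(n, a):
    # Digits of n in base a: n = d[0] + d[1]*a + d[2]*a**2 + ...
    d = []
    while n:
        n, r = divmod(n, a)
        d.append(r)
    return d

def poly_eval(coeffs, t):
    # Evaluate the polynomial with the given coefficient list at t.
    return sum(c * t ** j for j, c in enumerate(coeffs))
```

Here `base_a_digits(U, a)` gives the coefficients of u(t), so that `poly_eval(u, a) == U`, and evaluating at δ instead of a gives a point on the sought polynomial curve.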
The number of points (x, y) for a given δ is φ(a − δ) per a × a square, and hence, overall, is O(φ(a − δ)λ(d, δ)²). We are again assuming that gcd(a − δ, N) = 1; otherwise one easily pulls out a factor of N. We need a method to recognize the solutions that we seek, (u_d δ^d + · · · + u_0, v_d δ^d + · · · + v_0), hiding amongst all the (x, y)'s. This leads to the following question: Let X > 0 and let S_0, S_1, . . . , S_{2d} be 2d + 1 sets of points in Z², all of whose coordinates are positive and ≤ X. Assume that amongst these points there exist 2d + 1 points, one from each S_δ, whose coordinates are described by polynomials u(δ), v(δ) ∈ Z[δ] of degree d; more precisely, assume that for each 0 ≤ δ ≤ 2d there exists a point (x_δ, y_δ) ∈ S_δ such that x_δ = u(δ) and y_δ = v(δ). Can one find these 2d + 1 points much more efficiently than by exhaustively searching through all possible (2d + 1)-tuples of points? For example, can one find these points in time O(X^α d^{βd}), for some α, β > 0?
In our application, X = O(a(2d)^{2d}). Since N = UV and a^d ≤ U ≤ V < a^{d+1}, we have a ≤ N^{1/(2d)}. Assuming that there is an O(X^α d^{βd}) time algorithm for finding points with polynomial coordinates, on taking d proportional to
(12) (log N / log log N)^{1/2},
one gets a factoring algorithm requiring
(13) exp( γ (log N log log N)^{1/2} )
time and storage, for some γ > 0. One can cut back a bit on the search space by noting, for example, that the coefficients of u(δ) and v(δ) are integers (this imposes a divisibility restriction on finite differences between points lying on the polynomial), and, in our particular application, that the coefficients are non-negative and bounded, which restricts the rate of growth of the polynomials. However, to get down to a running time polynomial in X, one needs to do much better.

Uniform distribution
Let gcd(a, N) = 1. A classic application of Kloosterman sums shows that the points (x, y) mod a satisfying xy = N mod a become uniformly distributed in the square of side length a as a → ∞. While the tools used in this section are fairly standard, they will also be applied in the next section to estimate the running time of the Hide and Seek algorithm. Similar theorems can be found in the literature [9], often with restrictions to prime values of a or to N = 1.
Consider the following identity, which detects pairs of integers (x, y), with gcd(x, a) = 1, satisfying xy = N mod a:
(1/a) Σ_{k=0}^{a−1} e( k(N x̄ − y)/a ) = 1 if xy = N mod a, and 0 otherwise,
where e(z) = exp(2πiz), and where x̄ stands for any integer congruent to x^{−1} mod a, if the inverse exists. Recall that we have assumed gcd(a, N) = 1, so that any solution to xy = N mod a must have gcd(x, a) = 1. Thus, for such solutions, x^{−1} mod a exists. Let R be the rectangle bounded horizontally by x_1, x_2 ∈ Z and vertically by y_1, y_2 ∈ Z, where 0 ≤ x_1 < x_2 ≤ a and 0 ≤ y_1 < y_2 ≤ a:
R = {(x, y) ∈ Z² | x_1 ≤ x < x_2, y_1 ≤ y < y_2}.
Let c_R(N, a) denote the number of pairs of integers (x, y) that lie in the rectangle R and satisfy xy = N mod a. The identity above gives
(17) c_R(N, a) = (1/a) Σ_{k=0}^{a−1} Σ_{x_1 ≤ x < x_2, gcd(x,a)=1} e(kN x̄/a) Σ_{y_1 ≤ y < y_2} e(−ky/a).
Notice that we only need to restrict x to gcd(x, a) = 1, and that y runs over all residues in y_1 ≤ y < y_2. This will allow us to deal with the sum over y as a geometric series. The k = 0 term provides the main contribution, while the other terms can be estimated using bounds for Kloosterman sums. We require two lemmas. The first considers the main contribution, and the second bounds the remaining terms.
Lemma 4.1. The k = 0 term of (17) equals
(φ(a)/a²)(x_2 − x_1)(y_2 − y_1) + O(a^ǫ),
for any ǫ > 0.
Proof. The k = 0 term of (17) is
(1/a) Σ_{x_1 ≤ x < x_2, gcd(x,a)=1} Σ_{y_1 ≤ y < y_2} 1 = ((y_2 − y_1)/a) Σ_{x_1 ≤ x < x_2, gcd(x,a)=1} 1.
Using the Möbius function to detect the condition gcd(x, a) = 1, we have
(20) Σ_{x_1 ≤ x < x_2, gcd(x,a)=1} 1 = Σ_{d | a} µ(d) ( (x_2 − x_1)/d + O(1) ) = (φ(a)/a)(x_2 − x_1) + O(τ(a)),
where τ(a) equals the number of divisors of a and is O(a^ǫ) for any ǫ > 0. This implies that the k = 0 contribution to c_R(N, a) equals
(φ(a)/a²)(x_2 − x_1)(y_2 − y_1) + O(a^ǫ),
which gives the lemma.
The next lemma bounds the contribution of the k ≥ 1 terms in (17).
Lemma 4.2. For any ǫ > 0, the terms 1 ≤ k ≤ a − 1 of (17) contribute O(a^{1/2+ǫ}) to c_R(N, a).
Proof. One can separate the sum over y and evaluate it as a geometric series, obtaining for these terms
(1/a) Σ_{k=1}^{a−1} ( Σ_{x_1 ≤ x < x_2, gcd(x,a)=1} e(kN x̄/a) ) e(−ky_1/a) (1 − e(−k(y_2 − y_1)/a)) / (1 − e(−k/a)).
Taking absolute values, we get an upper bound of
(24) (1/a) Σ_{k=1}^{a−1} | Σ_{x_1 ≤ x < x_2, gcd(x,a)=1} e(kN x̄/a) | / sin(πk/a).
Next, notice that the terms k and a − k give the same contribution, so we may restrict our attention to just the terms 1 ≤ k ≤ (a − 1)/2. If a − 1 is odd, the middle term, k = a/2, is left out at a cost of O(1), and the bound becomes
(2/a) Σ_{1 ≤ k ≤ (a−1)/2} | Σ_{x_1 ≤ x < x_2, gcd(x,a)=1} e(kN x̄/a) | / sin(πk/a) + O(1).
The sum over x can be expressed in terms of incomplete Kloosterman sums, and using estimates for Kloosterman sums one has
(26) Σ_{x_1 ≤ x < x_2, gcd(x,a)=1} e(kN x̄/a) = O( a^{1/2+ǫ} gcd(k, a)^{1/2} ).
For a proof, see Lemma 4 on page 36 of Hooley's book [5] (his r corresponds to our a, and his l to our −kN; also recall that we are assuming gcd(N, a) = 1, so that N does not appear in the gcd of the O term). Furthermore, using the Taylor expansion of sin(x), one obtains the two inequalities
(27) sin(x) ≤ min(x, 1) for x ≥ 0, and sin(x) ≥ x/2 for 0 < x ≤ π/2.
For the second inequality, use 0 < x/2 < x − x³/3! < sin(x) in the stated interval. Combining (24), (26) and (27), and noting that Σ_{k ≤ a} gcd(k, a)^{1/2}/k = O(a^ǫ), the contribution of the terms k ≥ 1 is O(a^{1/2+ǫ}), as claimed.
These two lemmas together give the following theorem.
Theorem 4.1. Let gcd(a, N) = 1. Then, for any ǫ > 0,
c_R(N, a) = (φ(a)/a²)(x_2 − x_1)(y_2 − y_1) + O(a^{1/2+ǫ}).
This theorem shows that the points (x, y) satisfying xy = N mod a are uniformly dense in the sense that the rectangle R contains its fair share of solutions, so long as the area of R is of larger size than a^{3/2+ǫ}. For example, if R is a square, it needs to have side length at least a^{3/4+ǫ} to contain its fair share of points. This is considerably larger than the side length of a^{1/2} that is used in the algorithm of Section 2.
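The theorem is easy to illustrate numerically: count the solutions in a rectangle and compare with the main term φ(a)(x_2 − x_1)(y_2 − y_1)/a². A Python sketch (ours, for illustration):

```python
import math

def c_R(N, a, x1, x2, y1, y2):
    # Number of solutions to x*y = N mod a with x1 <= x < x2, y1 <= y < y2.
    count = 0
    for x in range(x1, x2):
        if math.gcd(x, a) == 1:
            y = (pow(x, -1, a) * N) % a
            if y1 <= y < y2:
                count += 1
    return count

def main_term(a, x1, x2, y1, y2):
    # The expected count phi(a) * (x2 - x1) * (y2 - y1) / a^2.
    phi = sum(1 for k in range(1, a + 1) if math.gcd(k, a) == 1)
    return phi * (x2 - x1) * (y2 - y1) / a**2
```

Taking R to be the full a × a square recovers the exact count φ(a), and smaller rectangles show the count fluctuating around the main term.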
The paper of Shparlinski [7] contains many references to the problem of uniform distribution and discusses improved results on average over N .

Second moment and running time
We now examine the assertion made in Section 2 that O(a^{1+ǫ}) time is needed to scan across all roughly a squares of side length a^{1/2} and their immediate neighbours, comparing all pairs of points contained in said squares.
The running time of the algorithm of Section 2, i.e. in the case U ≤ V < 2U, depends on how the solutions to xy = N mod a and x′y′ = N mod (a − 1) are distributed amongst the small squares of side length a^{1/2}. In Section 5.1 we will consider the running time of the variant of Section 2 that is used for the general situation 1 < U ≤ V < N.
Let S denote one such square, i.e. of side length a^{1/2}. The running time needed to examine just the square S, comparing each point of H_{N,a} in S with each point of H_{N,a−1} in S, is O(c_S(N, a)c_S(N, a − 1)), which, by the arithmetic–geometric mean inequality, is O(c_S(N, a)² + c_S(N, a − 1)²). The algorithm also requires us to compare points in neighbouring squares, say S_1 and S_2, which, similarly, takes O(c_{S_1}(N, a)² + c_{S_2}(N, a − 1)²) time. Hence, the overall running time to compare pairs of points is
O( Σ_S ( c_S(N, a)² + c_S(N, a − 1)² ) ),
the sum being over the roughly a squares of side length a^{1/2} that partition the a × a square {(x, y) ∈ Z² | 0 ≤ x, y < a} (at the top and right edges we get rectangles, unless a^{1/2} is an integer). Consider now the contribution from the points mod a:
(36) Σ_S c_S(N, a)².
For convenience, rather than deal with squares S of side length a^{1/2}, we will estimate (36) by making a small adjustment and partitioning the a × a square into squares of side length b, the smallest integer ≥ ⌈a^{1/2}⌉ with gcd(b, a) = 1 (if the gcd condition fails, replace b with b + 1 until it holds; by equation (20), this will not take long, so that, for any ǫ > 0, b = a^{1/2} + O(a^ǫ)). Thus, consider the squares
(38) B_{ij} = {(x, y) ∈ Z² | ib ≤ x < (i + 1)b, jb ≤ y < (j + 1)b}.
Since b ∤ a, these will not entirely cover the a × a square, but the number of points (x, y) ∈ Z² satisfying xy = N mod a that are neglected at the right most and top portions of the a × a square is, by (20), O(φ(a)b/a), and these therefore contribute O(φ(a)²b²/a²) = O(φ(a)) to (36).
The points (x, y) ∈ Z² belonging to an a^{1/2} × a^{1/2} square S are contained entirely in at most four of the squares of side length b, say B_{i_1 j_1}, B_{i_2 j_2}, B_{i_3 j_3}, B_{i_4 j_4}. Therefore, by the Cauchy–Schwarz inequality,
c_S(N, a)² ≤ ( c_{B_{i_1 j_1}}(N, a) + · · · + c_{B_{i_4 j_4}}(N, a) )² ≤ 4 ( c_{B_{i_1 j_1}}(N, a)² + · · · + c_{B_{i_4 j_4}}(N, a)² ).
Since each B square overlaps with O(1) S squares, we thus have
Σ_S c_S(N, a)² = O( Σ_B c_B(N, a)² + φ(a) ),
the φ(a) accounting for the contribution from the neglected portion at the right most and top portions of the a × a square. A similar consideration for the points satisfying xy = N mod (a − 1), partitioning the a × a square into squares D of side length d, where d is the smallest integer greater than ⌈(a − 1)^{1/2}⌉ which is coprime to a − 1, gives the same kind of sum, with c_D(N, a − 1)² in place of c_B(N, a)². Therefore, we need to estimate the second moment
(43) Σ_B c_B(N, a)²,
where B ranges over all ⌊a/b⌋² squares of the form (38). To prove that the running time of the hide and seek algorithm of Section 2 is O(N^{1/3+ǫ}), we need to prove that (43) is O(a^{1+ǫ}).
Theorem 5.1. Let gcd(a, N) = 1, and let b = a^{1/2} + O(a^ǫ), with gcd(b, a) = 1, be as above. Then, for any ǫ > 0,
Σ_B c_B(N, a)² = O(a^{1+ǫ}).
Proof. Rather than look at just the a × a square, it is helpful to consider the ba × ba square {(x, y) ∈ Z² | 0 ≤ x, y < ba}. The advantage of looking at the larger square will become apparent when we turn to the discrete Fourier transform, and will be summing over all the a-th roots of unity. This larger square can be partitioned into b² squares of side length a. Because the solutions to xy = N mod a repeat mod a, we can count each c_B(N, a)² once per a × a square, by summing c_{B′}(N, a)² over the b² translates B′ = B + (r_1 a, r_2 a), 0 ≤ r_1, r_2 < b. On the other hand, we can also partition the ba × ba square into a² squares of side length b:
P_2 = { B_{ij} | 0 ≤ i, j < a },
with B_{ij} given by (38). Each translate of a B square, B′ = B + (r_1 a, r_2 a), is covered by at most four B_{ij} ∈ P_2, and each B_{ij} ∈ P_2 overlaps at most four such translates of B.
Hence, applying the Cauchy–Schwarz inequality as before,
(47) b² Σ_B c_B(N, a)² = O( Σ_{B_{ij} ∈ P_2} c_{B_{ij}}(N, a)² ).
To study c_{B_{ij}}(N, a)², we multiply equation (17) by its conjugate, giving
(48) c_{B_{ij}}(N, a)² = (1/a²) Σ_{k_1, k_2 = 0}^{a−1} Σ_{x_1, x_2} e( N(k_1 x̄_1 − k_2 x̄_2)/a ) Σ_{y_1, y_2} e( (k_2 y_2 − k_1 y_1)/a ),
where x_1, x_2 run over the integers in [ib, (i + 1)b) that are coprime to a, and y_1, y_2 run over the integers in [jb, (j + 1)b). Next, sum over all B_{ij} ∈ P_2, and break up each sum over (y_1, y_2) by writing y_1 = jb + s_1, y_2 = jb + s_2, with 0 ≤ s_1, s_2 < b. Now, the innermost sum over (s_1, s_2),
Σ_{s_1, s_2 = 0}^{b−1} e( (k_2 s_2 − k_1 s_1)/a ),
is a product of two geometric series, while the sum over j equals
Σ_{j=0}^{a−1} e( (k_2 − k_1) jb/a ) = a if k_1 = k_2, and 0 otherwise
(recall that we have chosen b so that gcd(b, a) = 1). Therefore, only the terms with k_1 = k_2 contribute to (49), and it equals
(52) (1/a) Σ_{k=0}^{a−1} | Σ_{s=0}^{b−1} e(−ks/a) |² Σ_{i=0}^{a−1} | Σ_{x ∈ [ib,(i+1)b), gcd(x,a)=1} e(kN x̄/a) |².
The k = 0 term gives, on separating the sum over x_1 and x_2,
(54) (b²/a) Σ_{i=0}^{a−1} ( Σ_{x ∈ [ib,(i+1)b), gcd(x,a)=1} 1 )²,
which, by (20) and using b ∼ a^{1/2}, is asymptotically φ(a)². Next, we deal with the terms 1 ≤ k ≤ a − 1. The sum over i in (52) equals
(55) Σ_{i=0}^{a−1} Σ_{x_1, x_2 ∈ [ib,(i+1)b), gcd(x_1 x_2, a)=1} e( kN(x̄_1 − x̄_2)/a ).
To analyze this sum, we use the two dimensional discrete Fourier transform, detecting with additive characters mod a the condition that x_1 and x_2 lie in the same subinterval [ib, (i + 1)b). On changing the order of summation, the bracketed sum over i that results is similar to the sum over j worked out above, and what remains involves complete exponential sums of the form appearing in (61) below.

But
(61) Σ_{x mod a, gcd(x,a)=1} e( (kN x̄ + mx)/a ) = S(kN, m; a).
The sum on the rhs is a Kloosterman sum, and these are known [8], [6] to satisfy the bound
(63) |S(m, n; a)| ≤ τ(a) gcd(m, n, a)^{1/2} a^{1/2}.
The φ(a)² term comes from the k = 0 contribution, (54). We must isolate this term, since otherwise the estimate below would be too large. Separating sums gives a double sum, over k and m. Both sums can be bounded using the same approach as for (24) in the previous section, namely: combining the terms k and a − k (similarly for the m sum, but taking the m = 0 term alone), breaking up the sum into the terms with k ≤ a/(πb) ∼ a^{1/2}/π (respectively, m) and the rest, applying the inequalities (27), estimating the resulting sums, and using b ∼ a^{1/2}, we find, for any ǫ > 0, that (65) is O(a^{2+ǫ}). We have thus estimated the sum that appears on the rhs of (47). The sum that we wish to bound appears on the lhs of (47), but with an extra factor of b². Hence, dividing the above by b² gives O(a^{1+ǫ}) for the sum in the theorem.
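The second moment in Theorem 5.1 can be computed directly for small a; a Python sketch (ours, for illustration):

```python
import math

def second_moment(N, a, b):
    # Sum over the squares B of side b of c_B(N, a)^2, where c_B counts the
    # solutions to x*y = N mod a lying in B.
    counts = {}
    for x in range(1, a):
        if math.gcd(x, a) == 1:
            y = (pow(x, -1, a) * N) % a
            key = (x // b, y // b)
            counts[key] = counts.get(key, 0) + 1
    return sum(c * c for c in counts.values())
```

Running this for a range of a with b ≈ a^{1/2} gives values of size comparable to a, in line with the O(a^{1+ǫ}) bound.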
Remark: In certain cases, such as when a = p², with p prime, one can improve the above estimate for the second moment to O(a) by taking b = p and, for x = jp + l with gcd(l, p) = 1, using x̄ = l̄²(l − jp), where l̄ = l^{−1} mod p².
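The identity behind the remark, x · l̄²(l − jp) = (l² − j²p²) l̄² ≡ 1 mod p², can be verified directly. An illustrative Python sketch (here l̄ is computed with the built-in modular inverse, whereas the point of the remark is that only the p values l̄ ever need to be computed):

```python
def inv_mod_p2(x, p):
    # For a = p^2, write x = j*p + l with 0 <= l < p and gcd(l, p) = 1.  Then
    # x * (l - j*p) = l^2 - j^2*p^2 = l^2 (mod p^2), so
    # x^{-1} = lbar^2 * (l - j*p) mod p^2, where lbar = l^{-1} mod p^2.
    a = p * p
    j, l = divmod(x % a, p)
    lbar = pow(l, -1, a)
    return (lbar * lbar * (l - j * p)) % a
```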

5.1. Running time of the variant, for 1 < U ≤ V < N. Instead of partitioning the a × a square into smaller squares of side length b ∼ a^{1/2}, we partition it into rectangles R of width w < a and height h < a, where w, h ∈ Z and, for convenience, gcd(w, a) = gcd(h, a) = 1.
As in the proof of Theorem 5.1, we have
(67) wh Σ_R c_R(N, a)² = O( Σ_{R_{ij} ∈ P_2} c_{R_{ij}}(N, a)² ),
where P_2 now consists of the a² rectangles R_{ij} = {(x, y) ∈ Z² | iw ≤ x < (i + 1)w, jh ≤ y < (j + 1)h}, 0 ≤ i, j < a, partitioning the wa × ha rectangle, and with wh appearing on the lhs since the large wa × ha rectangle contains that many copies of the a × a square. Using the discrete Fourier transform, as before, one obtains the analogue of (52), with the two geometric series of length b replaced by ones of lengths w and h. This useful identity expresses the second moment for the larger wa × ha rectangle as a sum involving Kloosterman sums. The k = 0 term can be estimated as in (54) and asymptotically equals
(69) (h²w²/a²) φ(a)².
For the k ≥ 1 terms, we use the bound (63) to estimate the Kloosterman sums, and separate the resulting double sum into a sum over k and a sum over m. The first sum is estimated to equal O(τ(a)ah), while the second is O(aw), giving, for the terms k ≥ 1, a contribution of
(71) O(a^{1+ǫ}wh),
for any ǫ > 0. Putting (69) and (71) together, then dividing the lhs of (67) by wh, gives the following estimate for the second moment.
Theorem 5.2. Let 1 < w, h < a, with gcd(w, a) = gcd(h, a) = 1. Then, using the notation above, we have an estimate for the second moment that depends on the area wh of the rectangles R:
(72) Σ_R c_R(N, a)² = O( a^{1+ǫ} + wh φ(a)²/a² ).
Remark: if gcd(w, a) = gcd(h, a) = 1 does not hold, one can bound the lhs of (72) by comparing with the same kind of sum, but where w and h are incremented, as before, by at most O(a^ǫ) until this gcd condition holds. So long as w, h ≫ a^ǫ to begin with, the estimates in the above theorem are unaffected.
In the variant of Section 2, our choice of w and h has wh = O(a), and the estimate for the second moment is thus O(a^{1+ǫ}), as in the previous subsection.
The second estimate of the theorem (not relevant for our particular application), O(whφ(a)²/a²), can probably be turned into an asymptotic formula, and a central limit theorem proven. This will remain an inquiry for the future.
Acknowledgements. I wish to thank Andrew Granville, Carl Pomerance, and Matthew Young for helpful feedback.