Can we Beat the Square Root Bound for ECDLP over $\mathbb{F}_{p^2}$ via Representation?

Abstract We give a 4-list algorithm for solving the Elliptic Curve Discrete Logarithm Problem (ECDLP) over some quadratic field $\mathbb{F}_{p^2}$. Using the representation technique, we reduce ECDLP to a multivariate polynomial zero testing problem. Our solution of this problem using bivariate polynomial multi-evaluation yields a $p^{1.314}$-algorithm for ECDLP. While this is inferior to Pollard's Rho algorithm with square root (in the field size) complexity $\mathcal{O}(p)$, it still has the potential to open a path to an $o(p)$-algorithm for ECDLP, since all involved lists are of size as small as $p^{3/4}$; only their computation is as yet too costly.


Introduction
Given an elliptic curve $E$ defined over a finite field $\mathbb{F}_q$ and two points $P$ and $Q$ on this curve, the elliptic curve discrete logarithm problem (ECDLP) consists in recovering $k$ modulo the order of $P$ such that $Q = kP$, if it exists.
Nowadays, ECDLP-based cryptosystems are omnipresent in everyday cryptography, as they are widely deployed in cryptographic suites such as TLS.
Many algorithms for ECDLP have been proposed in the literature, from Shanks' Baby-Step Giant-Step [19] to Weil descent and index calculus methods [2, 4, 5, 6, 7]. Other algorithms have been designed for particular cases, such as elliptic curves over a field $\mathbb{F}_p$ of group order exactly $p$ [14, 20], or in a subgroup of order $p$ [17]. We refer the reader to [3] for an overview of ECDLP algorithms.
Pollard's Rho algorithm offers the best time and space complexity for elliptic curves defined over a finite field $\mathbb{F}_p$, where $p$ is prime, with a running time of $\mathcal{O}(\sqrt{p})$ and constant memory (e.g. using Floyd's cycle finding algorithm). Pollard's Rho method basically creates a sequence $(R_i)_{i \geq 0}$ of elements of $\langle P \rangle$ with $R_0 = O$ and $R_{i+1} = g(R_i)$ for a well-chosen function $g$, such that each $R_i$ satisfies $R_i = a_i P + b_i Q$. Once a collision appears between two elements $R_i$ and $R_j$, we have $(a_i - a_j)P = (b_j - b_i)Q$, from which $k$ can be recovered whenever $b_j - b_i$ is invertible modulo the order of $P$.
When the elliptic curve is defined over $\mathbb{F}_{p^n}$ for $n \geq 2$, with $p$ a prime or a power of a prime, it is possible to perform what is called a Weil descent. The core idea of Gaudry's algebraic factor-base algorithm [6] for $\mathbb{F}_{p^n}$ is to split a point into an $n$-sum of points from a factor base that is defined over the base field $\mathbb{F}_p$. Using […] which is for $n = 2$ no better than Pollard's Rho algorithm. Moreover, with a factor base of size $p$, there is no hope to obtain an $o(p)$-algorithm. Yet, variations of the factor base might be possible, as demonstrated by Petit et al. [13].

Our contribution
Our (ultimate) goal is to break the square root bound for ECDLP over $\mathbb{F}_{p^2}$ by designing a 4-list algorithm using the representation technique. The representation technique has already proved useful to break the square-root barrier in the context of the subset-sum problem [1, 8].
We define a problem, called Zero-Join (ZJ-Problem), which, given two lists $A$ and $B$ of points of $\mathbb{F}_p^2$ and a polynomial $f \in \mathbb{F}_p[X_1, X_2, X_3, X_4]$, consists in returning a list $C$ of all $((x_1, y_1), (x_2, y_2)) \in A \times B$ such that $f(x_1, y_1, x_2, y_2) = 0$. We show that $A$ and $B$ are such that $|A| \cdot |B| = p^{3/2}$. Moreover, we show that any algorithm which solves the ZJ-Problem in time $T$ and in memory $M$ also solves ECDLP over $\mathbb{F}_{p^2}$ in time $T$ and memory $M$. In particular, if $|A| = |B| = p^{3/4}$, and in the (extreme) case that the ZJ-Problem could be solved in time linear in $|A|$ and $|B|$, ECDLP could be solved in $\mathcal{O}(p^{3/4})$. A trivial solution to the ZJ-Problem can be achieved in quadratic time, which results in a $p^{3/2}$-algorithm. However, we show that the ZJ-Problem admits sub-quadratic solutions. When using multi-evaluation techniques for bivariate polynomials, we achieve a $p^{1.314}$-algorithm. We leave it as an open problem whether this can be further improved.

Organisation of the paper
In Section 3, we present our new ECDLP 4-list algorithm for an elliptic curve defined over $\mathbb{F}_{p^2}$, for any prime $p > 3$. We also show that ECDLP reduces to the ZJ-Problem. Finally, we present a sub-quadratic algorithm to solve the ZJ-Problem in Section 4, resulting in our $p^{1.314}$-algorithm.

Preliminaries
For any integer $r$, we denote by $\mathbb{Z}_r$ the ring $\mathbb{Z}/r\mathbb{Z}$. For any two integers $a < b$, we denote by $[a, b]$ the set of all integers $k$ with $a \leq k \leq b$ and by $[a, b)$ the set of all integers $k$ with $a \leq k < b$. For any $n > 0$, we denote by $\log n$ the logarithm of $n$ in base 2. For any real $x$, we denote by $\lfloor x \rceil$ the integer closest to $x$; in case of a tie, we round up. We denote by $\mathbb{F}_p$ the finite field with $p$ elements. Let $x$ be an element of $\mathbb{F}_{p^2}$, written $x = x^{(0)} + \alpha x^{(1)}$ with $x^{(0)}, x^{(1)} \in \mathbb{F}_p$, where $\alpha$ is a root of $X^2 - \beta$ for some $\beta \in \mathbb{F}_p$ such that $X^2 - \beta$ is irreducible over $\mathbb{F}_p$. Let $B$ be a basis of $\mathbb{F}_p^2$ with respect to $\alpha$. The vector $(x^{(0)}, x^{(1)}) \in \mathbb{F}_p^2$ in basis $B$ is uniquely associated to $x$.
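As an illustration of the representation $x = x^{(0)} + \alpha x^{(1)}$, the following minimal Python sketch stores an element of $\mathbb{F}_{p^2}$ as a pair of integers modulo $p$. The function names (`is_nonresidue`, `fp2_add`, `fp2_mul`, `in_base_field`) and the toy parameters $p = 7$, $\beta = 3$ are assumptions made for this example only.

```python
# Toy arithmetic in F_{p^2} = F_p[alpha] / (alpha^2 - beta).
# An element x = x0 + alpha * x1 is stored as the pair (x0, x1).
# All names and parameters below are illustrative assumptions.

def is_nonresidue(beta, p):
    """Euler's criterion: beta is a quadratic non-residue mod an odd prime p."""
    return pow(beta, (p - 1) // 2, p) == p - 1

def fp2_add(x, y, p):
    """Component-wise addition of two elements of F_{p^2}."""
    return ((x[0] + y[0]) % p, (x[1] + y[1]) % p)

def fp2_mul(x, y, p, beta):
    """(x0 + a*x1)(y0 + a*y1) = (x0*y0 + beta*x1*y1) + a*(x0*y1 + x1*y0)."""
    return ((x[0] * y[0] + beta * x[1] * y[1]) % p,
            (x[0] * y[1] + x[1] * y[0]) % p)

def in_base_field(x):
    """x lies in F_p exactly when its alpha-component vanishes."""
    return x[1] == 0
```

With these conventions, the test "$x \in \mathbb{F}_p$" used throughout the paper amounts to checking that the second component is zero.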
Let $q = p^n$ for $n \geq 1$ and $p > 3$ prime. Let $\mathbb{F}_q$ be a finite field and let $E$ be an elliptic curve over $\mathbb{F}_q$. We denote by $O$ the point at infinity of $E$. The group $E(\mathbb{F}_q)$ is the group of all the points in $E$. As $p > 3$, we can define $E$ in Weierstraß form. In particular, there is a function $f : \mathbb{F}_q \to \mathbb{F}_q$, $f(x) = x^3 + ax + b$ with $a, b \in \mathbb{F}_q$, such that the affine points of $E$ are the pairs $(x, y)$ satisfying $y^2 = f(x)$. Although there are other ways to represent an elliptic curve, in this paper we focus on the Weierstraß model.
Let $S_1$ and $S_2$ be subsets of $\mathbb{N}$. We denote by $S_1 + S_2$ the set of integers $z$ such that $z = x + y$ with $x \in S_1$ and $y \in S_2$, and for all $\ell \in \mathbb{N}$, we denote by $\ell S_1$ the set of integers $z$ such that $z = \ell x$ with $x \in S_1$. Let $L$ be a list. We consider $L$ as a tuple $(\ell_0, \dots, \ell_{N-1})$, where $N = |L| \geq 0$. We denote by $L[i]$ the $(i+1)$th element of the list. Given two lists $L_1 = (\ell_0, \dots, \ell_N)$ and $L_2 = (t_0, \dots, t_M)$, we denote by […]. An empty list is denoted by $\bot$.

Complexities
We use the standard Landau notation to describe the complexities presented in this paper. Time complexities are given in number of elementary operations (negation, addition, multiplication, inversion) over $\mathbb{F}_p$, and memory complexities in number of elements of $\mathbb{F}_p$ to be stored. Using this convention, elementary operations over $\mathbb{F}_{p^2}$ and the addition of two points of the group $E(\mathbb{F}_{p^2})$ are performed in time $\mathcal{O}(1)$, and storing an element of $\mathbb{F}_{p^2}$ or a point of $E(\mathbb{F}_{p^2})$ requires $\mathcal{O}(1)$ memory.

Discrete logarithm over an elliptic curve
Let $E$ be an elliptic curve over a finite field $\mathbb{F}_q$. Let us denote by $|E(\mathbb{F}_q)|$ the order of the group (i.e. the cardinality of $E(\mathbb{F}_q)$). By Hasse's Theorem, we know that $|E(\mathbb{F}_q)| = \mathcal{O}(q)$. Let $P$ be a point of $E(\mathbb{F}_q)$. The order $r$ of $P$ is the cardinality of the cyclic group $\langle P \rangle = \{O, P, 2P, \dots, (r-1)P\}$. It is also the smallest integer $\ell > 0$ such that $\ell P = O$.
As $\langle P \rangle$ is a subgroup of $E(\mathbb{F}_q)$, it is clear that $r \leq |E(\mathbb{F}_q)|$. We consider a point $P$ of prime order $r = \mathcal{O}(q)$. Let $G = \langle P \rangle$. Given $P$ and $Q$ in $G$, solving the discrete logarithm problem over $E$ consists in recovering $k \in \mathbb{Z}_r$ such that $kP = Q$.
We are interested in the case where $q = p^2$, for a prime $p > 3$. We call this particular variant $p^2$-ECDLP.

Representation Technique
In [8], Howgrave-Graham and Joux introduced the representation technique to improve methods for the subset-sum problem. Given a vector $a \in \mathbb{Z}^n$ and a target $t \in \mathbb{Z}$, the problem basically consists in finding a vector $e \in \{0, 1\}^n$ such that $\langle a, e \rangle = t$. When the solution $e$ consists of exactly $n/2$ non-zero coefficients, Howgrave-Graham and Joux noticed that it can be split into a sum of two vectors $e = e_1 + e_2$, where $e_1, e_2 \in \{0, 1\}^n$ each have exactly $n/4$ non-zero coefficients, in $\binom{n/2}{n/4}$ different ways. It is enough to find only one of these representations to recover a solution to the problem, thus additional constraints can be enforced on the dot products $\langle a, e_1 \rangle$ and $\langle a, e_2 \rangle$, so that only one of these representations survives. This yields a $2^{0.337n}$ algorithm for subset-sum that breaks the square root bound.
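To make the counting of representations concrete, here is a small brute-force sketch in Python. It enumerates all candidate vectors $e_1$ of weight $n/4$ and keeps those for which $e_2 = e - e_1$ is again a 0/1 vector of weight $n/4$. The function name `count_representations` is hypothetical and the enumeration is only meant for tiny $n$.

```python
from itertools import combinations

def count_representations(e):
    """Count pairs (e1, e2) of 0/1 vectors with e1 + e2 = e,
    where e1 and e2 each have exactly half the weight of e."""
    n = len(e)
    w = sum(e) // 2  # target weight of e1 and e2
    count = 0
    for ones in combinations(range(n), w):
        e1 = [1 if i in ones else 0 for i in range(n)]
        e2 = [a - b for a, b in zip(e, e1)]
        # e2 must again be a 0/1 vector of weight w
        if all(v in (0, 1) for v in e2) and sum(e2) == w:
            count += 1
    return count
```

For a solution of length $n = 8$ and weight $4$, the count should agree with $\binom{4}{2} = 6$, since $e_1$ must pick half of the support of $e$.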
We would like to transfer this idea to the p 2 -ECDLP problem.

New ideas to solve p 2 -ECDLP
Let $E(\mathbb{F}_{p^2})$ be an elliptic curve defined over $\mathbb{F}_{p^2}$, with $p > 3$. Let $r$ be the order of $E(\mathbb{F}_{p^2})$, which can be computed in time polynomial in $\log p$ with Schoof's algorithm [15, 16]. Since we are interested in cryptographic applications, we assume that $r$ is prime. By Hasse's theorem we know that $r \approx p^2$. Let $P$ generate $E(\mathbb{F}_{p^2})$ and let $Q = kP$ for some unknown $k \in \mathbb{Z}_r$ that we want to compute.
First notice that for all $(k_1, k_2) \in \mathbb{Z}_r^2$, $(k_1 + k_2)P = Q$ is equivalent to $k_1 P = Q - k_2 P$. In the following we try to recover such a tuple $(k_1, k_2)$.
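The equivalence $(k_1 + k_2)P = Q \Leftrightarrow k_1 P = Q - k_2 P$ can be illustrated by a collision search on a toy curve. The sketch below works over a small prime field $\mathbb{F}_p$ rather than $\mathbb{F}_{p^2}$, and the two search sets (small integers for $k_1$, multiples of `bound` for $k_2$) are assumptions chosen for simplicity, not the sets $S_1$, $S_2$ of the paper. All function names are hypothetical.

```python
# Toy collision search for k1*P = Q - k2*P on y^2 = x^3 + a*x + b over F_p.
# Points are (x, y) tuples; None stands for the point at infinity.

def ec_add(P, Q, a, p):
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = O (also covers doubling a 2-torsion point)
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_neg(P, p):
    return None if P is None else (P[0], (-P[1]) % p)

def ec_mul(k, P, a, p):
    """Double-and-add scalar multiplication for k >= 0."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R

def split_search(P, Q, a, p, bound):
    """Find k = k1 + k2 with k1 in [0, bound) and k2 a multiple of bound."""
    table = {}
    R = None
    for k1 in range(bound):          # store k1*P -> k1
        table.setdefault(R, k1)
        R = ec_add(R, P, a, p)
    neg_step = ec_neg(ec_mul(bound, P, a, p), p)
    T = Q                            # T = Q - k2*P with k2 = j*bound
    for j in range(bound + 1):
        if T in table:
            return table[T] + j * bound
        T = ec_add(T, neg_step, a, p)
    return None
```

For instance, on the curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$ with $P = (3, 6)$, calling `split_search(P, Q, 2, 97, 11)` on $Q = 20P$ returns some $k$ with $kP = Q$.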

Heuristic
For the complexity analysis of our algorithm, we assume that the points of $E(\mathbb{F}_{p^2})$ are uniformly distributed over $\mathbb{F}_{p^2}^2$. The results of Kim and Tibouchi [9] provide some theoretical justification for our assumption.

Solving p 2 -ECDLP Using the Representation Technique
First note that every positive integer $s$ can be uniquely decomposed as
$$s = s_0 + \lfloor \sqrt{p} \rceil s_1 + p s_2 \quad (1)$$
by first computing the integer division of $s = s_2 p + \bar{s}$ by $p$, and then the integer division of $\bar{s} = s_0 + \lfloor \sqrt{p} \rceil s_1$ by $\lfloor \sqrt{p} \rceil$, so that $0 \leq s_0 < \lfloor \sqrt{p} \rceil$, $0 \leq \bar{s} < p$ and $s_2 \geq 0$. With respect to this decomposition, we denote by $S_1$ and $S_2$ the subsets of $[0, r)$ such that […] Informally, the bits of the elements of $S_1$ and $S_2$ are dispatched as shown in Figure 1a. We consider $k$ as the sum $k_1 + k_2$ where $k_1 \in S_1$ and $k_2 \in S_2$. Since $k_1$, $k_2$ overlap in $\log(\frac{r}{p})$ bits, we expect that each $k$ has at least $\frac{r}{p} \approx p$ representations as a sum $k_1 + k_2$. This is shown more formally in the following lemma.
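The two successive integer divisions behind the decomposition $s = s_0 + \lfloor \sqrt{p} \rceil s_1 + p s_2$ can be sketched in a few lines of Python. The helper names `round_sqrt` and `decompose` are hypothetical; `round_sqrt` computes the integer closest to $\sqrt{p}$ using exact integer arithmetic.

```python
from math import isqrt

def round_sqrt(p):
    """Integer closest to sqrt(p), computed without floating point."""
    t = isqrt(p)
    # sqrt(p) > t + 1/2  <=>  p > t^2 + t  (for integer p)
    return t + 1 if p - t * t > t else t

def decompose(s, p):
    """Return (s0, s1, s2) with s = s0 + round_sqrt(p)*s1 + p*s2."""
    m = round_sqrt(p)
    s2, s_bar = divmod(s, p)      # s = s2 * p + s_bar, 0 <= s_bar < p
    s1, s0 = divmod(s_bar, m)     # s_bar = s1 * m + s0, 0 <= s0 < m
    return s0, s1, s2
```

For example, with $p = 101$ we have $\lfloor \sqrt{p} \rceil = 10$ and `decompose(12345, 101)` gives `(3, 2, 122)`, since $12345 = 3 + 10 \cdot 2 + 101 \cdot 122$.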
[…] $\lfloor \sqrt{p} \rceil + k_2 p$ when decomposed as in Equation 1. We denote by $S_{11}$ the subset of $S_1$ of the elements $s = k_0 + s_2 p$, and by $S_{12}$ the subset of $S_1$ of the elements $s = \bar{k}_0 + s_2 p$. We first show that for any $k_1 \in S_{11}$ such that $k_1 \leq k$, there exists some $k_2 \in S_2$ such that $k_1 + k_2 = k$. Then we claim that for any $k_1 \in S_{12}$ such that […]
Let now $k_1 \in S_{12}$ such that $k_1 > k$. First note that $k_1 < r < \bar{k}$. From the definition of $S_{12}$, $k_1 = \bar{k}_0 + k_{12} p$, and $\bar{k} - k_1 \geq 0$ means in particular that $\bar{k}_2 - k_{12} \geq 0$. Furthermore, as […]
It follows that the number of representations $(k_1, k_2) \in S_1 \times S_2$ of $k$ is at least equal to the number $N_1$ of elements of $S_{11}$ that are smaller than $k$ plus the number $N_2$ of elements of $S_{12}$ that are bigger than $k$. $N_1$ is the number of […]
Out of the $p - 1$ representations of $k$ as $k_1 + k_2$, $(k_1, k_2) \in S_1 \times S_2$, it suffices to compute a single one. Therefore, computing only a $\frac{1}{p}$-fraction of all points $(P_1, P_2) = (k_1 P, Q - k_2 P)$ should be sufficient to compute $k$. In order to compute a $\frac{1}{p}$-fraction, we restrict to those points $(P_1, P_2)$ whose $x$-coordinate lies in the base field (i.e. $x \in \mathbb{F}_p$).
More precisely, we proceed as follows. We compute a list $L$ of all […] Then we search for a collision in $L$ and $L'$. This gives us $k$ as the sum of the corresponding $k_1$ and $k_2$. This results in RepECDLP, described in Algorithm 2. Notice that the resulting lists $L$, $L'$ have expected size only $p^{1/2}$, since we impose a $\frac{1}{p}$ restriction on a search space of size $p^{3/2}$.
Now let us turn to the tricky part, the computation of $L$, $L'$. Let $0 \leq \lambda \leq \frac{1}{2}$ be a fixed parameter to be determined later, and let $T_1$, $T_2$, $T_3$ be three sets of elements of $[0, r)$ such that […] The bit repartition of the elements of $T_1$, $T_2$ and $T_3$ is given by Figure 1b. The reader is advised to use the illustration in Figure 2 for the following description.
It has already been argued that all $s \in \mathbb{N}$ can be uniquely split as $s = s_0 + \lfloor \sqrt{p} \rceil s_1 + p s_2$. Following the same argument, for any fixed $\lambda$, $s_2$ […] In particular, from the argument above, $L$ consists of all the points $k_{12}$ […] and if so return the sum of the corresponding $k_1, k_2, k_3, k_4$. In other words, we consider the following problem.

Problem 3.2. Given two lists
We claim that any algorithm solving Problem 3.2 can be used as the main routine to solve the $p^2$-ECDLP problem. Our whole RepECDLP algorithm is summarised in Algorithm 2. The BuildLists algorithm, described in Algorithm 1, builds the lists $L_i$ for $i \in [1, 4]$, and we denote by ECJoin an algorithm solving Problem 3.2, with a slight modification that does not influence the run time: the input lists contain tuples $(P_i, k_i)$, where $k_i$ is an element of $[0, r)$, and the output list must also contain tuples $(P_1 + P_2, k_1 + k_2)$. The following lemma gives the complexities of BuildLists.

Algorithm 1 BuildLists(E, P)
Require: An elliptic curve $E$ over $\mathbb{F}_{p^2}$, a point $P \in E(\mathbb{F}_{p^2})$.
[…]
10: if $i = 1$ then ◁ Only on the first iteration
11: […]
[…] if $i < p^{\lambda}$ then ◁ In all iterations except the last one
17: […]

Algorithm 2 RepECDLP(E, P, Q)
Require: An elliptic curve $E$ over $\mathbb{F}_{p^2}$, two points $P, Q \in E(\mathbb{F}_{p^2})$.
Ensure: $k$ such that $Q = kP$.
[…] field operations using memory $\mathcal{O}(\max$ […]$)$, which proves the given complexities. […] However, as we will discuss in Section 4, unbalanced list sizes might lead to time improvements.

Computing the Join
We now need to find a way to check whether a point $(x, y) = P_1 + P_2$ satisfies $x \in \mathbb{F}_p$, knowing only the coordinates $(x_1, y_1)$ of $P_1$ and $(x_2, y_2)$ of $P_2$. We have
$$x = \left(\frac{y_1 - y_2}{x_1 - x_2}\right)^2 - x_1 - x_2,$$
and therefore
$$(x_1 - x_2)^2 (x_1 + x_2 + x) = y_1^2 + y_2^2 - 2 y_1 y_2.$$
Using Weierstraß' equation to discard the $y_1$ and $y_2$ terms, we obtain […]
We denote $x = x^{(0)} + \alpha x^{(1)}$, where $x^{(0)}, x^{(1)}$ are in $\mathbb{F}_p$ and where $\alpha$ satisfies $\alpha^2 = \beta$ for some quadratic non-residue $\beta$ modulo $p$, and we denote […] As each $Y_i$ can be expressed as $Y_i = X_{2i-1} + \alpha X_{2i}$, where the $X_j$ variables are over $\mathbb{F}_p$, it follows that there exists a unique pair of polynomials $g_0$ and $g_1$ of $\mathbb{F}_p[X_1, \dots, X_6]$ such that $g_0(X_1, \dots,$ […] Taking into account that $x = x^{(0)} \in \mathbb{F}_p$, we come up with the following polynomial system in five variables […] Computing the elimination ideal of $\langle g'_0, g'_1 \rangle$ in the variable $x^{(0)}$, we obtain a polynomial $f$ of constant degree such that for all $P_1 = (x_1, y_1)$, […]
Remark 3.5. Speaking in terms of ideals and algebraic varieties, we claim that ( […] . This comes from the fact that by definition $\langle f \rangle \subseteq \langle g'_0, g'_1 \rangle$. It follows that $V(\langle g'_0, g'_1 \rangle) \subseteq V(\langle f \rangle)$. Since by construction of $g'_0$ and $g'_1$, ( […]
We are now ready to define the ZJ-Problem, which lies at the heart of our ECDLP algorithm. We claim that any algorithm solving the ZJ-Problem can be used as the main routine to solve Problem 3.2. Indeed, given our lists $L_1$ and $L_2$ respectively of points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$, and given the polynomial $f$ defined as above, we compute the list $A$ of all […]. Now, let us denote by ZeroJoin any algorithm solving the ZJ-Problem. Let $C = \mathrm{ZeroJoin}(A, B, f)$.
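One way to discard the $y_1$ and $y_2$ terms, given here as an assumption and not necessarily the exact equation of the paper, is to isolate $2 y_1 y_2$ in $(x_1 - x_2)^2 (x_1 + x_2 + x) = y_1^2 + y_2^2 - 2 y_1 y_2$, square both sides, and substitute $y_i^2 = w(x_i)$ with $w(t) = t^3 + at + b$. This yields $4\, w(x_1)\, w(x_2) = \left(w(x_1) + w(x_2) - (x_1 - x_2)^2 (x_1 + x_2 + x)\right)^2$. The Python sketch below checks this relation by brute force on a toy curve over a prime field $\mathbb{F}_p$ (not $\mathbb{F}_{p^2}$); all names and parameters are hypothetical.

```python
# Brute-force check of a y-free relation between x1, x2 and x = x(P1 + P2)
# on the toy curve y^2 = x^3 + a*x + b over a small prime field F_p.

def curve_points(a, b, p):
    """All affine points of the curve, by exhaustive search (tiny p only)."""
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x ** 3 + a * x + b)) % p == 0]

def x_of_sum(P1, P2, p):
    """x-coordinate of P1 + P2 via the chord formula (requires x1 != x2)."""
    (x1, y1), (x2, y2) = P1, P2
    lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    return (lam * lam - x1 - x2) % p

def eliminated_relation(x1, x2, x, a, b, p):
    """True if 4 w(x1) w(x2) == (w(x1) + w(x2) - (x1-x2)^2 (x1+x2+x))^2 mod p."""
    w = lambda t: (t ** 3 + a * t + b) % p
    lhs = 4 * w(x1) * w(x2)
    rhs = (w(x1) + w(x2) - (x1 - x2) ** 2 * (x1 + x2 + x)) ** 2
    return (lhs - rhs) % p == 0
```

Because of the squaring step, the relation is also satisfied by the $x$-coordinate of $P_1 - P_2$, which is one reason why a join based on a $y$-free relation may need a final verification of each candidate sum.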
We claim that for every $P_1$, $P_2$ satisfying $P_1 + P_2 = (x, y)$ with $x \in \mathbb{F}_p$, […] This comes from the definition of $f$. However, there may be false positives, meaning tuples […] $= 0$ holds, but $P_1 + P_2 = (x, y)$ with $x \notin \mathbb{F}_p$. Therefore, we need to check the corresponding sum $(x, y)$ while computing the list $L$. In other words, for all […] we first retrieve the corresponding $(P_1, P_2) \in A \times B$. Then we compute $(x, y) = P_1 + P_2$. If $x \in \mathbb{F}_p$, we add $(x, y)$ to $L$. This is summarized in Algorithm 3. The following lemma provides a reduction from the ZJ-Problem to Problem 3.2.

Algorithm 3 ECJoin($E$, $f$, $L_1$, $L_2$)
Require: An elliptic curve $E$ over $\mathbb{F}_{p^2}$, two lists of points $L_1$, $L_2$, the polynomial $f$ associated to $E$.
Ensure: […]
[…] if $x \in \mathbb{F}_p$ then
11: […]
Proof. Let us denote by $T_{ECJoin}$ the time complexity of the ECJoin algorithm. This algorithm consists of three for-loops and the ZeroJoin procedure. The first for-loop is iterated $|L_1|$ times, and each iteration takes time $\mathcal{O}(1)$. The second for-loop is iterated $|L_2|$ times. Once again each iteration requires a constant number of field operations. Then comes the ZeroJoin procedure, which has a runtime $T$. Finally, the last for-loop performs $|C|$ iterations, each of them consisting of constant time operations. It follows that $T_{ECJoin} = \mathcal{O}(|L_1| + |L_2| + |C|) + T$. It is clear that $T \geq |C|$, as the list $C$ is built during the ZeroJoin procedure. We claim that $T$ also satisfies $T \geq |L_1| + |L_2|$, as each of the $|L_1|$ polynomials and each of the $|L_2|$ points has to be considered at least once in order to create $C$. It follows that $T_{ECJoin} = \mathcal{O}(T)$.
Let us focus on the memory. We denote by $M_{ECJoin}$ the memory required by the ECJoin procedure. We have $M_{ECJoin} = \mathcal{O}(|L_1| + |L_2| + |C|) + M$. Once again it is clear that $M \geq |C|$. We also claim that $M \geq |L_1| + |L_2|$, as our algorithm needs to store the lists $A$ and $B$ of respective size $|L_1|$ and $|L_2|$. It follows that $M_{ECJoin} = \mathcal{O}(M)$.
We now reduce the ZJ-Problem to $p^2$-ECDLP. […] The second step consists in creating both the join of $L_1$ and $L_2$, and of $L_3$ and $L_4$. According to Lemma 3.7, this takes time $T$. It remains to search for a collision between two lists. This can be done in time proportional to the size of the two lists, provided that they are sorted according to the first component. Under our heuristic, we can assume that the points $(x, y)$ contained in the lists are uniformly distributed over $\mathbb{F}_p \times \mathbb{F}_{p^2}$; as such, the sorting step can be done in time proportional to the size of the lists. Recall that $T \geq \max(|L_i|, |L|, |L'|)$ by the same argument as the one given in the proof of Lemma 3.7. It follows that the time complexity of the whole algorithm is dominated by $T$.
We claim that the memory complexity is dominated by the one of the ECJoin routines. Indeed, as explained before, the space $M$ required by this routine is at least $\mathcal{O}($ […]$)$, which is enough to prove our claim.
For completeness, we now prove that this procedure actually solves the $p^2$-ECDLP problem. For simplicity, we consider only the first component of the list elements. By construction […] We claim that $L = \mathrm{ECJoin}(L_1, L_2)$ is the list
$$L = \{k_{12} P = (x, y),\ k_{12} \in S_1,\ x \in \mathbb{F}_p\}.$$
Indeed, by construction, it is obvious that $\forall (x, y) \in L$, $x \in \mathbb{F}_p$. Furthermore, $(x, y) = k_{12} P$ for some $k_{12} \in S_1$, as $(x, y) = k_1 P + k_2 P = (k_1 + k_2) P$ for some $(k_1, k_2) \in L_1 \times L_2$. It follows that $L \subseteq \{k_{12} P = (x, y),\ k_{12} \in S_1,\ x \in \mathbb{F}_p\}$. We show that this is indeed an equality.
Recall that $f \in \mathbb{F}_p[X_1, X_2, X_3, X_4]$ is constructed so that, for all points $P_1 = (x_1, y_1)$, […] As $A$ is the set of the $f_i$ polynomials for all $(x_i, y_i) \in L_1$, and $B$ is the set of the $(x_j^{(0)}, x_j^{(1)})$ tuples for all $(x_j, y_j) \in L_2$, it follows from the definition of the ZeroJoin procedure that if $P_i + P_j = (x, y)$ satisfies $x \in \mathbb{F}_p$, then $(i, j) \in C = \mathrm{ZeroJoin}(A, B)$. As such, $\{k_{12} P = (x, y),\ k_{12} \in S_1,\ x \in \mathbb{F}_p\} \subseteq \{P_i + P_j,\ (i, j) \in C\}$, and only those points which indeed satisfy $x \in \mathbb{F}_p$ are kept in $L$. A similar argument is used to show that […]
It remains to show that a representation $(k_{12}, k_{34})$ of our solution $k$ is contained in the list $L \times L'$. We argue that our algorithm succeeds in solving $p^2$-ECDLP if there is a representation $(k_{12}, k_{34}) \in S_1 \times S_2$ of $k$ such that $k_{12} P = (x, y)$ with $x \in \mathbb{F}_p$. It is clear that if there is no such representation of $k$ our procedure fails, as only those $k_{12} \in S_1$ for which $k_{12} P = (x, y)$ with $x \in \mathbb{F}_p$ are considered. Now, if such a $k_{12}$ exists, according to Equation 3, $k_{12} P$ is in $L$. Similarly, if $k_{34}$ satisfies $Q - k_{34} P = (x', y')$ with $x' \in \mathbb{F}_p$, then $k_{34}$ is in $L'$. Furthermore, as $k_{12} P = Q - k_{34} P$, it is enough to know that a $k_{12}$ satisfying $k_{12} P = (x, y)$ with $x \in \mathbb{F}_p$ exists to ensure that the algorithm recovers the solution.
We also claim that if the elements of $E(\mathbb{F}_{p^2})$ are uniformly distributed (as we assume by our heuristic), then in expectation there exists a representation $(k_{12}, k_{34})$ which satisfies $k_{12} P = (x, y)$ with $x \in \mathbb{F}_p$. First note that the uniform distribution implies […] We denote by $N_{rep}$ the number of representations of $k$ as the sum $k_{12} + k_{34}$, $k_{12} \in S_1$, $k_{34} \in S_2$. In other words, $N_{rep}$ is the number of elements $k_{12} \in S_1$ such that there exists $k_{34} \in S_2$ with $k_{12} + k_{34} = k$. We denote by $Y$ the number of surviving representations in $L + L'$. In other words, $Y$ is the number of $k_{12} \in S_1$ such that there exists $k_{34} \in S_2$ with $k_{12} + k_{34} = k$, and such that $k_{12} P = (x, y)$ with $x \in \mathbb{F}_p$. By Lemma 3.1, we have […] Therefore, for sufficiently large $p$, we expect that one representation survives.

Solving the ZJ-Problem
The […] $((x_{1i}, y_{1i}), (x_{2j}, y_{2j}))$ in our lists, that is, $f(x_{1i}, y_{1i}, x_{2j}, y_{2j}) = 0$. Naively, we can solve the ZJ-Problem in time $T = \mathcal{O}(NM)$. Namely, from the first list $A$ of points $(x_i, y_i)$ we construct a list $\bar{A}$ of bivariate polynomials […]
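The naive quadratic-time solution can be sketched as follows, taking $N = |A|$ and $M = |B|$. The polynomial $f$ is passed as a Python callable on four integers; the function name `naive_zero_join` and the toy polynomial used in the usage note are hypothetical and unrelated to the polynomial $f$ derived from the curve.

```python
def naive_zero_join(A, B, f, p):
    """Return all pairs (a, b) in A x B with f(a0, a1, b0, b1) = 0 mod p.

    A and B are lists of pairs of integers modulo p; f is any callable
    taking four integers. The running time is proportional to |A| * |B|.
    """
    return [(a, b) for a in A for b in B
            if f(a[0], a[1], b[0], b[1]) % p == 0]
```

As a usage example with the toy polynomial $f(X_1, X_2, X_3, X_4) = X_1 X_3 + X_2 - X_4$ over $\mathbb{F}_{11}$, the lists `A = [(1, 2), (3, 4)]` and `B = [(5, 7), (2, 10), (0, 4)]` produce three matching pairs.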

Fast Polynomial multiplication
[…] then there is an $i \in I$ with $f_i(x, y) = 0$. This gives rise to the following algorithm.

1. Partition the set $A$ of polynomials into buckets […]
2. For each $I_\ell$, compute the set $B_{I_\ell}$ of points $(x_j, y_j) \in B$ such that $F_{I_\ell}(x_j, y_j) = 0$.
3. For each $f_i \in A_{I_\ell}$, find all $(x_j, y_j) \in B_{I_\ell}$ such that $f_i(x_j, y_j) = 0$.
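The three steps above can be sketched in Python as follows, under the assumption that $F_{I_\ell}$ denotes the product of the polynomials $f_i$ in bucket $I_\ell$ (over a field, a product vanishes exactly when one factor does). In this sketch the value of the product at a point is obtained by multiplying the individual evaluations, which is only a stand-in for a fast multi-point evaluation of the expanded product and gives no speed-up by itself. The function name and the representation of polynomials as callables are hypothetical.

```python
def bucket_zero_join(polys, B, p, bucket_size):
    """Return all (i, point) with polys[i](x, y) = 0 mod p, using buckets.

    polys: list of callables f_i(x, y) returning integers.
    B:     list of points (x, y) with coordinates modulo p.
    """
    result = []
    # Step 1: partition the polynomials into consecutive buckets.
    for start in range(0, len(polys), bucket_size):
        bucket = polys[start:start + bucket_size]
        # Step 2: keep the points at which the bucket product vanishes.
        survivors = []
        for (x, y) in B:
            prod = 1
            for f in bucket:
                prod = prod * f(x, y) % p
            if prod == 0:
                survivors.append((x, y))
        # Step 3: test each polynomial of the bucket on the survivors only.
        for offset, f in enumerate(bucket):
            for (x, y) in survivors:
                if f(x, y) % p == 0:
                    result.append((start + offset, (x, y)))
    return result
```

The output should coincide, up to ordering, with that of the naive double loop over all polynomials and all points.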
Our solution uses fast bivariate polynomial multi-point evaluation, which has an optimal choice $N = \sqrt{p}$, $M = p$, for which only a single $F_I = F$ has to be computed. This special case is given in Algorithm 4. The following multi-point evaluation result is due to Nüsken and Ziegler [12].
[…]
7: for all $(x_j, y_j) \in B'$ do
8: if $f_i(x_j, y_j) = 0$ then
9: […]
We start by showing that the time complexity of MultiEval is dominated by the bivariate polynomial multi-point evaluation in line 3. To this end, we denote by $T_F$, $T_E$, $T_{B'}$ and $T_C$ respectively the time required to compute $F$ in line 2, to perform the multi-point evaluation in line 3, to execute the for-loop in line 4 and to execute the for-loop in line 6. The total runtime of the MultiEval procedure is given by […] for any fixed $\epsilon > 0$.
There is a small twist, however, as Lemma 4.1 can only be applied to a list of points $(x, y)$ with pairwise different $x$-coordinates. Considering that all $x$ are in $\mathbb{F}_p$ and that $|B| = p$, this condition implies that every element of $\mathbb{F}_p$ is present once and only once as the $x$-coordinate of an element of $B$. This is very unlikely. In fact, we proceed as follows. We partition $B$ according to the $x$-coordinate and apply the Nüsken-Ziegler algorithm with one element of each part of $B$. We restart until all elements of each part have been processed.
The number of times we have to restart is bounded by the size of the largest part of $B$. If the $x$-coordinates are uniformly distributed over $\mathbb{F}_p$, we can estimate this size using maximum-load results [11]. We obtain that the number of times we have to restart is, with high probability, bounded by a constant times $\frac{\log p}{\log \log p}$. Replacing $\omega_2$ by the best known bound 3.257 [10], and for $\epsilon < 10^{-4}$, it follows that $T_E = \tilde{\mathcal{O}}(p^{1.314})$. We may assume that for each evaluated value $E[j]$ we kept track of the corresponding $(x_j, y_j)$. Then building $B'$ from $E$ only requires scanning through each entry of $E$ once and adding $(x_j, y_j)$ to $B'$ for every $E[j] = 0$ that is encountered. The runtime of this step is thus linear in $|E| = |B|$, and therefore $T_{B'} = \mathcal{O}(p)$.
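The restart strategy can be pictured as splitting $B$ into rounds, each containing at most one point per $x$-coordinate, so that a multi-point evaluation routine requiring pairwise different $x$-coordinates can be applied to each round separately. A minimal Python sketch, with the hypothetical function name `rounds_with_distinct_x`:

```python
from collections import defaultdict

def rounds_with_distinct_x(B):
    """Split the points of B into rounds with pairwise different x-coordinates.

    Round t contains the t-th point of every group of points sharing the
    same x-coordinate, so the number of rounds equals the largest group size.
    """
    groups = defaultdict(list)
    for (x, y) in B:
        groups[x].append((x, y))
    n_rounds = max((len(g) for g in groups.values()), default=0)
    return [[g[t] for g in groups.values() if len(g) > t]
            for t in range(n_rounds)]
```

The number of rounds returned is exactly the size of the largest part of $B$, which is the quantity bounded by the maximum-load argument.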
The last step consists in evaluating all $f_i$ at all the points of $B'$. We proceed in a naive way by taking the polynomials one by one and evaluating each of them at all the points of $B'$. This results in the two nested for-loops in lines 6 and 7. The first one iterates over all the polynomials and is thus iterated $\sqrt{p}$ times. The second one iterates over all the points of $B'$ and is thus iterated $|B'|$ times. Each iteration of the second loop can be performed in $\mathcal{O}(1)$, as all the $f_i$ are of constant degree. It follows that $T_C = \mathcal{O}(\sqrt{p}\,|B'|)$ […]
We hope that the result of Theorem 4.3 may be further improved. A possible direction is to replace the (unnecessary) multi-evaluation of $F$ by some presumably more efficient zero testing method. Another research direction, as pointed out by one of the reviewers, could be to look at a different elliptic curve model than the Weierstraß one (e.g. Edwards curves). However, we are not sure whether the problem would become easier in this case.