Lattice Sieving in Three Dimensions for Discrete Log in Medium Characteristic

Gary McGuire; Oisín Robinson

doi:10.1515/jmc-2020-0008

Open Access Published by De Gruyter November 25, 2020

Lattice Sieving in Three Dimensions for Discrete Log in Medium Characteristic

Gary McGuire and Oisín Robinson

From the journal Journal of Mathematical Cryptology

https://doi.org/10.1515/jmc-2020-0008

Abstract

Lattice sieving in two dimensions has proven to be an indispensable practical aid in integer factorization and discrete log computations involving the number field sieve. The main contribution of this article is to show that a different method of lattice sieving in three dimensions will provide a significant speedup in medium characteristic. Our method is to use the successive minima and shortest vectors of the lattice instead of transition vectors to iterate through lattice points. We showcase the new method by a record computation in a 133-bit subgroup of Fp6 , with p⁶ having 423 bits. Our overall timing is nearly 3 times faster than the previous record of a 132-bit subgroup in a 422-bit field. The approach generalizes to dimensions 4 or more, overcoming one key obstruction to the implementation of the tower number field sieve.

Keywords: Number field sieve; discrete log

MSC 2010: 11Y05

1 Introduction

The most widely adopted public-key cryptography algorithms in current use are critically dependent on the (assumed) intractability of either the integer factorization problem (IFP), the finite field discrete logarithm problem (DLP) or the elliptic curve discrete logarithm problem (ECDLP). The most effective known attacks against IFP and DLP use the same basic algorithm, namely the Number Field Sieve (NFS). This algorithm has subexponential complexity in the input size. On the other hand, all known methods to attack the ECDLP in the general case have exponential complexity. However there are special instances of the ECDLP which can be attacked by effectively transferring the problem to a finite field, allowing the NFS to be used. For example such instances arise in the context of pairing-based cryptography, where certain elliptic curves can be used to realize ‘Identity-Based Encryption’ (IBE). There is a trade-off between the reduced security due to the size of the finite field on which the security is dependent, and increased efficiency of the pairing arithmetic. The optimal parameters have been the subject of intense scrutiny over the last few years, which have seen a succession of improvements in the NFS for the DLP in the medium characteristic case. This is directly relevant in the case of pairings, where the finite field on which the security of the protocol depends is typically a small degree extension of a prime field.

A key part of the NFS is lattice sieving. The main contribution of this article is to demonstrate that different methods of lattice enumeration can make a significant difference to the speed of lattice sieving.

This paper is organized as follows. In section 2, we give a very brief overview of the Number Field Sieve algorithm in the medium-characteristic case. A more detailed explanation can be found in [11]. One of the main bottlenecks of this algorithm is lattice sieving, which involves enumerating points in a (low-dimensional) lattice. We propose in section 3 a straightforward idea to significantly increase enumeration speed in dimension three and above. The idea is to change the angle of planes that are sieved through in order to reduce the number of planes. This idea has been used before for lattice enumeration in a sphere [17], however it has not been applied successfully to lattice sieving for the NFS. We show that the idea can work well by using integer linear programming to find an initial point for iteration in a plane within the sieve cuboid. In section 4 we propose a novel method to amortize memory communication overhead which applies regardless of dimension. In section 5 we give details of a new record discrete log computation in Fp6. The previous record due to Grémy et al [11] had p⁶ with 422 bits, and this paper has p⁶ with 423 bits. We deliberately chose a field size just one bit larger in order to attempt a direct comparison of methods and timings. In section we present a record pairing break with the same prime p. Finally we conclude in section 7 and mention some possible future research ideas.

2 Number Field Sieve

We start with a very brief sketch of the NFS in the most naive form suitable for computing discrete logs in Fpn. Consider the following commutative diagram:

The polynomials f₀(x) and f₁(x) are irreducible in ℤ[x] of degree n, and they define the number fields ℚ(α) and ℚ(β) respectively. We require that f₀(x) and f₁(x), when reduced modulo p, share a factor ψ(x) of degree n which is irreducible over 𝔽_p. This defines the finite field Fpn as (ℤ/pℤ)[x]/〈ψ(x)〉. Usually ψ(x) is simply the reduction of f₀(x) modulo p.

$Figure 1 Commutative diagram of NFS for discrete log in Fpn. ${{\mathbb{F}}_{{{p}^{n}}}}.$$

Figure 1

Commutative diagram of NFS for discrete log in Fpn.

In two-dimensional sieving, for a bound B, we inspect many pairs of integers (a, b) with 0 < a ≤ B and −B ≤ b ≤ B in the hope of finding many pairs such that

Res(f0,a−bx) and Res(f1,a−bx)

are both divisible only by primes up to a bound B₂. This B₂ is called the smoothness bound, or the large prime bound. The set of all polynomials a − bx that we inspect is called the search space.

There is also a bound B₁, called the factor base bound, which is the largest prime used in sieving. The factor base consists of all primes up to this bound B₁. In brief, we seek two different representations of (the image of) a − bx over the factor base. There are excellent detailed explanations of the NFS, see [3, 10, 11, 16]. We refer the reader to those papers for further details. Recently, new variations of NFS have been described where the norms (i.e. resultants) are even smaller in certain fields, see [18, 21].

2.1 Lattice Sieving

The original lattice sieve (due to J.M. Pollard [20]) has developed into the ‘special-q’ lattice sieve which we now outline. Let q be a rational prime, let r be an integer with f₀(r) ≡ 0 mod q, and let q = ⟨q, θ − r⟩ be an ideal of K=ℚ(θ)≅ℚ[x]/〈 f0 〉 lying over q. We choose q to be smaller than our smoothness bound. We fix a factor base, which will consist of all prime ideals in the ring of integers of K whose norm is smaller than our pre-determined factor base bound. We look for (integral) ideals of K that are divisible by q, and we do this by looking for ideals whose norm is divisible by q. We also would like the norm to be divisible by many other small primes p. We fix q and iterate over all p in the factor base using a sieve.

In 3-dimensional sieving, the pairs (a, b) corresponding to a − bx (in section 2) become triples (a, b, c) corresponding to a + bx + cx². A sieving ideal will have the form ⟨a + bx + cx²⟩. Relation collection examines a subset 𝒮 of the whole set of polynomials A(x) of degree 2. The subset 𝒮 is called the search space and is made of the polynomials A(x) of bounded coefficients.

We perform sieving in three dimensions as follows. We use a fixed-size sieve region H = [0, B[×[−B, B[×[−B, B[ where each lattice point will correspond to a norm which is always divisible by q and hopefully divisible by many p. Define lattices Λ_q and Λ_pq by

Lq=[ q−r001−r001 ], Lpq=[ pq−t001−t001 ]

where f₀(r) ≡ 0 mod q and f₀(t) ≡ 0 mod pq, and the columns are a basis. Compute an LLL-reduced basis for both Λ_q and Λ_pq to get matrices Lq′ and Lpq′. Then let

L′=(Lq′)−1⋅Lpq′

which is an integer matrix by construction. Let Λ′ be the lattice with basis L′. We mark all (i, j, k) in H ∩ Λ′. As a result, for a sieve location (i, j, k) that has been marked, if we let

[ abc ]=Lq′⋅[ ijk ]

then we know that the norm of ⟨a + bθ + cθ²⟩ is divisible by both q and p.

We compute and reduce L_q once per special-q, and compute L_pq etc for each p. We compute (a, b, c) only for (i, j, k) that have been marked for many p (above a pre-determined threshold).

Our new results have two aspects. First, in section 3 we improve the speed of enumeration of points in dimensions higher than two. Second, in section 4 we give a new way of avoiding cache locality issues by the use of a histogram of lattice point hits. This applies regardless of dimension.

3 Faster enumeration

In a lattice Λ of rank n recall that the i-th successive minimum is defined by

λi(Λ)=inf{ r∈ℝ:dim(span(Λ∩Br))≥i }

where B_r = {x ∈ ℝⁿ : ||x|| ≤ r}. In particular, λ₁(Λ) is the length of a shortest nonzero vector in Λ. A basis v₁, . . . , v_n for Λ is said to be a Minkowski-reduced basis if, for k = 1, 2, . . . , n, v_k is the shortest lattice element that can be extended to a basis with v₁, . . . , v_k₋₁.

We assume we are sieving in three dimensions. We fix a bound B and let

H=[0,B[×[−B,B[×[−B,B[

be the sieving region. Let Λ′ be the lattice defined in section 2. The problem is to list the elements of Λ′ ∩ H in an efficient way. In previous work this is done by going through the planes parallel to the xy-plane, and enumerating the lattice points in each of these planes. We propose a different method which uses fewer planes.

Let v₁, v₂, v₃ be vectors having lengths λ1(Λ′),λ2(Λ′),λ3(Λ′), the first three successive minima of Λ′. These three vectors are guaranteed to exist and we can either find all three, or an acceptably close approximation (see Remark 1). The origin together with v₁ and v₂ define a plane which we call P. Let

cmax=max{ c∈ℕ:H∩(P−c⋅v3)≠∅ }.

We refer to the plane G = P − c_max · v₃ as the ‘ground plane’.

Our approach is very simple: to enumerate all lattice points in H, we enumerate all points in the ground plane G that lie in H, and then all points in the translates G + kv₃ for k = 1, 2, 3, ... that lie in H, until we reach the last translate intersecting H.

Remark 1

Finding v₁, v₂, v₃ is done with the LLL algorithm. In practice, in very small dimension such as three, this is sufficient to find a Minkowski-reduced basis, or a close approximation which is good enough for our purposes.

Remark 2

To easily enumerate points in a plane G + kv₃, we first locate one point p₀ that is contained within the plane and the sieving region H. For this, we use integer linear programming (described in this context below). Once we have located p₀, we proceed to enumerate points in this plane by adding and subtracting multiples of v₁ and v₂ from p₀, until by doing so we are no longer within H. This is done inductively, by first enumerating all p₀ + c₁v₁ where c₁ runs over all integers such that p₀ + c₁v₁ is in H. Then we add v₂ and enumerate all p₀ + v₂ + c₁v₁ where c₁ runs over all integers such that p₀ + v₂ + c₁v₁ is in H. Then we add v₂ again, and repeat. This may not be the optimal method of enumerating points in G + kv₃, however it worked well in our computations and is sufficient for our purposes. Pseudo-code for this can be seen in Algorithm 1.

This inductive procedure will extend to higher dimensions, as long as the integer linear programming problem required to find the corresponding feasible points is tractable. We expect this to be the case certainly up to dimension six, and perhaps further.

Remark 3

If the lattice is very skewed, it is possible that the last valid sieving point in the plane is p _k = p_k₋₁ + v₁ + c · v₂, where c ≥ 2 (and p_k₋₁ is the previous point). It would be preferable to be able to reach all points by unit additions of v₂ so for practical purposes, we do this and ignore the cases where such ‘outlier’ points are missed.

We have quantified the percentage of missed points, based on test data from the record computation and comparison with an exhaustive search. We consistently found about 1.8% of points were missed per special-q. Since there are usually up to hundreds of thousands of primes on a sieving side, and multiple planes per prime, it would only take a small number of missed points per prime to account for this percentage. This likely happens at planes intersecting only corners of the sieve region.

Remark 4

In two dimensions, the sieving method of Franke and Kleinjung [7] is very efficient. Our approach works in the 2d case also, using the first two successive minima of the 2d lattice, but it will not quite compete with the method in [7] in terms of speed of enumeration because we must do a little extra work when dealing with boundaries. This shows that dealing with the boundaries of the sieving region is not trivial.

We summarize the above description in Algorithm 1.

Remark 5

The observer may notice that in Algorithm 1 as presented, there is no guarantee upon adding (v₁, v₂, v₃) to (x, y, z) to move to the next row, that (x, y, z) is still within the sieving plane and boundary. We have written the algorithm like this to get across the main idea, but in the code every time we switch to a new row, we solve a mini integer linear programming task to get a multiple of (u₁, u₂, u₃) to subtract from and pin (x, y, z) to the start of the row while keeping it within the boundary. Solving a one-dimensional integer linear programming problem can be done in-line, and does not lead to much extra code complexity.

Algorithm 1

3D Lattice Sieve

Input:

Boundary [0, B[×[−B, B[×[−B, B[, special-q prime q, polynomial f, memory array M.

Output:

List of prime divisibility events contained in M

procedure

r ← root of f mod q

L←[ n−r001−r001 ]

Reduce L in-place via L² algorithm (i.e. efficient LLL)

m ← 0

for every factor base prime p do

for every root s of f mod p do

n ← p · q

t ← q · (s · (q⁻¹mod p) mod p) + p · (r · (p⁻¹mod q) mod q) // Chinese remainder theorem

If t >= ⎡n, t ← t − n

L2←| nt001t001 |

Reduce L₂ in-place via L² algorithm

L₃ ← q · L⁻¹ · L₂

L₃ ← L₃/q (all entries)

Reduce L₃ in-place via L² algorithm

u = (u₁, u₂, u₃) ←first column of L₃

v = (v₁, v₂, v₃) ←second column of L₃

w = (w₁, w₂, w₃) ←third column of L₃

(n_x , n_y , n_z) ← (u₂v₃ − u₃v₂, u₃v₁ − u₁v₃, u₁v₂ − u₂v₁) // Compute normal (x, y, z) ← (w₁, w₂, w₃)

Repeatedly subtract (w₁, w₂, w₃) from (x, y, z) until we are at the ground plane

while Plane defined by (n_x , n_y , n_z) and (x, y, z) intersects box [0, B[×[−B, B[×[−B, B[ do (ws₁, ws₂, ws₃) ← (x, y, z) // Remember starting w vector

(a, b) ← (0, 0)

Compute (a, b) using algorithm 2 to place (x, y, z) within boundary

(x, y, z) ← (x, y, z) + a · (u₁, u₂, u₃) + b · (v₁, v₂, v₃)

(s₁, s₂, s₃) ← (x, y, z) // Remember starting vector

while (x, y, z) ∈ [0, B[×[−B, B[×[−B, B[ do

Subtract (u₁, u₂, u₃) from (x, y, z) until we are at the ‘start of the row’ for every vector in current row do

id ← x + ((y + B)«log₂(B)) + ((z + B)«log₂(B²))

M[m++] ← {id, log₂(p)}

(x, y, z) ← (x, y, z) + (u₁, u₂, u₃)

end for

(x, y, z) ← (x, y, z) + (v₁, v₂, v₃)

end while

(x, y, z) ← (s₁, s₂, s₃)− (v₁, v₂, v₃)

Repeat the above while loop in the opposite direction in the plane

That is, by subtracting (v₁, v₂, v₃) from (x, y, z) to move to the next row.

We can still progress within each row by pinning to start, then adding (u₁, u₂, u₃).

When finished in both directions, move to next plane:

(x, y, z) ← (ws₁, ws₂, ws₃) + (w₁, w₂, w₃)

end while

end for

end procedure

3.1 Previous Lattice Enumeration Methods

Lattice enumeration is widely used in algorithms to solve certain lattice problems, such as the Closest Vector Problem. However, sieving in a cuboid introduces many complications that do not occur when sieving in a ball.

Our enumeration here is similar to Fincke-Pohst-Kannan’s algorithm (FPK) for lattice enumeration [6, 17]. However, it is significantly different in that we sieve in a box, as opposed to a sphere. Further, we do not compute a norm for every point to test if it is within the boundary - use of a box allows us to separate many points which may be treated in fast loops with no individual boundary checking. In practice this makes a huge difference. Note that L. Grémy’s space sieve is 120 times faster than the FPK algorithm (see [10]). We outperform the space sieve by over 2.5×. Note also that sieving in a rectangular region is fundamental to Franke and Kleinjung’s 2d lattice sieve algorithm, and its success depends on the shape of this region.

3.2 Example

The lattice used in Figure 2 is the following:

Figure 2

Six dense sublattices cover every point in the sieve region.

[ 101835−121813−7−2218 ]

The sieve region is [−100, 100] × [−100, 100] × [0, 100]. In this example, using our method, planes cover every valid sieving point. With traditional plane sieving, using planes that are parallel to the base of the sieving cuboid, 101 planes are needed, each with at most four lattice points.

3.3 Integer Linear Programming

Given a plane defined by u, v and a point R, with R not necessarily contained in the sieving region defined by H = [0, B[×[−B, B[×[−B, B[, the task is to find a point p₀ = (x₀, y₀, z₀) that is provably contained in the intersection of the plane and H, if such a point exists. We look for a, b ∈ ℤ such that p₀ = R + a · u + b · v and p₀ ∈ H.

This can be formulated as an integer linear programming problem, where the aim is to either minimize x, subject to

A⋅x≤C

where x = (a, b) ∈ ℤ² and A ∈ ℤ^6×2, C ∈ ℤ⁶, depend on u, v, B, or find any feasible point, if one exists. This problem is well studied, and though it is NP-hard in general, can be solved easily in small dimensions. It is computationally trivial in dimension 3, for example, which we use in this article.

We shall give more details now. Our approach is based on ‘Fourier-Motzkin’ elimination. Given R = (x, y, z), u = (u₁, u₂, u₃), v = (v₁, v₂, v₃) we seek a, b such that R + a · u + b · v is feasible. We set up the following system of inequalities based on the boundary [0, B[×[−B, B[×[−B, B[:

[ u1v1u2v2u3v3−u1−v1−u2−v2−u3−v3 ][ ab ]≤[ B−x−1B−y−1B−z−1xB+yB+z ].

Our approach is to eliminate a, deduce b and then compute a. We first transform the system so that the entries of the first column of A are normalized (i.e.we multiply across by the least common multiple of u₁, u₂, u₃ and divide across respectively by |u₁|, |u₂|, |u₃|. Thus each entry in column 1 of A is ±k for some k ∈ ℤ). We look for all pairs of rows of A such that a cancels. We achieve this with a nested loop in (i, j) and where we disregard a pair of rows if both a and b cancel (this happens e.g. if |i − j | = 3). For a given pair of rows where we have eliminated the term in a, it is then easy to compute

b=⌊ Ci+CjVi+Vj ⌋

where C_i are the constants in the RHS vector, and V_i are the coefficients of the vector v in A, as long as V_i+ V_j = 0 (otherwise we skip this (i, j)). When we consider all pairs, we keep the smallest value of b found. We then compute the smallest corresponding value of a by back-substitution in all rows. This is summarized in algorithm 2.

Remark 6

It is possible that no feasible point exists or is found, but we can detect this by checking if p₀ is within bounds.

Algorithm 2

Integer Linear Programming

Input:

Boundary [0, B[×[−B, B[×[−B, B[, plane P defined by u = (u₁, u₂, u₃), v = (v₁, v₂, v₃), R = (x, y, z) such that R ∈ P, but R not necessarily in boundary.

Output:

(a, b) such that p₀ = R + a · u + b · v contained in both P and boundary.

procedure

U ← {u₁, u₂, u₃, −u₁, −u₂, −u₃}^T

V ← {v₁, v₂, v₃, −v₁, −v₂, −v₃}^T

C ← {B − x − 1, B − y − 1, B − z − 1, x, B + y, B + z}^T

L ← |LCM(u₁, u₂, u₃)|

Normalize system - multiply all entries by L, divide U_i , V_i , C_i by original |U_i|

a ← B, b ← B

for all pairs of rows (i, j) do

if U_i = 0 and V_i > 0 then

b_TRIAL = ⌊C_i /V_i⌋

if b_TRIAL < b then b ← b_TRIAL

else

if U_i < 0 and U_j > 0 and |i − j| ≠3 then

D ← V_i + V_j

if D > 0 then

b_TRIAL = ⌊(C_i + C_j) /D⌋

if b_TRIAL < b then b ← b_TRIAL

end if

end for

for the best value of b, compute a by back substitution.

Return (a, b)

end procedure

4 Dealing with Cache Locality Issues

Representing a 3d lattice in memory can be done in different ways. One problem with arranging points first by adjacent lines, then adjacent planes etc is that the storage/retrieval of points tends to result in random memory access patterns, which severely impacts performance. This is a fundamental concern in large-scale computation. Computer manufacturers address this by providing various levels of ‘cache’, i.e. a limited quantity of high-speed memory, too costly for main memory, which is used as a temporary store of frequently-accessed or burst-access data. Traditionally, lattice-sieving has always had to make use of cache to minimize the cost of the random memory access patterns that occur in practice. The usual way this is handled is via ‘bucket sorting’ [1].

Our idea to improve cache locality without using bucket sorting is simple: list and sort. We encode the x, y, z coordinates of each lattice point as a 32-bit integer. Asieve ‘hit’ is an ordered pair (integer, log p) where p is a rational prime that divides the norm of the ideal corresponding to the lattice point/integer. Here 2 ≤ p ≤ B₁ where B₁ is the factor base bound. We store all sieve hits in a list of increasing size. Because the list increases strictly linearly, this is ideally suited to fast memory access and is compatible with all levels of cache. By itself, this is not an advantage because we have merely collected a long list of randomly ordered pairs. However, we can sort this list by sorting the integers in increasing order, i.e., we sort these pairs by the integers only. We can then recover lattice points with a large smooth part via a linear scan of the sorted list, because if an integer appears K times as the first coordinate of a pair then it will appear with K different values of p. The more times it appears, the smoother the lattice point is.

Remark 7

Our list/sort approach must be considered carefully in terms of memory because we must store log p and full co-ordinates for every sieve event. While in practice sorting the list of all sieve events is still fast, the memory required to store the full list is considerable (it certainly does not fit in cache all at once, but cache is still essential and is used implicitly in the sorting routine, which is cache-friendly). This meant we had to ensure our sieving parameters could be set to fully utilize the compute capacity of our sieving nodes. See section 5 for further details.

Remark 8

For our record computation (see next section) we used a sieving region H of size 2⁹ × 2¹⁰ × 2¹⁰. A traditional lattice sieve would thus take 2²⁹ bytes, or exactly 512MB to store the sieve (assuming one byte to store log p per sieve location). We need to store 5 bytes per sieve hit, one byte for log p plus 4 bytes for the (x, y, z) coordinates, but note that we do not have to represent every possible sieve location, only the hits. In practice, for the parameters we used to set the record, each instance of the sieve program records roughly 500 million sieve events, and so requires about 5×500 million bytes = 2500MB of RAM in total per special-q. This is approximately a fivefold increase over 512MB for a traditional lattice sieve, but it was not a problem for us as our production nodes had the memory to cope.

Remark 9

We avoid sieving the smallest primes, as the corresponding lattices are very dense and greatly increase the memory required for the list + sort approach. This would not really be an issue if we used bucket sorting, but either way these smallest primes can probably be left out, because they tend to appear in almost every relation. The less dense lattices (not for small primes) should be sieved, and our lattice enumeration method will be very fast here.

Remark 10

Grémy et al’s [11] sieve does not include bucket sort, and neither do we. It could be a possible improvement to replace our list/sort with enumerate/bucket sieve (and we could retain our fast enumeration idea as bucket sorting relates only to memory access).

We compare sieving statistics in Table 1 between our implementation and that of [11]. Note that all of these times give total special-q time excluding the cofactorization time. In [11], sieving and memory access are intertwined and so we compare this to our combined sieving/listing/sorting time. Our siever is over 4.5× faster. Sieving speed in our case is not the bottleneck. The extra sieving speed is gained at the expense of increased memory requirements.

Table 1

Sieve performance comparison (times in seconds)

Authors	factor base bound	H	q_min	q_max	#{q}	av. time	min time	max time
GGMT	2²¹	2¹⁰ × 2¹⁰ × 2⁸	16000000	16001000	7	143.93	142.17	145.28
GGMT	2²¹	2¹⁰ × 2¹⁰ × 2⁸	86500000	86501000	14	142.07	140.53	143.82
GGMT	2²¹	2¹⁰ × 2¹⁰ × 2⁸	262000000	262001000	9	142.40	140.95	144.34
GGMT	2²²	2¹⁰ × 2¹⁰ × 2⁸	16000000	16001000	7	169.74	166.12	171.82
GGMT	2²²	2¹⁰ × 2¹⁰ × 2⁸	86500000	86501000	14	167.53	166.01	173.55
GGMT	2²²	2¹⁰ × 2¹⁰ × 2⁸	262000000	262001000	9	167.50	165.02	172.17
this work	2²⁴	2⁹ × 2¹⁰ × 2¹⁰	16000000	16001000	7	35.47	34.94	36.95
this work	2²⁴	2⁹ × 2¹⁰ × 2¹⁰	86500000	86501000	14	35.80	35.39	37.16
this work	2²⁴	2⁹ × 2¹⁰ × 2¹⁰	262000000	262001000	9	36.37	35.71	37.54

5 Record computation in Fp6

We implemented the 3d case of our lattice sieving idea in C and used it to set a new record in solving the discrete log in the multiplicative subgroup (Fp6)×. Previous records were set by Zajac [24], Hayasaka et al (HAKT) [15], and Grémy et al (GGMT) [11]. All computations were done on the main compute nodes of the Kay cluster at ICHEC, the Irish Center for High-End Computing. Each node consists of dual 20-core Intel Xeon Gold 6148 (Skylake) processors @ 2.4 GHz, with 192Gb RAM per node. All timings have been normalized to a nominal 2.0GHz clock speed.

With ϕ=(1+5)/2, we chose the prime p = ⌊10²¹·ϕ⌋+29. Our target field is Fp6, where p⁶ has 423 bits. This is comparable to the field size of the previous record at 422 bits [11]. One consequence of our choice is to allow a fair comparison of the total effort required to solve discrete logs in a field of this order of magnitude.

5.1 Polynomial selection

We implemented the Joux-Lercier-Smart-Vercauteren (JLSV₁) algorithm and ranking polynomials by their 3d Murphy E-score, after about 100 core hours found the following polynomial pair from the cyclic family of degree six described in [9]:

f0=x6−40226000394x5−100565001000x4−20x3+100565000985x2+40226000400x+1

f1=80447172120x6+104483881186x5−945497878835x4−1608943442400x3 −261209702965x2+378199151534x+80447172120

We computed the 3d alpha score for these and found α(f₀) = −3. and α(f₁) = −12.6.

5.2 Relation collection

Our implementation was written as a standalone executable, independent of CADO-NFS, producing relations in the format that CADO-NFS can use. We carry out cofactorization using Pollard’s p − 1 algorithm and two rounds of Edwards elliptic curve factorization. The cofactorisation implementation uses a standard approach and is not an improvement on CADO-NFS’s cofactorization rig. Cofactorization typically takes between 120-130 seconds, this is the bottleneck.

Our program is extremely fast to sieve. This allowed us not only to use a larger factor base, but also to search for relations that are ‘twice as difficult’ to find, i.e. to use a smoothness bound of 2²⁸ as opposed to the 2²⁹ used in the 422-bit record. We were able to use a factor base bound of 2²⁴ with no major loss of speed. Typically sieving takes less than 40 seconds per special-q, including sorting.

The time per special-q was roughly constant across the entire range, at between 150-170 seconds. The total time per special-q is made up of sieving plus cofactorization. There were about 500 million sieve hits per special-q. As we explained in Remark 8 each sieve hit takes exactly 5 bytes of memory, so we need about 5×500 million bytes = 2500MB of RAM per special-q. We were able to utilize all 40 cores on our sieving nodes, where each node has 192Gb of memory. We needed about 2500MB ×40 = 100GB in total.

We were able to use a larger sieve region due to the speed of lattice enumeration. We used a sieving region of size 2⁹ × 2¹⁰ × 2¹⁰, compared to the region 2¹⁰ × 2¹⁰ × 2⁸ used in the previous record.

We sieved most special-qs on the f₀ side with norm between 16M and 263M. Our program sieves only one ideal in each Galois orbit. We apply the Galois automorphism as a post-processing step. We found 7,152,855 unique relations and then applied the Galois automorphism (which is trivial in core-hours) and after removing duplicates we were left with 34,115,391 unique relations. The total sieving effort was 69,120 core hours.

Table 2

Key statistics of record computations in Fp6. All timings in core hours. Recall that ^S is the search space, q is the norm of q, H₀, H₁, H₂ are the bit lengths of sieving dimensions, and q_max is the largest special-q norm we encounter (which will be just below the smoothness bound).

Authors	GGMT	This work
Field size (bits)	422	423
α-values	−2.4,−14.3	−3.6,−12.6
Murphy-E	2^−20.51259	2^−20.45961
Sieving region H	2¹⁰ × 2¹⁰ × 2⁸	2⁹ × 2¹⁰ × 2¹⁰
Factor base bounds	221, 2²¹	224, 2²⁴
Smoothness bounds	229, 2²⁹	228, 2²⁸
#S=qmax2H0+H1+H2	2⁵⁵	2⁵⁷
Special-q side	0	0
Range of q	]221, 227.9[	]223.9, 230[
Galois action	6	6
#unique relations	71,850,465	34,115,391
#required relations	≈ 56M	29,246,136
purged	18,335,401	7,598,223
filtered	5,218,599	2,754,009
Total sieving time	201,600	69,120

5.3 Construction of matrix

We modified CADO-NFS [23] to produce a matrix arising from degree-2 sieving ideals for the linear algebra step.

5.4 Linear algebra

We used the Block Wiedemann implementation in CADO-NFS (we compiled commit d6962f667d3c... with MPI enabled), with parameters n = 10 and m = 20. Due to time constraints, we needed to minimize wall clock time so we chose to run the computation on 4 nodes, to reduce the iteration time for the Krylov sequences. Also, to avoid complications, we did not run the 10 Krylov sequences in parallel. The net result was that we spent 11,760 core hours on the Krylov step, which is suboptimal (but got us the result in time). It took 24 core hours (on one core) to compute the linear generator and 672 core hours for the solution step. This gave 2, 754, 009 of the factor base ideal virtual logarithms. We ran the log reconstruction to give a final total of 25, 215, 976 known virtual logarithms out of a possible total of 29, 246, 136 factor base ideals.

We note that a similar-sized matrix was solved in [11], which used 1,920 core hours for the Krylov step. However, due to our choice of smoothness bound, set to 2²⁸, our linear algebra effort to set the new record was considerably less than that of the previous record of 422 bits, which involved a Krylov step taking 23,390 core hours for a smoothness bound set to 2²⁹.

5.5 Individual logarithm

Take the element g=x+2∈Fp6=Fp[x]/〈 f0(x) 〉 . Let

ℓ=9589868090658955488259764600093934829209,

a large prime factor of p² − p +1. Let h = (p⁶ − 1) /ℓ. Note that g is not a generator of the entire multiplicative subgroup of Fp6, but we do have that g^h is a generator of the subgroup of size ℓ. It is easy to compute vlog(g) since N₀(g) = −33.

We have vlog(g) = 8951069617162908953536183274937613985265. We chose the target

t=314159265358979323846x5+264338327950288419716x4+939937510582097494459x3 +230781640628620899862x2+803482534211706798214x+808651328230664709384

We implemented the initial splitting algorithm of A. Guillevic [13] in SAGE, and after a few core hours found that

g74265t=uvw(−129592286880919x2−103570474976165x−5550010113050)

where u∈Fp2,v∈Fp3,w∈Fp, so that their logarithm modulo ℓ is zero. The norm of the latter term is −11 · 37 · 71 · 97 · 197 · 821 · 24682829· 33769709 · 83609989 · 13978298429383 · 21662603713879 · 74293619085767· 141762919001833 · 381566853770521. We had 5 special-q to descend, the largest having 49 bits. We used our 3d lattice sieve implementation to descend from these ideals of unknown log to factor base elements with known logarithms. This was a somewhat manual process and took about 8 hours.

Table 3

Comparison with other record computations in Fp6. All timings in core hours.

year	size of pⁿ	authors	algorithm	rel. col.	lin. alg.	total
2008	240	Zajac	NFS-HD	580	322	912
2015	240	HAKT	NFS-HD	527	-	-
2017	240	GGMT	NFS-HD	22	5	27
2017	300	GGMT	NFS-HD	164	39	203
2017	389	GGMT	NFS-HD	18,960	2,400	21,360
2017	422	GGMT	NFS-HD	201,600	26,880	228,480
2019	423	this work	NFS-HD	69,120	12,480	81,600

We obtained vlog(t) = 2619623637064116359346428467068287245870, so that

logg(t)≡vlog(t)/vlog(g)≡7435826750517015269718230402645557947880 mod ℓ.

6 Pairing break

Let p be the same prime as in the previous section. Define Fn2=Fp[i]/〈 i2+2 〉. The curve E/Fn2:y2=x3+b, b = i + 7 is supersingular of trace p, hence of order p² − p + 1. Define Fp6=Fp2[j]/〈 j3−b 〉. The embedding field of the curve E is Fp6. We take

G_{0}=(5,751568328314480688740 i+751642554083315688493)

and we check that G₁ = [273]G₀ is a generator of E(Fp2)[ℓ]. The distortion map ϕ:(x,y)↦( xp/(jb(p−2)/3), yp/(b(p−1)/2) ) gives a generator G₂ = ϕ(G₁) of the second dimension of the ℓ-torsion. We take the point

P0=(314159265358979323846 i+264338327950288419717, 1560320966141767888064 i+368067364535991558380)

from the decimals of π, and P1=[273]P0∈E(Fp2)[ℓ] is our challenge. We aim to compute the discrete logarithm of P₁ to base G₁. To do so, we transfer G₁ and P 1 to F p 6 , and obtain g=eTate (G1,ϕ(G1)) and t = e_Tate (P₁, ϕ(G₁)), or

t=709659446396572245219x5+760855550263311226560x4+459517758627469463106x3

+1075867962756498791880x2+966415406496231787507x+759380554536416832249,g=1445115464416256318145x5+608219705720308630653x4+1328213831161031326049x3 +104723931403852502861x2+1118264722333528462011x+551285267384030855316

The initial splitting gave a 50-bit smooth generator

g289236=uvw(−207659249318101x2−32084626907475x+36052674649889)

where u∈Fp2,v∈Fp3,w∈Fp, so that their logarithm modulo ℓ is zero. The norm of the latter term is 11 · 71 · 79 · 1453 · 433123 · 85478849 · 34588617703· 40197196124443 · 76694584420127 · 370667620290007 · 419573910884273 · 823157513981483. We had special-q to descend. We also got a 49-bit smooth challenge of norm 23 · 29² · 41 · 563 · 2917 · 1245103 · 12006859· 107347203833 · 506649149393 · 39018481981309 · 138780153403907 · 174514280440993 · 302260510161053:

g91260t=uvw(−59788863574984x2+62066870577408x+88384197770333)

We obtained vlog(g) = 7599151482912535295281621925658364195913 and

vlog(t) = 4642225023760573112152590887355819325364, so that

log_g(t) ≡ vlog(t)/vlog(g) ≡ 4325953856049730257332335443497115431763 mod ℓ.

7 Conclusion

We have presented a new approach to lattice sieving in higher dimensions for the number field sieve, together with a novel approach to avoiding inefficient memory access patterns which applies regardless of dimension. In addition, we implemented the 3d case of our idea and used it to set a record in solving discrete log in Fp6, a typical target in cryptanalysis of pairing-based cryptography, in time a factor of more than 2.5× better (in core hours) than the previous record, which was of a directly comparable size. It should be possible to improve the code further with more effort put into optimization. We have indicated that the sieving enumeration generalizes to higher dimensions as long as a certain integer linear programming problem is tractable. This has immediate implications for the possibility of implementation of the Tower Number Field Sieve and e.g. the Extended Tower Number Field Sieve, the latter of which is dependent on sieving in dimension at least four. The recent preprint [14] addresses one major prerequisite to the realization of TNFS and ExTNFS, concerning polynomial selection, while in the present work we give a strong indication that another obstruction, that of sieving efficiently in small dimensions of four and above, may be easier than first thought.

References

[1] Aoki, K., Ueda, H.: Sieving using bucket sort. In: Advances in cryptology—ASIACRYPT 2004, Lecture Notes in Comput. Sci., vol. 3329, pp. 92–102. Springer, Berlin (2004)10.1007/978-3-540-30539-2_8Search in Google Scholar

[2] Babai, L.: On Lovász’ lattice reduction and the nearest lattice point problem. Combinatorica 6(1), 1–13 (1986). 10.1007/BF02579403, https://doi.org/10.1007/BF0257940310.1007/BFb0023990Search in Google Scholar

[3] Barbulescu, R., Gaudry, P., Guillevic, A., Morain, F.: Improving NFS for the discrete logarithm problem in non-prime finite fields. In: Advances in cryptology—EUROCRYPT 2015. Part I, Lecture Notes in Comput. Sci., vol. 9056, pp. 129–155. Springer, Heidelberg (2015)10.1007/978-3-662-46800-5_6Search in Google Scholar

[4] Barbulescu, R., Gaudry, P., Joux, A., Thomé, E.: A heuristic quasi-polynomial algorithm for discrete logarithm in finite fields of small characteristic. In: Advances in cryptology—EUROCRYPT 2014, Lecture Notes in Comput. Sci., vol. 8441, pp. 1–16. Springer, Heidelberg (2014)10.1007/978-3-642-55220-5_1Search in Google Scholar

[5] Cohen, H.: A course in computational algebraic number theory, Graduate Texts in Mathematics, vol. 138. Springer-Verlag, Berlin (1993)10.1007/978-3-662-02945-9Search in Google Scholar

[6] Fincke, U., Pohst, M.: A new method of computing fundamental units in algebraic number fields. In: EUROCAL ’85, Vol. 2 (Linz, 1985), Lecture Notes in Comput. Sci., vol. 204, pp. 470–478. Springer, Berlin (1985)10.1007/3-540-15984-3_315Search in Google Scholar

[7] Franke, J., Kleinjung, T.: Continued fractions and lattice sieving. In: Workshop record of SHARCS (2005) (2005), available at http://www.ruhr-uni-bochum.de/itsc/tanja/SHARCS/talks/FrankeKleinjung.pdfSearch in Google Scholar

[8] Gaudry, P., Grémy, L., Videau, M.: Collecting relations for the number field sieve in GFp⁶ LMS J. Comput. Math. 19(suppl. A), 332–350 (2016)10.1112/S1461157016000164Search in Google Scholar

[9] Gras, M.N.: Special units in real cyclic sextic fields. Math. Comp. 48(177), 179–182 (1987)10.1090/S0025-5718-1987-0866107-1Search in Google Scholar

[10] Gremy, L.: Sieve algorithms for the discrete logarithm in medium characteristic finite fields. In: Ph.D. thesis, Universite de Lorraine (2017), available at https://tel.archives-ouvertes.fr/tel-01647623Search in Google Scholar

[11] Grémy, L., Guillevic, A., Morain, F., Thomé, E.: Computing discrete logarithms in 𝔽_p⁶ In: Selected areas in cryptography—SAC 2017, Lecture Notes in Comput. Sci., vol. 10719, pp. 85–105. Springer, Cham (2018)10.1007/978-3-319-72565-9_5Search in Google Scholar

[12] Guillevic, A.: Computing individual discrete logarithms faster in GFpⁿwith the NFS-DL algorithm. In: Advances in cryptology—ASIACRYPT 2015. Part I, Lecture Notes in Comput. Sci., vol. 9452, pp. 149–173. Springer, Heidelberg (2015)10.1007/978-3-662-48797-6_7Search in Google Scholar

[13] Guillevic, A.: Faster individual discrete logarithms in finite fields of composite extension degree. Math. Comp. 88(317), 1273–1301 (2019)10.1090/mcom/3376Search in Google Scholar

[14] Guillevic, A., Singh, S.: On the Alpha Value of Polynomials in the Tower Number Field Sieve Algorithm (Aug 2019), https://hal.inria.fr/hal-02263098 working paper or preprintSearch in Google Scholar

[15] Hayasaka, K., Aoki, K., Kobayashi, T., Takagi, T.: An experiment of number field sieve for discrete logarithm problem over gf p¹² JSIAM Letters (Jan 2013)10.1007/978-3-642-42001-6_8Search in Google Scholar

[16] Joux, A., Lercier, R., Smart, N., Vercauteren, F.: The number field sieve in the medium prime case. In: Advances in cryptology—CRYPTO 2006, Lecture Notes in Comput. Sci., vol. 4117, pp. 326–344. Springer, Berlin (2006)10.1007/11818175_19Search in Google Scholar

[17] Kannan, R.: Improved algorithms for integer programming and related lattice problems. In: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing. p. 193–206. STOC ’83, Association for Computing Machinery, New York, NY, USA (1983), https://doi.org/10.1145/800061.80874910.1145/800061.808749Search in Google Scholar

[18] Kim, T., Barbulescu, R.: Extended tower number field sieve: a new complexity for the medium prime case. In: Advances in cryptology—CRYPTO 2016. Part I, Lecture Notes in Comput. Sci., vol. 9814, pp. 543–571. Springer, Berlin (2016)10.1007/978-3-662-53018-4_20Search in Google Scholar

[19] Kim, T., Jeong, J.: Extended tower number field sieve with application to finite fields of arbitrary composite extension degree. In: Public-key cryptography—PKC 2017. Part I, Lecture Notes in Comput. Sci., vol. 10174, pp. 388–408. Springer, Berlin (2017)10.1007/978-3-662-54365-8_16Search in Google Scholar

[20] Pollard, J.M.: The lattice sieve. In: The development of the number field sieve, Lecture Notes in Math., vol. 1554, pp. 43–49. Springer, Berlin (1993)10.1007/BFb0091538Search in Google Scholar

[21] Sarkar, P., Singh, S.: A general polynomial selection method and new asymptotic complexities for the tower number field sieve algorithm. In: Advances in cryptology—ASIACRYPT 2016. Part I, Lecture Notes in Comput. Sci., vol. 10031, pp. 37–62. Springer, Berlin (2016)10.1007/978-3-662-53887-6_2Search in Google Scholar

[22] Schirokauer, O.: Discrete logarithms and local units. Philos. Trans. Roy. Soc. London Ser. A 345(1676), 409–423 (1993)10.1098/rsta.1993.0139Search in Google Scholar

[23] The CADO-NFS development team: Cado-nfs, an implementation of the number field sieve algorithm (2019), available at http://cado-nfs.gforge.inria.fr/Search in Google Scholar

[24] Zajac, P.: Discrete logarithm problem in degree six finite fields. In: Ph.D. thesis, Slovak University of Technology (2008), http://www.kaivt.elf.stuba.sk/kaivt/Vyskum/XTRDLSearch in Google Scholar

Received: 2020-02-25

Accepted: 2020-09-01

Published Online: 2020-11-25

This work is licensed under the Creative Commons Attribution 4.0 International License.

Lattice Sieving in Three Dimensions for Discrete Log in Medium Characteristic

Abstract

1 Introduction

2 Number Field Sieve

2.1 Lattice Sieving

3 Faster enumeration

Remark 1

Remark 2

Remark 3

Remark 4

Remark 5

Algorithm 1

3.1 Previous Lattice Enumeration Methods

3.2 Example

3.3 Integer Linear Programming

Remark 6

Algorithm 2

4 Dealing with Cache Locality Issues

Remark 7

Remark 8

Remark 9

Remark 10

5 Record computation in Fp6

5.1 Polynomial selection

5.2 Relation collection

5.3 Construction of matrix

5.4 Linear algebra

5.5 Individual logarithm

6 Pairing break

7 Conclusion

References

Journal and Issue

Articles in the same Issue