A subexponential-time, polynomial quantum space algorithm for inverting the CM group action

Abstract We present a quantum algorithm which computes group action inverses of the complex multiplication group action on isogenous ordinary elliptic curves, using subexponential time, but only polynomial quantum space. One application of this algorithm is that it can be used to find the private key from the public key in the isogeny-based CRS and CSIDH cryptosystems. Prior claims by Childs, Jao, and Soukharev of such a polynomial quantum space algorithm for this problem are false; our algorithm (along with contemporaneous, independent work by Biasse, Iezzi, and Jacobson) is the first such result.


Introduction
In recent years, isogeny-based cryptosystems have emerged as a possible candidate for post-quantum cryptography. The earliest isogeny-based key agreement protocol, first proposed by Couveignes [9] and later by Rostovtsev and Stolbunov [18], uses the complex multiplication action of an imaginary quadratic ideal class group cl(O) on an ordinary elliptic curve E(Fq); we refer to this scheme as CRS. Very recently, a new isogenybased proposal called CSIDH [4] has appeared, which is essentially equivalent to CRS except that it uses supersingular elliptic curves, and offers much faster performance. We emphasize that, in terms of security analysis, CRS and CSIDH are completely different from the supersingular case, which was first proposed for use in cryptography by Charles, Goren, and Lauter [5].
Both CRS and CSIDH can be broken (in the sense of a total break -recovery of the private key from the public key) by solving the group action inverse problem [20] on the complex multiplication group action, where the group in question is cl(O). The first published quantum algorithm implementing this attack is the CJS algorithm [7], which breaks CRS and CSIDH in quantum subexponential running time, specifically in Lp( 1 2 , 1 √ 2 ) operations. The CJS attack also requires a subexponential amount of space, and of greater concern is that the quantum part of the algorithm also requires this much space. Since the history of this topic is convoluted, we present it in detail. The CJS attack consists of two parts: a classical algorithm (subexponential in time and space) to evaluate the complex multiplication action, and a quantum algorithm by Kuperberg to solve the di-hedral hidden subgroup problem. The original version of Kuperberg's quantum algorithm [13] required both subexponential time and subexponential space. Later improvements by Regev [17] and Kuperberg [14] reduced the space requirement to polynomial space. Based on these improvements, CJS [7,Remark 4.6] claimed that the CJS attack could be performed using polynomial quantum space. However, as pointed out by Galbraith and Vercauteuren [11, §7.0.1], this claim is incorrect, because the classical subexponential algorithm to evaluate the complex multiplication action must be run in quantum superposition, and hence requires subexponential quantum space. The CJS authors have acknowledged this error in an errata on arXiv [6].
In this paper, we present a new algorithm for this problem which really requires only polynomial quantum space. Our algorithm uses a purely classical precomputation costing subexponential time and subexponential space, in which subexponentially slow lattice reduction algorithms are used to obtain a (polynomially long) advice string allowing any element of the class group to be expressed explicitly as a product of polynomially large primes with subexponentially large exponents. This precomputation then allows the complex multiplication action of cl(O) to be computed (both classically and quantumly) using subexponential time but only polynomial space, which is the ingredient needed for a polynomial quantum space group action inverse algorithm.
Couveignes [9], Stolbunov [20], and subsequent authors [10] have all specifically mentioned the use of LLL [15] for the purpose of evaluating the CM action. However, the use of LLL results in a fully exponential approximation factor, which is not good enough for our application. Our results are based on a variant of BKW [2] instead of LLL, and constitute (along with [1]; see below) the first detailed description of how to evaluate the complex multiplication operator in quantum subexponential time using only polynomial quantum space.

Related work
Recent work by Bonnetain and Schrottenloher [3] makes a detailed analysis of the security of CSIDH, by assessing the effectivenes of using lattice reduction algorithms, in particular BKZ [19], for the evaluation of the action of the class group. Their emphasis is on practical attacks rather than theoretical analysis. Independent work of Biasse et al. [1] describes a quantum algorithm to evaluate the action, also requiring subexponential time and polynomial quantum space. Moreover, [1] presents a further time-space tradeoff to evaluate the action using only polynomial classical space. Compared to their work, our work provides a more self-contained description of the quantum processing steps.

Preliminaries
There is more than one way to evaluate the action of the ideal class group in practice, and most methods use some form of randomness. Since we will be implementing this action on a quantum computer, we choose one in particular [4, §8, Algorithm 2] which can most easily be modified to set randomness beforehand. Note that the ideals l i = (ℓ i , π − 1) have kernel in Fp while the kernel of l i = (ℓ i , π + 1) is typically defined over Fp but has some elements lying in F p 2 /Fp. Given two probability density functions χ 1 , χ 2 over a finite set X, their statistical distance is defined as ∑︀ x∈X |χ

Smooth expression of generators of the class group
Suppose cl(O) is a cyclic group (cf. Section 3.1) of size N and with generator g. We describe a procedure to find, for each positive integer j, an expression g 2 j = ∏︀ t i=1 l e i i , with |e i | sub-exponential with respect to log N. Here the l i , i = 1, . . . , t denote the first t ideals lying over Elkies primes. The idea of the algorithm is to find enough samples of the form g k = ∏︀ t i=1 l e i i where the e i are chosen at random and sub-exponentially large, and then use a BKW-like algorithm to express g 2 j as a subexponentially short product of the samples g k .

Restriction to the cyclic case
We assume the class group cl(O) is cyclic. In general, the class group is not always cyclic, but heuristically it is cyclic in the vast majority of cases (97% of the time per Cohen-Lenstra [8]), and this case is easier to analyze. We conjecture that standard techniques such as [7, Appendix A] could be used to extend to the non-cyclic case.

Expander graphs
It is known [12, Theorem 3.2] that isogeny graphs for (isomorphism classes of) elliptic curves with complex multiplication by some imaginary quadratic order O ∆ where the edges are all isogenies with prime degree less than some fixed bound (log |∆|) B are in fact expander graphs. The following well-known result about expander graphs then tells us about the distribution of elliptic curves chosen from this set by taking short random walks. When creating the initial state (cf. Section 4.
a i e i . The above lemma about expander graphs tells us that these k's are chosen nearly uniformly at random from the range {1, . . . , N}. Notice that keeping the values of e i bounded by this polynomial in log N still yields subexponentially many samples for k. Lastly we use a version of the BKW where the exponents are subexponential in log N. In the next section we describe this algorithm in the general case.

A BKW-like algorithm
The following two lemmas describe the iterative part of the BKW algorithm. The idea of both algorithms is to take as input a collection of uniformly chosen positive integers bounded by N ≈ 2 n 2 , and reduce the number of non-zero coefficients of their expression in base 2 n .
For our purposes, we assume that the input of these algorithms is drawn from the uniform distribution [see subsection 3.2]. Suppose that one of the compression algorithms is called on an input a whose entries are sampled uniformly random from [0, . . . , 2 kn −1]. Notice that for any i ∈ [0, . . . , 2 n−1 ], the expected cardinality of B i is (c + 1); therefore the expected value of c i = max{a : (a, v) ∈ B i } is at least c+1 c+2 2 n . This implies that the statistical distance of the distribution c i − a and uniform is 2(1 − c+1 c+2 ). By summing over all i, the expected statistical distance of the output distribution and uniform is at most 2 n+1 ( 1 c+2 ). Now our aim is to write 2 ℓ , for ℓ ∈ [0, . . . , n 2 − 1], as a short linear combination of the given samples. The idea is to write ℓ = nq + r, and call the lower compression algorithm q times and the upper compression algorithm n − q + 1 times, to obtain samples of the form a2 nq , and find a = 2 r among the samples.
and sets a (i+j) = (b (i+j) 1 , . . . , b (i+j) m (i+j) ). By lemmas 3.2 and 3.3, the length m (n−1) of b (n−1) is (2 2n − n + 1)2 n , and its entries are of the form a2 nq , for a ∈ [0, . . . , 2 n − 1]. Moreover, following the dicussion above, the distribution of a in this set is statistically close to uniform; therefore we can find 2 ℓ with high probability. Without loss of generality assume a (n−1) 0 = 2 ℓ ; then by definition we have that 2 ℓ is written as a difference of two entries of a (n−2) . Following this recursively, after n − 1 steps we can find 2 ℓ as a linear combination of 2 n (possibly repeated) entries of a. Hence the largest coefficient of the linear combination is bounded by 2 n . Each of the compression steps takes O (2 3n ) in time and 2 3n in space. Therefore the overall complexity is O((n − 1)2 3n ) in time and space.

Quantumly Instantiating the Action of cl(O) in Polynomial Space
Since [4, Alg. 2] for computing the action of cl(O) on Eℓℓp(O) is not amenable to being instantiated quantumly, we present a modified algorithm here. While [4, Alg. 2] succeeds with probability 1 but has variable time, our algorithm has (tunable) fixed time but succeeds with (tunable) probability less than 1.
To begin, we give an algorithm for computing E B = l ±1 * E A for prime ℓ. We emphasize that this (classical) algorithm is designed with translation to a quantum algorithm-rather than efficiency-in mind.
Algorithm 1 succeeds if and only if there is i * ∈ {1, 2, . . . , r} such that 1. (︂ For uniformly random x, these conditions hold with probability 1 2 and ℓ−1 ℓ , respectively, since E( Thus the total probability that Algorithm 1 succeeds is 1 − Later we shall choose a value of k so that our final quantum algorithm succeeds with sufficient probability.
Next we build upon Algorithm 1 to construct an algorithm which computes E B = l ±e * E A for e ∈ N. It is easy to see that Algorithm 2 succeeds with probability Algorithm 1 A classical algorithm for computing l (−1) s * E A for prime ℓ, suitable for implementing on a quantum computer. Input: A ∈ Fp, and s ∈ {0, 1} Output:  , and e ∈ N t Output: B ∈ Fp such that l (−1) s Finally, Algorithm 3 computes l ±e1 1 l ±e2 2 · · · l ±et t * E A .
. From here we briefly describe how to instantiate this quantumly. First we describe the quantum instantiation of Algorithm 1. For brevity of notation we consider an input |s⟩|E A ⟩|0⟩ which is not in superposition, but of course the algorithm extends linearly to superpositions. Before the quantum part of the algorithm begins, we sample x 1 , x 2 , . . . xr ← U(Fp) classically and include them as part of the initial state. We will use them in the quantum instantiation of all three algorithms.
(i) In the notation of Algorithm 1, write (Q i ) r i=1 to a new register to obtain (ii) In the notation of Algorithm 1, define . . wr to a new register to obtain  [21] conditioned on w i = 1 and c = 0, and increment c conditioned on w i = 1. Essentially, we look down the list of points until we find x i * which is "appropriate" (has w i = 1) and, when we find the first appropriate point, we set a flag which indicates that we have computed l * E A , and so we should not compute more. The resultant state is whereĉ counts the number of w i which are equal to 1. (iv) Uncomputeĉ by subtracting 1 from it conditioned on w i = 1, for i = r, r − 1, . . . , 1. Then uncompute (Q i ) r i=1 and (w i ) r i=1 by reversing the circuit of step (ii). Rearranging the registers, the final state is For fixed ℓ, call the algorithm above Q (1) ℓ . We shall use it is a subroutine in the quantum instantiation of Algorithm 2. For input |s⟩|e⟩|E A ⟩|x 1 , x 2 , . . . , xr⟩|0⟩: (i) Conditioned on register 2 being positive, apply Q (1) ℓ to registers 1, 3, and 4 targetting a new register. Then decrement register 2, yielding (ii) Swap registers 2 and 5, and conditioned on register 2 being positive, apply Q (1) ℓ again to registers 1, 2, 3, and 4 targetting a new register. Decrement register 2 to obtain swap registers 3 and 6, and apply the Pauli X to register 1. This yields ℓ to registers 1, 2, 3, and 4 targetting register 6 conditioned on register 2 being positive. Notice that the output of Q (1) ℓ in this case is l (−1) s (z−1) * E A since we are applying l −(−1) s in this case. This effectively erases the contents of register 6. We can then apply Pauli X to register 1 again, and, conditioned on register 2 being positive, we apply Q (1) ℓ and decrement register 2 to obtain |s⟩|e − (z + 1)⟩|l (−1) s z * Copy register 6 onto a new register to obtain From here, we can simply reverse the iterations of step (iii) and steps (ii) and (i), erasing the anciliary registers. Rearranging registers yields where L i ≥ e i . It is easy to see that this computes the correct value.

Remark 4.1.
When we want to compute the action in superposition, we need to apply Q (L) ℓ for L greater than all the e values supported in the superposition. For unknown states, this is not possible, but for our purposes it suffices to be able to compute the action for known superpositions; we can then choose L appropriately.

Constructing the States
In this subsection we show how to use the algorithms previously described to construct the states required to apply Kuperberg's algorithm.
Using the method described in Section 3 we can construct a table From this table, we construct the following quantum circuit, which converts from "cyclic notation" g m to "prime decomposition notation" l v1 1 l v2 2 · · · l vt t : ]. This circuit can be implemented in polynomial space using standard techniques. Using this circuit and the one from Section 4.1, we can give a complete algorithm for constructing the states |ψ k ⟩.
(i) Construct the state |Ψ 0 ⟩ = (2N) − 1 2 ∑︀ N−1 m=0 |m⟩|0⟩|0⟩ + |m⟩|1⟩|0⟩. (ii) Apply C to the first and third registers above to obtain (iii) Apply the gate |0, y⟩ ↦ → |0, y ⊕ E B ⟩, |1, y⟩ ↦ → |1, y ⊕ E A ⟩ to the second and fourth registers above to obtain (iv) Apply the class group action gate to registers three, four, and five. If it is successful for each m, the resultant state is (v) Measure the fifth register. This gives a curve E C , and the state will collapse to and a satisfies g a * E A = E B . (vi) Apply C again to uncompute the v-values, and discard the (now empty) third register to obtain |Ψ 5 ⟩ = 2 − 1 2 (|m⟩|0⟩|E B ⟩|0⟩ + |m + a⟩|1⟩|E A ⟩|0⟩) for unknown random m. (vii) Apply the Quantum Fourier Transform over Z/NZ to the first register to obtain (viii) Measure the first register to get a uniformly random k and the state (ix) Uncompute E A and E B using the gate from part (iii), and discard auxiliary qubits to yield as required.

Remark 4.2.
In step (iv) we evaluate the class group action on a uniform superposition over cl(O) × Z/2Z (with the second coordinate determining to which curve we apply the element of cl(O)) with fixed randomness for each such input, using the prime decomposition presentation of the group elements. We find that the probability of evaluating the function correctly over the entire superposition is

Using the States to Find the Hidden Shift
Now that we have a method to obtain states of the form |ψ k ⟩ for uniformly random k, we can apply Regev's sieve [16] y=0 ω ay |y⟩, and F N |a⟩ = N − 1 2 ∑︀ N−1 y=0 ω ay |y⟩ where F N is the quantum Fourier transform. Since these states have inner product 2 k N = Ω (1) and this inner product is preserved by the inverse Fourier transform, it follows that measuring F † N ⨂︀ k−1 j=0 |ψ 2 j ⟩ in the computational basis will yield a with probability Ω(1).

Time and Space Analysis
We briefly explain the time and space analysis of the algorithms. The classical portions of the algorithm are: (i) Generating a subexponential number m of samples ∑︀ t i=1 a i e i = k, for small random e i and known a i = log g l i .
(ii) Using a BKW-like algorithm to find an expression ∑︀ m j=1 s j k j = 2 ℓ , with s j subexponential and k j from the given samples above. Its time and space complexity is O(2 3⌈ ) and space O(log 2.5 N log p).
(iv) Each sample in Regev's algorithm [16] invokes Algorithm 3 once, and so all our calls to Algorithm 3 take total time 2 O( √ log N log log N) . The space complexity is O(log 3 N log p √︀ log log N).