A trade-off between classical and quantum circuit size for an attack against CSIDH

We propose a heuristic algorithm to solve the underlying hard problem of the CSIDH cryptosystem (and other isogeny-based cryptosystems using elliptic curves with endomorphism ring isomorphic to an imaginary quadratic order O). Let ∆ = Disc(O) (in CSIDH, ∆ = −4p for p the security parameter). For 0 < α < 1/2, our algorithm requires:


Introduction
Given two elliptic curves $E_1$, $E_2$ defined over a finite field $\mathbb{F}_q$, the isogeny problem consists in computing an isogeny $\phi : E_1 \to E_2$, i.e. a non-constant morphism that maps the identity point on $E_1$ to the identity point on $E_2$. A hash function construction based on supersingular isogeny graphs was first proposed in [9], with security based on the hardness of computing isogenies. An isogeny-based key exchange was described by Couveignes [12], and its concept was independently rediscovered by Stolbunov [31].
Childs, Jao and Soukharev observed in [10] that the problem of finding an isogeny between two ordinary elliptic curves $E_1$ and $E_2$ defined over $\mathbb{F}_q$ and having the same endomorphism ring could be reduced to the problem of solving the Hidden Subgroup Problem (HSP) for a generalized dihedral group. More specifically, if the endomorphism ring of the curves is isomorphic to an imaginary quadratic order O, then the problem of finding an isogeny between $E_1$ and $E_2$ can be reduced to the problem of finding an ideal a ⊆ O such that $[a] * E_1 = E_2$, where * is the action of the ideal class group Cl(O), [a] is the class of a in Cl(O) and $E_i$ stands for the isomorphism class of the curve $E_i$. Let N := |Cl(O)|. Using Kuperberg's sieve [25], this task requires $2^{O(\sqrt{\log N})}$ queries to an oracle that computes the action of the class of an element in Cl(O). Using the heuristic oracle of [4], the cost of the oracle can be brought down to $2^{\tilde{O}(\ldots)}$, where $N \approx \sqrt{|\Delta|}$.
Although neither the CRS [12,31] nor the CSIDH (a similar system [8] using supersingular curves defined over $\mathbb{F}_p$) cryptosystems are NIST candidates, it is natural to evaluate their security according to the methodology proposed by NIST for its standardization process [26]. In particular, Level I is defined in [26, Page 16] as follows: "any attack that breaks [this] security definition must require computational resources comparable to or greater than those required for key search on a block cipher with a 128-bit key (e.g. AES-128)." Hence, this corresponds to $2^{128}$ classical AES evaluations ($2^{143}$ classical gates, according to the document) or to $2^{87.5}$ quantum gates (with 2953 logical qubits), according to the counts given in [17] on the universal Clifford + T set. We point out that this "or" has no reason to be exclusive: a quantum adversary can also run massive classical computations.

Contributions.
We propose a different trade-off between classical and quantum circuits in the cryptanalysis of CRS and CSIDH relying on the resolution of the Hidden Shift Problem. Let $E_1$, $E_2$ be two elliptic curves and O be an imaginary quadratic order of discriminant ∆ such that End($E_i$) ≃ O for i = 1, 2. Then, assuming Heuristic 1 for a constant 0 < α < 1/2 and Heuristic 2, there is a quantum algorithm for computing [a] such that $[a] * E_1 = E_2$ requiring:
- Polynomial classical and quantum memory.

Related Works.
After the publication of CSIDH, there has been a line of work on the quantum security of CRS and CSIDH. Some of these works concern the security of concrete CSIDH [8] parameters. These include [6] and [3], which give a quantum circuit for computing isogenies for the 512-bit CSIDH parameters. On the asymptotic side, which is our main focus here, both [4] and [19] present algorithms for computing isogenies with quantum (and classical) circuit size in $2^{\tilde{O}(\log(|\Delta|)^{1/2})}$ and polynomial space, which yields a subexponential quantum attack on CSIDH and CRS with polynomial quantum space. While these two previous works focused on isogeny computations, in this paper we complement the analysis of the Hidden Shift resolution underlying the attack procedure common to all these works. With our trade-off, we can obtain a superpolynomial improvement on the size of the quantum circuit.
The rest of the paper is organized as follows: Section 2 contains background information on isogenies. Section 3 shows the connection between the Dihedral Hidden Subgroup Problem and the computation of isogenies. Section 4 gives a high-level description of the idea for the resolution of the Dihedral HSP. Section 5 introduces the concept of trading off quantum gates for classical gates in the resolution of the Dihedral HSP. Section 6 describes a heuristic oracle compatible with the intended trade-off. Section 7 discusses the heuristic made for the validity of the oracle. Section 8 describes the challenges of a fault-tolerant implementation. Section 9 concludes and discusses the relevance of this result to the evaluation of the security with respect to NIST security levels.

Mathematical background
An elliptic curve E defined over a finite field $\mathbb{F}_q$ of characteristic p ≠ 2, 3 is a projective algebraic curve with an affine plane model given by an equation of the form $y^2 = x^3 + ax + b$, where $a, b \in \mathbb{F}_q$ and $4a^3 + 27b^2 \neq 0$. The set of points of an elliptic curve is equipped with an additive group law. Details about the arithmetic of elliptic curves can be found in many references, such as [30, Chap. 3].
Let $E_1$, $E_2$ be two elliptic curves defined over $\mathbb{F}_q$. An isogeny $\phi : E_1 \to E_2$ over $\mathbb{F}_q$ (resp. over $\overline{\mathbb{F}}_q$) is a nonconstant rational map defined over $\mathbb{F}_q$ (resp. over $\overline{\mathbb{F}}_q$) which sends the identity point on $E_1$ to the identity point on $E_2$. The degree of an isogeny is its degree as a rational map, and an isogeny of degree ℓ is called an ℓ-isogeny. Moreover, $E_1$, $E_2$ are said to be isomorphic over $\mathbb{F}_q$, or $\mathbb{F}_q$-isomorphic, if there exist isogenies $\phi_1 : E_1 \to E_2$ and $\phi_2 : E_2 \to E_1$ over $\mathbb{F}_q$ whose composition is the identity. Two $\mathbb{F}_q$-isomorphic elliptic curves have the same j-invariant given by $j := 1728 \cdot \frac{4a^3}{4a^3 + 27b^2}$.
An order O in a number field K such that [K : Q] = n is a subring of K which is a Z-module of rank n. Let E be an elliptic curve defined over $\mathbb{F}_q$. An endomorphism of E is either an isogeny defined over $\overline{\mathbb{F}}_q$ between E and itself, or the zero morphism. The set of endomorphisms of E forms a ring that is denoted by End(E). For elliptic curves, End(E) is either an order in an imaginary quadratic field (and has Z-rank 2) or a maximal order in a quaternion algebra ramified at p (the characteristic of the base field) and ∞ (and has Z-rank 4). In the former case, E is said to be ordinary, while in the latter it is called supersingular. When a supersingular curve is defined over $\mathbb{F}_p$, the ring of its $\mathbb{F}_p$-endomorphisms, denoted by $\mathrm{End}_{\mathbb{F}_p}(E)$, is isomorphic to an imaginary quadratic order, much like in the ordinary case.
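As a small illustration of the short Weierstrass model and the j-invariant formula above, the following sketch evaluates j over a prime field. It assumes q = p is a prime greater than 3 and Python 3.8 or later (for modular inverses via `pow`); the function name `j_invariant` is ours, not the paper's.

```python
def j_invariant(a: int, b: int, p: int) -> int:
    """j-invariant of y^2 = x^3 + a*x + b over F_p (p prime, p > 3).

    Implements j = 1728 * 4a^3 / (4a^3 + 27b^2) with arithmetic mod p.
    """
    num = (4 * pow(a, 3, p)) % p
    den = (num + 27 * pow(b, 2, p)) % p
    if den == 0:
        # 4a^3 + 27b^2 = 0 means the curve is singular, hence not elliptic.
        raise ValueError("singular curve: 4a^3 + 27b^2 = 0 mod p")
    # pow(den, -1, p) is the modular inverse (Python 3.8+).
    return (1728 * num * pow(den, -1, p)) % p


def twist_by(a: int, b: int, u: int, p: int) -> tuple:
    """Coefficients (u^4 a, u^6 b) of an isomorphic model of the same curve."""
    return (pow(u, 4, p) * a) % p, (pow(u, 6, p) * b) % p
```

The change of variables (x, y) -> (u^2 x, u^3 y) maps the curve to an isomorphic one with coefficients (u^4 a, u^6 b), so `j_invariant` returns the same value on both models.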
When E is ordinary (resp. supersingular over $\mathbb{F}_p$), the class group of End(E) (resp. $\mathrm{End}_{\mathbb{F}_p}(E)$) acts transitively on isomorphism classes of elliptic curves having the same endomorphism ring. More precisely, the class of an ideal a ⊆ O acts on E with End(E) ≃ O via an isogeny of degree N(a) (the algebraic norm of a). Likewise, from an ideal a and the ℓ-torsion (where ℓ = N(a)), one can recover the kernel of φ, and then, using Vélu's formulae [34], one can derive the corresponding isogeny. We denote by [a] * E the action of the ideal class of a on E. To evaluate the action of [a], we decompose it as a product of classes of prime ideals of small norm ℓ, and evaluate the action of each prime ideal as an ℓ-isogeny. This strategy was described by Couveignes [12], Galbraith-Hess-Smart [15], and later by Bröker-Charles-Lauter [7], and reused in many subsequent works.

Isogenies from solutions to the HSP
As shown in [5,10], the computation of an isogeny between $E_1$ and $E_2$ defined over $\mathbb{F}_q$ such that there is an imaginary quadratic order with O ≃ End($E_i$) for i = 1, 2 can be done by exploiting the action of the ideal class group of O on isomorphism classes of curves with endomorphism ring isomorphic to O. This concerns the cases of ordinary curves, and supersingular curves defined over $\mathbb{F}_p$.
Assume we are looking for a such that $[a] * E_1 = E_2$. This is precisely the hard mathematical problem of the CSIDH [8] and CRS [12,29] cryptosystems. The computation of $\vec{s}$ can thus be done through the resolution of the Hidden Subgroup Problem in $\mathbb{Z}_2 \ltimes A$.

Algorithm 1 Quantum algorithm for evaluating the action in Cl(O)
Additionally, we assume that $N = 2^n$ for simplicity. Using a circuit implementing f, we can prepare the state $|\psi^{d,N}_k\rangle := \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i \frac{kd}{N}}|1\rangle\right)$. We want to recover d from many states $|\psi^{d,N}_k\rangle$ with random k. When we restrict ourselves to $N = 2^n$, this task consists in recovering d bit by bit. To get the least significant bit of d, we only need $|\psi^{d,2^n}_{2^{n-1}}\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + (-1)^d|1\rangle\right)$. As shown in [24], the repetition of this process yields all bits of d. When N is not a power of 2, the process is terminated with a quantum phase estimation step.
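To make the bit-recovery step concrete, here is a minimal classical simulation, assuming the standard form of the phase states, (|0> + exp(2*pi*i*k*d/N)|1>)/sqrt(2). It checks that measuring the state with k = 2^(n-1) in the Hadamard basis reveals the parity of d. The helper names are ours, and the simulation only tracks two amplitudes; it is a sketch, not a quantum implementation.

```python
import cmath
import math


def psi(k: int, d: int, N: int) -> tuple:
    """Amplitudes (on |0>, |1>) of (|0> + exp(2*pi*i*k*d/N)|1>)/sqrt(2)."""
    phase = cmath.exp(2j * math.pi * ((k * d) % N) / N)
    return (1 / math.sqrt(2), phase / math.sqrt(2))


def hadamard_probabilities(state: tuple) -> tuple:
    """Outcome probabilities after applying a Hadamard gate and measuring."""
    a0, a1 = state
    h0 = (a0 + a1) / math.sqrt(2)
    h1 = (a0 - a1) / math.sqrt(2)
    return (abs(h0) ** 2, abs(h1) ** 2)


def least_significant_bit(d: int, n: int) -> int:
    """Most likely Hadamard-basis outcome for the state with k = 2^(n-1)."""
    N = 2 ** n
    p0, p1 = hadamard_probabilities(psi(N // 2, d, N))
    return 0 if p0 > p1 else 1
```

For k = 2^(n-1) the relative phase is (-1)^d, so the outcome is deterministic and equals d mod 2.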
To go from many $|\psi^{d,N}_k\rangle$ with random k to $|\psi^{d,2^n}_{2^{n-1}}\rangle$, Kuperberg's sieve [24] proceeds by small iterations. Given two states $|\psi^{d,N}_{k_1}\rangle$, $|\psi^{d,N}_{k_2}\rangle$ where $k_1$, $k_2$ share the same initial l bits, there is a simple procedure that computes $|\psi^{d,N}_{k_1-k_2}\rangle$ with constant probability, thus killing l bits of the decomposition of the index k. At the end of the process we end up with states of the form $|\psi^{d,2^n}_{2^{n-1}}\rangle$ and $|\psi^{d,2^n}_{0}\rangle$. As we saw above, the former gives us the least significant bit of d. In CSIDH, Cl(O) is cyclic with high probability, but the approach also applies to non-cyclic groups [10, Appendix A]. Here, we consider the HSP in $D_N$ with $N = 2^n$.
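The combination step can be illustrated with a four-amplitude simulation. The sketch below assumes the usual realization of the "simple procedure": a CNOT from the first qubit onto the second, followed by a measurement of the second qubit. It also assumes that "initial bits" means low-order bits. Function names are ours.

```python
import cmath
import math


def phase(k: int, d: int, N: int) -> complex:
    """exp(2*pi*i*k*d/N), the relative phase carried by the state with label k."""
    return cmath.exp(2j * math.pi * ((k * d) % N) / N)


def combine(k1: int, k2: int, d: int, N: int, outcome: int) -> tuple:
    """Combine the states labelled k1 and k2, conditioned on a measurement outcome.

    Returns (new label, relative phase of |1> versus |0> on the first qubit).
    Each outcome occurs with probability 1/2.
    """
    # Amplitudes of |x1 x2> in the product state, up to a global factor 1/2.
    amp = {(x1, x2): phase(k1 * x1 + k2 * x2, d, N)
           for x1 in (0, 1) for x2 in (0, 1)}
    # CNOT (control: first qubit, target: second) maps |x1, x2> to |x1, x1 xor x2>.
    after = {(x1, x1 ^ x2): a for (x1, x2), a in amp.items()}
    # Keep the branch in which the second qubit was measured as `outcome`.
    relative = after[(1, outcome)] / after[(0, outcome)]
    label = (k1 + k2) % N if outcome == 0 else (k1 - k2) % N
    return label, relative
```

With N = 64, k1 = 44 and k2 = 28 (which agree on their four low-order bits), the outcome-1 branch carries the label 16, whose four low-order bits are zero.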

Low memory variants
The main disadvantage of Kuperberg's sieve is that the memory requirements are proportional to the gate complexity, which is in $2^{O(\sqrt{n})}$. That is a subexponential space complexity. Regev's variant [27] offers a classical and quantum polynomial space complexity at the cost of a slight increase in the runtime. The idea is to keep only a polynomial number of qubits at all times and to recombine them to produce states of the form $|\psi^{d,N}_k\rangle$ with the initial bits of k being zero. Kuperberg also described a second Hidden Shift algorithm [25] that uses a different combination method. It also has a time cost in $2^{O(\sqrt{n})}$, and uses only a polynomial number of qubits. However, its classical memory requirement is as large as its classical time.

Trade-off classical/quantum
Regev's variant of Kuperberg's sieve can be seen as an $n_1$-step process which is paused at each step to perform a classical brute-force enumeration of cost $2^{O(n_2)}$. Instead of balancing the classical and quantum effort, we propose to spend more effort performing the classical search to reduce the size of the quantum circuit. Let $n \approx n_1 n_2$, with $n_1 = O(n^{\alpha})$ and $n_2 = O(n^{1-\alpha})$ for some 0 < α < 1. The case α = 1/2 is essentially Regev's variant [27].
Algorithm 2 Iteration of the sieve procedure based on [27]
Input: Integers $n_1$, $n_2$ and $n_2 + 4$ states of the form $|\psi^{d,N}_{k_i}\rangle$ for random $k_i$ having their initial $t n_2$ bits equal to 0.
Output: $|\psi^{d,N}_k\rangle$ for a random k having its initial $(t + 1) n_2$ bits equal to 0.
1: $\vec{k} \leftarrow (k_1, \ldots, k_{n_2+4})$.
8: return |ψ⟩.
Proof. As long as $n_2 \to \infty$, the main ingredients of the proof of the validity and run time of [27] still hold. Namely, a direct application of Chebyshev's inequality shows that Step 5 (and therefore Algorithm 2) has a constant probability of success. Following the approach of [27], the algorithm to solve the HSP consists in the production of states $|\psi^{d,N}_k\rangle$ for random k with an oracle implementing f, and $2^{n_1}$ successive applications of Algorithm 2 to produce $|\psi^{d,2^n}_{2^{n-1}}\rangle$. An application of the Chernoff bound shows that the number of calls to the oracle implementing f that guarantees the success of the overall procedure is $n^{O(n_1)}$.
The quality of the trade-off depends on the cost of the oracle. Indeed, if the quantum circuit to implement the oracle f is larger than $2^{\tilde{O}(n^{\alpha})}$ for the chosen α, then the size of the circuit to implement f will dominate the number of quantum gates. This issue particularly impacts the resolution of the isogeny problem between elliptic curves whose endomorphism ring is isomorphic to an imaginary quadratic order (i.e. ordinary curves and supersingular curves defined over $\mathbb{F}_p$).
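The classical brute-force enumeration at which each sieve step pauses can be sketched as follows: given the labels of the n_2 + 4 input states and a measured value z, list every binary vector b whose inner product with the labels equals z modulo 2^(n_2). This is a naive exhaustive search written for illustration; the function name and the argument layout are our assumptions.

```python
from itertools import product


def matching_subsets(ks: list, z: int, n2: int) -> list:
    """All b in {0,1}^len(ks) with <b, ks> = z mod 2^n2, by exhaustive search.

    With len(ks) = n2 + 4 this inspects 2^(n2 + 4) candidates, i.e. the
    2^O(n2) classical cost of one sieve step.
    """
    modulus = 1 << n2
    return [b for b in product((0, 1), repeat=len(ks))
            if sum(bi * ki for bi, ki in zip(b, ks)) % modulus == z]
```

Increasing n_2 makes this enumeration more expensive while reducing the number n_1 of sieve steps, which is the direction of the trade-off discussed above.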

The cost of the isogeny oracle
on $E_1$ by repeatedly applying the action of the $p_i$ for i = 1, …, u.
Step 1 is a precomputation. It takes quantum polynomial time.
Step 2 can be performed as a precomputation requiring only classical gates.
Lemma 6.1. Let L be an n-dimensional lattice with input basis $B \in \mathbb{Z}^{n \times n}$, and let β < n be a block size. Then the BKZ variant of [18] used with Kannan's enumeration technique [22] … $\lambda_1(L)$, using time $\mathrm{Poly}(n, \mathrm{Size}(B))\,\beta^{\beta\left(\frac{1}{2e} + o(1)\right)}$ and polynomial space.

Corollary 6.2. Assuming Heuristic 1 for α, Algorithm 3 is correct, runs in time $2^{\tilde{O}(\log(|\Delta|)^{1-2\alpha})}$ and has polynomial space complexity. It returns a basis of L whose first vector
We implement Algorithm 4 reversibly by using generic techniques due to Bennett [2] to convert any algorithm taking time T and space S into a reversible algorithm taking time $T^{1+\epsilon}$, for an arbitrarily small ϵ > 0, and space O(S log T). To bound the cost of Algorithm 4, we assume the following standard heuristic.
Proof. Each group action of Step 7 is polynomial in log(p) and in N($p_i$). Moreover, Babai's algorithm runs in polynomial time and returns $\vec{u}$ such that … Therefore, the $y_i$ are in $2^{\tilde{O}(\log(|\Delta|)^{\alpha})}$, which is the cost of Steps 5 to 9. The main observation allowing us to reduce the search for a close vector to the computation of a BKZ-reduced basis is that Heuristic 1 gives us the promise that there is $\vec{u} \in L$ at distance less than $2^{\tilde{O}(\log(|\Delta|)^{\alpha})}$ from $\vec{y}$.

Algorithm 5 Hybrid algorithm for finding the group action.

Discussion on Heuristic 1
The idea behind Heuristic 1 is that the number of vectors of length $\log(|\Delta|)^{1-\alpha}$ with entries bounded by $e^{\log(|\Delta|)^{\alpha}}$ is |∆|, while $|\mathrm{Cl}(O)| \approx \sqrt{|\Delta|}$. If the class of $\prod_i p_i^{x_i}$ yielded by a vector $\vec{x}$ were known to be distributed uniformly at random in Cl(O), then we would cover all of Cl(O) with high probability. Unfortunately, the distribution of the classes of these ideals is not known (unless we consider products over the first $\log(|\Delta|)^{2+\varepsilon}$ split primes [20], but this is incompatible with our restriction on α). To support Heuristic 1, we drew 5000 elements of Cl(O) for various O of increasing discriminant. At each discriminant size, we report the maximal exponent in the decomposition of the random classes with respect to the first $\log(|\Delta|)^{1-\alpha}$ split primes. We systematically observe that it is significantly lower than $e^{\log(|\Delta|)^{\alpha}}$. In Table 1, we present the evolution of the maximal exponent for α = 0.4 and Disc(O) = −p for p the first prime greater than $2^i$ such that −p is a fundamental discriminant and i between 35 and 160. In Appendix A we present similar results for α = 0.1, …, 0.5 and smaller increments in the size of ∆. Heuristic 1 intersects ongoing research in number theory, and it is a motivation for more study on the structure of the class group. The samples presented in this paper are admittedly small, but they support the fact that Heuristic 1 holds true more than 98% of the time (at least for the sizes of ∆ that were inspected). Such a success rate makes Heuristic 1 relevant for discussions within the field of cryptography.
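The counting argument at the start of this discussion can be written out in one line. The derivation below assumes natural logarithms and counts one choice among roughly $e^{\log(|\Delta|)^{\alpha}}$ values per coordinate (signs and rounding are ignored); it is a back-of-the-envelope check, not a statement from the paper's proofs.

```latex
\[
  \#\{\vec{x}\}
  \;\approx\; \left(e^{\log(|\Delta|)^{\alpha}}\right)^{\log(|\Delta|)^{1-\alpha}}
  \;=\; e^{\log(|\Delta|)^{\alpha}\,\log(|\Delta|)^{1-\alpha}}
  \;=\; e^{\log(|\Delta|)}
  \;=\; |\Delta| .
\]
```

Since $|\mathrm{Cl}(O)| \approx \sqrt{|\Delta|}$, there are about $\sqrt{|\Delta|}$ candidate vectors per class, which is why a uniform distribution of the resulting classes would cover the whole class group with high probability.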

On fault tolerant implementations
All the asymptotic results regarding the proposed trade-off between classical and quantum circuits only apply to logical qubits. If we incorporate the cost of error correction, then the quantum circuit has to idle while the classical circuit searches for the number m of vectors $\vec{b} \in \{0, 1\}^{n_2+4}$ such that $\langle \vec{b} \cdot \vec{k} \rangle \bmod 2^{n_2} = z$.
The logical gate representation of this circuit does not include the cost of idling, but in all realistic models of fault-tolerant qubits, operations need to be performed on a qubit that is being stored while the classical computation is being done. There is currently an ongoing debate in the cryptographic community as to how to assign a cost metric to a quantum algorithm given its representation in the logical quantum circuit model of computation [3,21]. One approach is the quantum circuit size and the other is the product of the quantum circuit width (number of qubits) and the quantum circuit depth (time taken). We have previously studied our tradeoff in light of the circuit-size metric. We now briefly make some remarks with regard to the latter, which is proposed as it captures the difficulties in performing quantum error correction. Regardless of the architecture chosen for quantum computers and the method used to perform quantum error correction, it is clear from theoretical error models regarding physical qubits that, if we consider discrete timesteps, applying a single- or two-qubit gate induces an error in the qubit with a significantly higher probability than if it were simply resting (or "idling") [13,14,23,28,33]. As the resources we must expend on error correction are intrinsically linked to the probability of an error occurring, it is plain that the resources to protect an idle quantum state have the potential to be lower than those required to protect a quantum state undergoing active manipulation. For one example of the proposed gaps and tradeoffs that can exist for different architectures, see [32, Tab. 2]. In Table 2, we observe that the error rate while storing a qubit is lower than when applying gates in most systems. Furthermore, classical gates could be significantly faster in practice than quantum gates, thus reducing the quantum cost of idling.
In fact, the most recent resource estimations [1] show that, given the current trajectory of quantum architectures, a quantum computation inherently requires a corresponding amount of classical computation. From the counts in [16], a Grover search for an AES-128 key requires $2^{106}$ classical computations, hence approximately $2^{20}$ classical computations per quantum gate.
Our tradeoff therefore allows for agility in cryptanalysis depending upon the eventual architecture of quantum computers and opens the door for improvements and further tradeoffs if smarter methods of performing the brute-force enumeration step are discovered. A simple example of a further tradeoff would be to employ parallelism in this stage so that, if m classical processors are available, the classical time would be proportional to $2^{O(n^{1-\alpha})}/m + O(m)$, thus reducing the time of quantum idling even more. A full examination of this work under current projections involving quantum error correction is left for future work.

Conclusion
We proposed an asymptotic trade-off between the size of the classical and quantum circuits required to attack CSIDH. This angle is motivated by the fact that, to use the full power of the NIST metric, we should authorize $2^{128}$ classical computations and $2^{87.5}$ quantum gates simultaneously. This work showed that such a hybrid attack could be performed with a quantum and a classical circuit that are both asymptotically smaller than the state of the art. The study of the impact of this attack against the parameters for a specific security level (e.g. Level I) is left for future work. In the case of CSIDH-512, the number of Clifford + T gates required to run a reversible CSIDH isogeny computation has been estimated in [3] at approximately $2^{51}$. This is costly, but if we adjust α such that $\log(|\Delta|)^{1-\alpha} \approx 128$ for log(|∆|) = 512 (since log(|∆|) ≈ log(p) where p is the security parameter), we get α ≈ 0.22. Then $\log(|\Delta|)^{\alpha} \approx 4$, which indicates that the size of the quantum circuit besides oracle calls might be moderate, thus leaving the door open for the relevance of our algorithms to the analysis of the NIST Level I security of CSIDH.
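The choice of α in the CSIDH-512 example can be checked with a few lines of arithmetic. This is a sketch that takes the exponents at face value and ignores the constants and polylogarithmic factors hidden in the Õ notation; the function name is ours.

```python
import math


def alpha_for_classical_budget(log_disc: float, classical_bits: float) -> float:
    """Solve log_disc ** (1 - alpha) == classical_bits for alpha."""
    return 1 - math.log(classical_bits) / math.log(log_disc)


# log(|Delta|) = 512 and a classical exponent of 128, as in the text.
alpha = alpha_for_classical_budget(512, 128)
# Corresponding quantum exponent log(|Delta|) ** alpha.
quantum_exponent = 512 ** alpha
```

Since 512 = 2^9 and 128 = 2^7, the exact solution is α = 2/9, which rounds to the 0.22 quoted above, and 512^(2/9) = 2^2 = 4.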

A Numerical data in support of Heuristic 1
In this section, we provide additional numerical data in support of the heuristic made in Section 7. For each i in 30, 35, …, 160 and α = 0.1, …, 0.5, we select the first prime $p \geq 2^i$ such that ∆ = −p is a fundamental discriminant. For each discriminant, we compute the corresponding ideal class group and produce a reduced basis of the lattice of relations between the classes of the split primes $p_i$ of norm less than $\log^{1-\alpha}(|\Delta|)$. Then we draw 5000 ideal classes uniformly at random and compute a short decomposition over the split primes of norm less than $\log^{1-\alpha}(|\Delta|)$. To compute a short decomposition of [a], we solve an instance of the approximate Closest Vector Problem between a vector $\vec{x}$ such that $[\prod_i p_i^{x_i}] = [a]$ and the lattice L of relations. We solve approximate CVP by reducing the basis of L with the BKZ algorithm and calling Babai's nearest plane algorithm. We do not necessarily find the shortest $\vec{x}$; however, all our exponents are below the intended bound $e^{\log^{\alpha}(|\Delta|)}$. In each table, we show the largest exponent occurring in a decomposition next to $e^{\log^{\alpha}(|\Delta|)}$ for each ∆. Our heuristic is systematically satisfied. Moreover, aside from the case α = 0.1, where the intended bound is already very small (between 4 and 5), we observe that our heuristic seems in fact very conservative. For example, for $\log_2(|\Delta|) = 160$ and α = 0.5, the maximal exponent recorded over 5000 short decompositions is 188, while the intended bound is $e^{\log^{0.5}(|\Delta|)} = 37462$.
log2(|∆|)   log^(1−α)(|∆|)   max. exponent   e^(log^α(|∆|))
30          15               2               4
35          18               2               4
40          20               2               4
45          22               2               4
50          24               2               4
55          26               2               4
60          29               2               4
65          31               2               4
70          33               3               4
75          35               3               4
80          37               3               4
85
Table 6: Maximal exponent in short decompositions (over 5000 random elements of the class group).
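The intended bound quoted above for a 160-bit discriminant and α = 0.5 can be recomputed directly. The sketch below assumes that log denotes the natural logarithm and that |∆| is approximated by 2^i; the function name is ours.

```python
import math


def intended_bound(log2_disc: float, alpha: float) -> float:
    """e^(log(|Delta|)^alpha) for |Delta| ~ 2^log2_disc (natural log assumed)."""
    natural_log_disc = log2_disc * math.log(2)
    return math.exp(natural_log_disc ** alpha)


bound_160 = intended_bound(160, 0.5)
```

Under these assumptions the value agrees with the figure 37462 in the text up to rounding, far above the recorded maximal exponent of 188.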