New Techniques for SIDH-based NIKE

Abstract We consider the problem of producing an efficient, practical, quantum-resistant non-interactive key exchange (NIKE) protocol based on Supersingular Isogeny Diffie-Hellman (SIDH). An attack of Galbraith, Petit, Shani and Ti rules out the use of naïve forms of the SIDH construction for this application, as they showed that an adversary can recover private key information when supplying an honest party with malformed public keys. Subsequently, Azarderakhsh, Jao and Leonardi presented a method for overcoming this attack using multiple instances of the SIDH protocol, but which increases the costs associated with performing a key exchange by factors of up to several thousand at typical security levels. In this paper, we present two new techniques to reduce the cost of SIDH-based NIKE, with various possible tradeoffs between key size and computational cost.


Introduction
The Supersingular Isogeny Diffie-Hellman (SIDH) protocol [10,15] is a promising candidate for quantum-resistant key exchange. The protocol functions analogously to classical Diffie-Hellman, but using supersingular elliptic curves and cyclic subgroups instead of group elements and exponents. That is, one starts with a "base curve" $E$, Alice and Bob pick private cyclic subgroups $A \subset E$ and $B \subset E$, and they each compute the "quotient curves" $E/A$ and $E/B$ for use in their respective public keys. To facilitate computation of the shared secret, Alice and Bob's public keys also contain additional information about the quotient maps $\phi_A : E \to E/A$ and $\phi_B : E \to E/B$. Using this information, Alice and Bob then complete the protocol by computing a shared secret derived from an isomorphism invariant of the curve $E/(A + B)$. SIDH security is based on a special case of the supersingular isogeny problem, which was first proposed for use in cryptography in [6]; as explained in [6, §5.3.1], this problem in turn was first introduced in [13]. We refer to [8] for a discussion of these hardness assumptions and their historical context.
Given the similar dataflow to the ordinary Diffie-Hellman protocol, it was at one time hoped that the SIDH construction would be a promising candidate for a static-static or non-interactive key exchange (NIKE) protocol. However, Galbraith, Petit, Shani, and Ti [14] showed that it was possible to use the additional information about $\phi_A$ and $\phi_B$ provided in the public keys to perform an active attack capable of recovering Alice and Bob's private keys. Prior work of Azarderakhsh et al. [2] shows that one can prevent the GPST attack and obtain a NIKE from SIDH by applying an expensive generic transformation, as follows. Suppose that Alice generates $\alpha$ public keys and Bob generates $\beta$ public keys, where $\alpha$ and $\beta$ are positive integers. Then Alice and Bob may perform a total of $\alpha\beta$ key exchanges (one for each pair of public keys) and take their shared secret to be a hash of the concatenation of all of them. If a malicious attacker (say, Bob) presents an honest Alice with a malformed public key, then a total of $\alpha$ secret curves are potentially affected. To extract information about Alice's private keys from the hash computed by Alice, the attacker must know what input produced the hash, and so must search through all possible modifications of the $\alpha$ affected secret curves and try the possible hash values until they obtain a collision. If each secret curve can take on $r$ possible values (say all occurring with equal probability, for simplicity, although the situation in practice is in fact more complicated) then the attacker must search through a space of $r^{\alpha}$ possibilities, which requires exponential work if $\alpha$ is taken to be large enough. In [2], this construction is referred to as $k$-SIDH.
For 128-bit post-quantum security, Azarderakhsh et al. recommend $\alpha = 113$ and $\beta = 94$ for standard SIDH parameters (the asymmetry arises because $\phi_A$ and $\phi_B$ are different), resulting in a total of $113 \cdot 94 = 10622$ key exchanges. In general, key size is proportional to $\alpha$ and $\beta$ and scales linearly with security level, and computational cost is proportional to $\alpha\beta$ and scales quadratically with security level.
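The dataflow of the $k$-SIDH transformation can be sketched in a few lines of Python. This is only an illustration of how the $\alpha\beta$ pairwise secrets are combined: `toy_exchange` is a hypothetical stand-in that uses classical Diffie-Hellman modulo a Mersenne prime instead of SIDH (it is neither quantum-resistant nor the construction of [2]), and the names `toy_public` and `k_exchange`, as well as the choice of SHA-256, are assumptions made for the example.

```python
import hashlib

# Toy stand-in for one SIDH exchange: classical Diffie-Hellman mod a Mersenne
# prime. ASSUMPTION for illustration only; it is not SIDH and not post-quantum.
P_MOD = 2**127 - 1
GEN = 3

def toy_public(sk: int) -> int:
    return pow(GEN, sk, P_MOD)

def toy_exchange(sk: int, pk: int) -> bytes:
    return pow(pk, sk, P_MOD).to_bytes(16, "big")

def k_exchange(my_sks, their_pks, i_am_alice: bool) -> bytes:
    """Hash the concatenation of all pairwise secrets, ordered by (Alice index, Bob index)."""
    if i_am_alice:
        pairs = [(sk, pk) for sk in my_sks for pk in their_pks]
    else:
        pairs = [(sk, pk) for pk in their_pks for sk in my_sks]
    h = hashlib.sha256()
    for sk, pk in pairs:
        h.update(toy_exchange(sk, pk))
    return h.digest()

alice_sks = [11, 22, 33]          # alpha = 3 toy private keys
bob_sks = [44, 55]                # beta = 2 toy private keys
alice_pks = [toy_public(s) for s in alice_sks]
bob_pks = [toy_public(s) for s in bob_sks]

key_a = k_exchange(alice_sks, bob_pks, i_am_alice=True)
key_b = k_exchange(bob_sks, alice_pks, i_am_alice=False)
exchanges_in_2 = 113 * 94         # pairwise exchanges at the parameters quoted from [2]
```

Both parties iterate over the pairs in the same (Alice index, Bob index) order, so the two hashes agree; the work grows with the product of the two key counts, which is the quadratic cost discussed above.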
In this paper, we significantly improve this state of affairs in two ways. The first approach is to modify the $k$-SIDH construction using extra automorphisms in a way that greatly increases the likelihood of obtaining malformed secret keys, allowing us to decrease the values of $\alpha$ and $\beta$. Using this approach, the computational cost remains quadratic, but with much smaller constants. The second approach is to devise new zero-knowledge proofs, based in part on our first improvement, to validate SIDH public keys and thus resist GPST-style attacks. Our second approach has linear cost overhead and hence is asymptotically more cost-efficient, but requires larger (though still linearly scaling) key sizes.
We believe that our contributions likely have additional applications other than NIKE, although we do not pursue them here. Our first contribution, using non-trivial automorphisms to produce non-isomorphic isogenies between isomorphic curves, might be useful for performance improvements, similar to how some variants of GLV use extra low-degree endomorphisms to speed up point multiplication [17]. Our second contribution, on zero-knowledge proofs of validity for SIDH keys, may be useful for other authentication protocols such as digital signatures.

Related work
The recently proposed CSIDH protocol [5] is an alternative isogeny-based cryptosystem which seems to be especially well-suited to the NIKE setting. Under the original parameter choices and security analysis in [5], CSIDH-based NIKE is both faster and more compact than SIDH-based NIKE for a given security level, even with our improvements. However, subsequent analyses [3,4] indicate that CSIDH may not be as secure as originally estimated. Hence, we believe our improvements are still worth proposing, since they could lead to further improvements which might make SIDH competitive in this setting. In any case, accurate information about the cost overhead of SIDH-based NIKE is necessary for a fair comparison of current state of the art NIKE protocols under SIDH vs. CSIDH.
We are not aware of any other papers containing an extended discussion of NIKE protocols in the post-quantum setting, though some protocols believed to be quantum-resistant have been analyzed in the classical setting [16, Theorem 1].

Extra Secrets from Automorphisms
In this section, we develop some mathematical preliminaries for changes we will make to the SIDH construction. These changes allow us, in certain situations, to agree on multiple non-isomorphic shared secret curves from a single public key pair. We believe these techniques are of independent interest, which is why we have isolated them in their own section.
We begin by recalling the SIDH construction. Consider now an elliptic curve $E$ defined over a field of characteristic $p$ not equal to 2 or 3. If $\eta : E \to E$ is an automorphism of $E$, that is, an invertible map of curves which is also a group homomorphism, then generically there are only two possibilities for $\eta$, as follows: either $\eta(P) = P$ is the identity map, or $\eta(P) = -P$ is the negation map. Two exceptional cases can occur when $E$ is a curve isomorphic to $E_0 : y^2 = x^3 + 1$ or $E_{1728} : y^2 = x^3 + x$, that is, when its $j$-invariant is equal to either 0 or 1728. In the first case, one can have a non-trivial automorphism of order six given by $\eta_6 : (x, y) \mapsto (\zeta_3 x, -y)$, where $\zeta_3$ is a non-trivial third root of unity, and in the second case one can have a non-trivial automorphism of order four given by $\eta_4 : (x, y) \mapsto (-x, iy)$, where $i^2 = -1$.
The existence of these automorphisms has consequences for isogenies emanating from $E$. For instance, consider the case where $\eta_4 : E_{1728} \to E_{1728}$ is a non-trivial automorphism of order four. If $G \subset E_{1728}$ is a subgroup, then one obtains a second subgroup $\eta_4(G)$ of $E_{1728}$ which is usually distinct from $G$. (The cyclic subgroups of size $N$ where it is not distinct correspond exactly to the ramification points of the classical modular curve $X_0(N)$ lying over $j = 1728$.) If $\phi_G : E_{1728} \to E_{1728}/G$ is the quotient map, then $\phi_G \circ \eta_4^{-1}$ is an isogeny with kernel $\eta_4(G)$, so the curves $E_{1728}/G$ and $E_{1728}/\eta_4(G)$ are isomorphic. If we consider this setup in the context of the SIDH construction with $E = E_{1728}$ and $A = G$, then Alice's public key is in a certain sense "degenerate": there is an additional associated public key whose curve $E/\eta_4(A)$ is isomorphic to $E/A$, but whose isogeny is not isomorphic to $\phi_A$. (For a detailed discussion of this unusual situation, in which two non-isomorphic isogenies have isomorphic domains and codomains, we refer to [1].) One may easily compute the associated torsion information for the other isogeny by precomposing $\phi_A$ with $\eta_4^{-1}$.
This means that each public key generated from $j = 1728$ actually corresponds to two public keys (with isomorphic curves but different torsion point information), and so a public key pair can be thought of (naïvely) as determining the four secret curves $E/(A + B)$, $E/(\eta_4(A) + B)$, $E/(A + \eta_4(B))$ and $E/(\eta_4(A) + \eta_4(B))$. However, these four curves comprising the four shared secrets generically only represent two distinct isomorphism classes. This fact follows because the quotient maps $E \to E/(A + B)$ and $E \to E/(\eta_4(A) + \eta_4(B))$ have kernels which differ by an application of $\eta_4$, and so their codomains are isomorphic by the preceding reasoning (take $G = A + B$). The analogous fact is true for the other pair. Nevertheless, despite this degeneracy, one still obtains two secret curves (up to isomorphism) from a single public key pair using $E = E_{1728}$ as the base curve.
One can do even better by using $\eta_6 : E_0 \to E_0$, of order six. This time, each public key is thrice-degenerate, resulting in a total of nine shared secrets which represent three generically distinct isomorphism classes, namely $E_0/(A + B)$, $E_0/(A + \eta_6(B))$ and $E_0/(A + \eta_6^2(B))$. Since this case is the case of primary interest in what follows, we diagram it here. The subscripts on the initial arrows (leading out from the base curve) denote the kernel of the map, and the subscripts on the secondary arrows denote the isogeny obtained by quotienting out the second subscript after applying the isogeny determined by the first. The secondary arrows have multiple labels because the same isogeny arises in multiple ways, and the triple arrows have multiple labels because there are actually multiple isogenies.
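The counting in this section can be checked with a short enumeration. The sketch below assumes, as in the text, that the curve $E/(\eta^i(A) + \eta^j(B))$ generically depends only on the difference $j - i$ modulo the order $k$ with which $\eta$ acts on subgroups ($k = 2$ for $\eta_4$, since $\eta_4^2 = -1$ fixes every subgroup, and $k = 3$ for $\eta_6$, since $\eta_6^3 = -1$). The function name `secret_classes` is hypothetical.

```python
from collections import defaultdict

def secret_classes(k: int):
    """Group the k*k naive secrets E/(eta^i(A) + eta^j(B)) by isomorphism class.

    ASSUMPTION (generic case): applying eta^(-i) to the kernel shows the class
    depends only on (j - i) mod k, and distinct differences give distinct classes.
    """
    classes = defaultdict(list)
    for i in range(k):
        for j in range(k):
            classes[(j - i) % k].append((i, j))
    return dict(classes)

classes_1728 = secret_classes(2)   # eta_4 acts on subgroups with order 2
classes_0 = secret_classes(3)      # eta_6 acts on subgroups with order 3
```

For $j = 1728$ this gives two classes of two secrets each, and for $j = 0$ three classes of three secrets each, matching the counts of four and nine naive secrets.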

The Action of Automorphisms on Private Keys
The observations in the previous section allow us to develop new strategies to limit the effectiveness of GPST and similar active attacks. To understand how these strategies work, we provide a description of the GPST attack using a morphism-based framework. The GPST attack works by modifying the values of $\phi_B(P_A)$ and $\phi_B(Q_A)$ presented to Alice, and such a modification can be viewed as giving Alice the information of $L \circ \phi_B|_{E[n_A]}$ in place of $\phi_B|_{E[n_A]}$, where $L$ is a linear automorphism of $(E/B)[n_A]$ chosen by the attacker. When Alice computes her secret, she will then compute $(E/B)/L(\phi_B(A))$. The map $L$ can be chosen so that the isomorphism class of $(E/B)/L(\phi_B(A))$ is always "close" to the isomorphism class of $E/(A + B)$ (in the sense of being isogenous to $E/(A + B)$ by an isogeny of degree $\ell_A$), and by computing $E/(A + B)$ and finding the location of $(E/B)/L(\phi_B(A))$ relative to $E/(A + B)$, the attacker can find out information about $A$. Specifically, the attacker can exhaustively enumerate all of the $\ell_A + 1$ curves which are $\ell_A$-isogenous to $E/(A + B)$, and try all of their $j$-invariants successively as the putative output of a shared secret computation with Alice. Depending on which of these guesses matches Alice's modified shared secret computation, the attacker then knows exactly which of the curves $\ell_A$-isogenous to $E/(A + B)$ lies on the $\ell_A$-isogeny path of length $e_A$ between $E/B$ and $E/(A + B)$, and this partial information about the isogeny path corresponds directly to partial information about Alice's secret key.
We now suppose $\eta_6 : E_0 \to E_0$ is a non-trivial automorphism of order six. The idea is that if the attacker gives false information for the map $\phi_A$, then this modification not only affects the computation of the secret $E_0/(A + B)$ but also that of the associated secrets $E_0/(A + \eta_6(B))$ and $E_0/(A + \eta_6^2(B))$. One can show that it is possible to choose private keys which guarantee that at least two (and typically three) of these computations will fail under GPST-type attacks. This line of defense increases the size of the attacker's search space, since the attacker now essentially has to guess the result of three modified shared secret computations simultaneously instead of just one. The increase in attack difficulty in turn yields an improvement in performance for a non-interactive exchange at the same security level. The same observation also leads to a natural non-interactive proof mechanism for validating SIDH public keys (cf. Section 5).
Note that $\eta_6^3 = -1$ (as automorphisms), and $\eta_6^2 = \eta_6 - 1$. For any positive integer $n$, we will say that two points $P, Q \in E[n]$ are independent if $\langle P \rangle \cap \langle Q \rangle = \langle O_E \rangle$ (that is, the intersection of the subgroups they generate is trivial). […]
Proof. Applying the previous lemma, it suffices to determine when pairs of elements in the set $\{[\ell^{e-1}]P, [\ell^{e-1}]\eta_6(P), [\ell^{e-1}]\eta_6^2(P)\}$ are independent. Any pair of elements from this set is independent precisely when one element is not a scalar multiple of the other. In particular, if this property holds for one pair, then it holds for all of them by the linearity of $\eta_6$. So it suffices to determine the probability that $P$ is an eigenvector of $\eta_6$. Since $\ell$ is not equal to the characteristic of the field of definition of $E_0$, Deuring's lifting theorem [11, p. 203] implies that $\eta_6$ does not restrict to a scalar multiplication, so it has two distinct eigenvalues. Hence each one-dimensional eigenspace contains at most $\ell - 1$ non-zero elements, so the probability of $P$ not being an eigenvector of $\eta_6$ is at least $1 - 2(\ell - 1)/(\ell^2 - 1) = (\ell - 1)/(\ell + 1)$.
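The two identities $\eta_6^3 = -1$ and $\eta_6^2 = \eta_6 - 1$ can be verified numerically on a toy example. The sketch below works over the small field $\mathbb{F}_{13}$, chosen only because it contains a cube root of unity; this is an assumption made for illustration and not an SIDH parameter (the curve is not even supersingular over this field, but the identities concern only the automorphism and the group law). The helper names `add`, `neg` and `eta6` are hypothetical.

```python
p = 13          # toy prime with p = 1 (mod 3), so F_p contains a cube root of unity
O = None        # point at infinity
zeta3 = next(z for z in range(2, p) if pow(z, 3, p) == 1)

def add(P, Q):
    """Chord-and-tangent addition on E_0 : y^2 = x^3 + 1 over F_p."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O
    if P == Q:
        lam = 3 * x1 * x1 * pow(2 * y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def neg(P):
    return O if P is O else (P[0], -P[1] % p)

def eta6(P):
    """eta_6 : (x, y) -> (zeta_3 * x, -y)."""
    return O if P is O else (zeta3 * P[0] % p, -P[1] % p)

points = [O] + [(x, y) for x in range(p) for y in range(p)
                if (y * y - x**3 - 1) % p == 0]

# eta_6^3 = -1 and eta_6^2 = eta_6 - 1, checked on every point of the toy curve.
cube_is_neg = all(eta6(eta6(eta6(P))) == neg(P) for P in points)
square_rule = all(eta6(eta6(P)) == add(eta6(P), neg(P)) for P in points)
```

Both flags evaluate to true on this curve, which is what the relation $\eta_6^2 - \eta_6 + 1 = 0$ predicts for any field containing $\zeta_3$.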

[…] (with respect to a basis $\{P_A, Q_A\}$ of $E[n_A]$). This alteration has the effect of changing Alice's shared secret computation if and only if a certain $\ell$-torsion point lies in $A$.
We assume that Bob can interact with Alice to distinguish failed key exchanges from correct key exchanges. By repeating this process with different matrices, Bob can determine which $\ell$-torsion points lie in $A$, and then iteratively do the same for $\ell^2$-torsion, and so on.
The $k$-SIDH proposal [2] thwarts the GPST attack by having Alice and Bob instantiate $\alpha$ and $\beta$ public keys respectively and performing $\alpha\beta$ key exchanges. The main idea is that Alice's $\alpha$ different secret keys will not have any $\ell$-torsion point in common. Therefore, any GPST-style alteration that Bob makes will cause at least one of the $\alpha\beta$ key exchanges to fail, yielding no information about Alice's secret key. Indeed, even in the case $\alpha = \beta = 2$, one can already arrange for Alice's two secret keys to be linearly disjoint, so that any alterations by Bob will cause one or more of the four shared secret computations to fail. However, $k$-SIDH with $\alpha = \beta = 2$ is not enough to defend against a more sophisticated attack, in which Bob guesses which incorrect shared secrets Alice will compute, and then forges his own shared secret computation to match what he guesses Alice will compute. As shown in [2], the probability of a successful guess is $1/(\ell(\ell + 1))$; briefly speaking, Bob must compute the correct $\ell^e$-isogeny, backtrack by one $\ell$-isogeny ($\ell + 1$ possibilities), and then move forward by one $\ell$-isogeny ($\ell$ possibilities, since we exclude the one $\ell$-isogeny that would undo the previous backtrack). Although SIDH is typically instantiated using $\ell = 2$ or $\ell = 3$ for efficiency, larger values of $\ell$ provide better defense against this type of attack. Our improvements below benefit even more from larger $\ell$, and accordingly in what follows we propose the use of $\ell = 11$ or $\ell = 13$ as a good compromise between performance and security.
We now explain how to use multiple secrets to help $k$-SIDH better defend against the GPST attack. Suppose we use $E_0$ with $j$-invariant $j = 0$ for our base curve. For simplicity we assume $E[n_A]$ has basis $\{P, \eta_6(P)\}$ and that Alice's secret key is of the form $Q = P + m\,\eta_6(P)$ for an integer $m$ (we remark that most published implementations of SIDH, such as [7], use keys of this form). Each round of the key exchange then produces three secret keys. These keys are related: if the kernel of Alice's original secret isogeny is generated by $Q = P + m\,\eta_6(P)$, then, using $\eta_6^2 = \eta_6 - 1$, the other two kernels will be generated by $\eta_6(Q) = -mP + (m + 1)\eta_6(P)$ and $\eta_6^2(Q) = -(m + 1)P + \eta_6(P)$. Applying Lemma 3.2 to $Q$, we find that the elements $\{Q, \eta_6(Q), \eta_6^2(Q)\}$ are pairwise independent with probability $(\ell - 1)/(\ell + 1)$, and of course Alice could simply choose $Q$ so that this property holds. Assuming it does, any GPST-style attack matrix will cause at least two of the resulting shared secret computations to be wrong, since a GPST matrix $M$ is upper-triangular with one eigenvector, which can only overlap one of $\{Q, \eta_6(Q), \eta_6^2(Q)\}$; any element of this set which does not lie in an eigenspace of $M$ will generate a kernel which is perturbed by $M$, resulting in an incorrect shared secret computation. Furthermore, with high probability (namely, $(\ell + 1 - 3)/(\ell + 1) = (\ell - 2)/(\ell + 1)$), all three shared secret computations will be wrong; we find this probability by observing that $\{Q, \eta_6(Q), \eta_6^2(Q)\}$ defines three lines in $E[n_A]$ and that the eigenvector of the GPST matrix avoids all three with probability $(\ell + 1 - 3)/(\ell + 1)$. This refinement therefore prevents the simple version of the GPST attack in which the adversary submits altered public keys and probes for correctness in the shared secret computation.
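The relations between the three kernel generators, and the independence probability quoted from Lemma 3.2, can be checked in a small linear-algebra model. The sketch below is an illustration under stated assumptions: it models only the $\ell$-torsion as the plane $\mathbb{F}_\ell^2$ with basis $\{P, \eta_6(P)\}$, writes Alice's key as $Q = P + m\,\eta_6(P)$ for a scalar that we call $m$ (our notation), and derives the action of $\eta_6$ from $\eta_6^2 = \eta_6 - 1$. All function names are hypothetical.

```python
from fractions import Fraction

# Action of eta_6 on v = a*P + b*eta_6(P), using eta_6^2 = eta_6 - 1:
#   eta_6(v) = a*eta_6(P) + b*(eta_6(P) - P) = -b*P + (a + b)*eta_6(P).
def eta6(v, l):
    a, b = v
    return ((-b) % l, (a + b) % l)

def independent(u, v, l):
    """Two vectors of F_l^2 are independent iff their determinant is non-zero."""
    return (u[0] * v[1] - u[1] * v[0]) % l != 0

def lines(l):
    """One representative per line through the origin of F_l^2 (l + 1 lines)."""
    return [(1, m) for m in range(l)] + [(0, 1)]

def good_fraction(l):
    """Fraction of lines <Q> with Q, eta_6(Q), eta_6^2(Q) pairwise independent."""
    good = 0
    for q in lines(l):
        q1 = eta6(q, l)
        q2 = eta6(q1, l)
        if independent(q, q1, l) and independent(q, q2, l) and independent(q1, q2, l):
            good += 1
    return Fraction(good, l + 1)

def orbit_formula_holds(l):
    """Check eta_6(Q) = -m P + (m+1) eta_6(P) and eta_6^2(Q) = -(m+1) P + eta_6(P)."""
    return all(
        eta6((1, m), l) == ((-m) % l, (m + 1) % l)
        and eta6(eta6((1, m), l), l) == ((-(m + 1)) % l, 1)
        for m in range(l)
    )
```

In this model the bound $(\ell - 1)/(\ell + 1)$ is attained for $\ell = 13$, where $\eta_6$ has two eigenlines modulo $\ell$, while for $\ell = 11$ there are no eigenlines and every choice of $Q$ is good.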
Consider now the "sophisticated" version of the GPST attack in which the adversary tries to guess which incorrect shared secrets Alice will compute. Under a naïve estimate, typically three of the shared secrets will be wrong, and the number of possible wrong answers for each shared secret is $\ell(\ell + 1)$. The attacker then has to search through a space of $\Omega((\ell(\ell + 1))^3)$ possibilities. If Alice has $\alpha$ public keys, the cost is therefore $\Omega((\ell(\ell + 1))^{3\alpha}) \approx \ell^{6\alpha}$, and so setting $256 \approx \lg(\ell^{3\alpha}(\ell + 1)^{3\alpha})$ (where 256 is required to resist Grover's algorithm, but 128 can be chosen for security against classical attacks), we get $\alpha \approx 12$ for the prime $\ell = 11$.
Unfortunately, the naïve estimate above overestimates security. The reason is that the "incorrectness" of the three shared secrets is not independent: the errors are correlated, and the attacker can exploit this correlation. Specifically, an attacker can start from $E_0/A$ and compute all of the $\ell + 1$ possible $\ell$-isogenies starting from $E_0/A$. Of these, exactly one $\ell$-isogeny will have codomain equal to the correct curve, namely the elliptic curve lying along the $\ell^e$-isogeny path from $E_0$ to $E_0/A$. The attacker does not know which curve is correct, but can guess the correct curve with probability $1/(\ell + 1)$. Having guessed the correct curve $E'$, the attacker can now compute the images $B_1, B_2, B_3$ of $B, \eta_6(B), \eta_6^2(B)$ in $E'$ under the isogeny $E_0 \to E'$, and then the three curves $E'/B_i$, for $i = 1, 2, 3$. Each of these three curves now admits $\ell + 1$ possible $\ell$-isogenies, of which one will land in the correct curve $E/\langle A, B \rangle$, and the others will correspond to possible incorrect secrets that Alice might compute. The probability of guessing all three incorrect secrets successfully is thus $1/(\ell + 1)^4$, or alternatively $1/(\ell^3 \cdot (\ell + 1))$ if we assume that none of the three is computed correctly by Alice. As far as we know, there is no better way to guess, although we can only prove optimality by introducing an additional assumption contrived exactly for this purpose. If we assume that there is no better way, then the actual cost of blindly searching for Alice's incorrect shared secret values is $\Omega(\ell^{3\alpha}(\ell + 1)^{\alpha}) \approx \ell^{4\alpha}$, which increases the requirements for $\alpha$ by a factor of 3/2. For 256-bit security and $\ell \approx 11$, we need $\alpha \approx 18$ in order to obtain $256 \approx \lg(\ell^{3\alpha}(\ell + 1)^{\alpha})$.
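The two parameter estimates can be reproduced by solving the stated equations for $\alpha$. The following sketch simply inverts the two logarithmic expressions from the text; the function names are hypothetical, and rounding to the nearest integer reflects the "$\approx$" used above rather than a strict security bound.

```python
from math import log2

def alpha_naive(l: int, bits: int = 256) -> float:
    """Solve bits = lg((l (l+1))^(3 alpha)) for alpha (naive estimate)."""
    return bits / (3 * (log2(l) + log2(l + 1)))

def alpha_correlated(l: int, bits: int = 256) -> float:
    """Solve bits = lg(l^(3 alpha) (l+1)^alpha) for alpha (correlated errors)."""
    return bits / (3 * log2(l) + log2(l + 1))

estimates = {l: (alpha_naive(l), alpha_correlated(l)) for l in (11, 13)}
```

For $\ell = 11$ the two expressions evaluate to roughly 12.1 and 18.3, and for $\ell = 13$ the second evaluates to roughly 17.2, in line with the rounded values used in the paper.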

NIZK-based SIDH key validation
A second approach to key validation is to have the two parties run an additional zero-knowledge proof protocol to validate the SIDH key. In this section we present a new isogeny-based zero-knowledge identification protocol which, unlike previous such protocols, validates all elements of an SIDH key. By itself, our protocol has non-negligible soundness error. Since we require negligible soundness error for key validation purposes, we must repeat this protocol many times. We refer to Section 6 for a discussion of efficiency considerations. One can apply a generic transformation such as the Fiat-Shamir [19] or Unruh transformation [18] in order to convert the resulting interactive protocol into a non-interactive transcript.
In the original De Feo-Jao-Plût zero-knowledge identification scheme [10], a prover publishes […] In cases where the proof is repeated many times, it may be possible for a verifier to detect the resulting bias in $B$ and flag the prover as a likely cheater, but this technique is more complicated than a simple Σ-protocol, and we do not pursue it here. Instead, we propose to exploit the availability of multiple secrets from degenerate keys in order to validate $\phi_A|_{E[n_B]}$, using a modified Σ-protocol.
Our new zero-knowledge proof proceeds as follows. We use the base curve $E = E_0$ with $j$-invariant 0. In the commitment phase, the prover publishes $E/B$ and the three shared secrets $E_1 = E/\langle \eta_6(A), B \rangle$, $E_2 = E/\langle \eta_6^2(A), B \rangle$, and $E_3 = E/\langle \eta_6^3(A), B \rangle$. The verifier chooses a challenge $b \in \{0, 1, 2, 3\}$. In the $b = 0$ case, the prover responds with $B$, and the verifier computes $\psi_i : E \to E/\eta_6^i(B)$ and $\psi'_i : E/A \to E/\langle A, \eta_6^i(B) \rangle$ for $i = 1, 2, 3$ as in the SIDH protocol, and verifies that the isogenies ($\psi_3$ […] respectively. The verifier also checks that $\{B, \eta_6(B), \eta_6^2(B)\}$ are pairwise independent, so that the results of Section 4 apply. In the other cases, the prover responds with the kernel of the isogeny $E/B \to E_b$, and the verifier computes the isogeny using this kernel and verifies that its codomain matches the committed curve $E_b$.
Correctness of our protocol is immediate. Zero-knowledge follows easily from the proof of [10, Theorem 6.3], as follows: If the simulator guesses $b = 0$, then it chooses $B$ and produces the commitment data $(E/B, E_1, E_2, E_3)$ from its knowledge of $B$, and responds as the honest prover would respond to the challenge $b = 0$. If the simulator guesses $b = 1, 2, 3$, then it chooses $E/B$ and the isogenies $E/B \to E_b$ randomly of degree $\ell^e$, and responds with the kernels of these isogenies to the challenge $b = 1, 2, 3$. These responses are indistinguishable from an honest prover under DSSP. Revealing these extra (codomains of) maps does not create any extra insecurity, since a simulator who knows $B$ (as in the $b = 0$ case) can generate all these maps on its own anyway.
To prove soundness, the proof of [10, Theorem 6.3] shows that $E/A$ is a valid curve, so we only need to prove the correctness of the auxiliary data. Recall that the verifier checks in the $b = 0$ case that $\{B, \eta_6(B), \eta_6^2(B)\}$ are pairwise independent […] If the verifier chooses $b = 0$ with probability 2/5 and each of $b = 1, 2, 3$ with probability 1/5, then the failure probability for a cheating prover is at least 2/5: either the $b = 0$ response is flawed, which the verifier will detect whenever the verifier chooses the $b = 0$ value (40% probability), or else at least two of the responses in the cases $b \in \{1, 2, 3\}$ are flawed, which the verifier will detect whenever the verifier chooses one of these two values (40% probability).
One may try to optimize our zero-knowledge proof by having the prover publish the auxiliary data $\phi_B$ for the commitment $E/B$ and then using this auxiliary data to derive (say) the kernels of all three isogenies $E/B \to E_i$ from one of them. However, this approach is insecure, since the kernel of $E/B \to E_3$ is equal to $\phi_B(A)$, and knowledge of both $\phi_B|_{E[n_A]}$ and $\phi_B(A)$ trivially exposes the original secret $A$. Another idea is to reveal more than one of the maps $E/B \to E_b$ at once. While this strategy may work in practice, we cannot prove it to be zero-knowledge, since a simulator cannot accurately simulate two related maps simultaneously.
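The 2/5 detection bound can be checked by enumerating the cheating patterns allowed by the soundness argument. The sketch below rests on an assumption that is inferred from the two 40% figures in the text rather than stated there explicitly: the verifier picks $b = 0$ with probability 2/5 and each of $b = 1, 2, 3$ with probability 1/5. The names `CHALLENGE_PROB`, `detection_probability` and `cheating_patterns` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# ASSUMED challenge distribution, inferred from the 40% figures in the text.
CHALLENGE_PROB = {0: Fraction(2, 5), 1: Fraction(1, 5),
                  2: Fraction(1, 5), 3: Fraction(1, 5)}

def detection_probability(flawed):
    """Probability that the verifier picks a challenge whose response is flawed."""
    return sum(CHALLENGE_PROB[b] for b in flawed)

def cheating_patterns():
    """Flaw patterns available to a cheater per the soundness argument:
    the b = 0 response is flawed, or at least two of b = 1, 2, 3 are flawed."""
    for bits in product([False, True], repeat=4):
        flawed = {b for b in range(4) if bits[b]}
        if 0 in flawed or len(flawed & {1, 2, 3}) >= 2:
            yield flawed

worst_case = min(detection_probability(f) for f in cheating_patterns())
soundness_error = 1 - worst_case
```

Under this assumed distribution the cheater's best patterns are detected with probability exactly 2/5, which leaves a per-iteration soundness error of 3/5.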

Efficiency
We compare the efficiency of our two methods, using the 256-bit classical / 128-bit quantum security level (which is the only security level treated in [2]). For our first method, using the primes $\ell_A = 11$ and $\ell_B = 13$, the results of Section 4 show that we need $\alpha = 18$ and $\beta = 17$ respectively in order to implement our variant of the $k$-SIDH NIKE protocol with this security level. The public keys are 18 (respectively 17) times larger than in SIDH, and each party computes $3 \cdot 18 \cdot 17 = 918$ shared secrets. As with "standard" SIDH using $\ell_A = 2$ and $\ell_B = 3$, there is no difficulty in finding primes $p$ of the appropriate size. Costello and Hisil [9, Fig. 2] indicate that such primes are about 3 to 4 times slower than standard primes.
Our second NIKE proposal, using explicit key validation via zero-knowledge proofs, requires approximately 347 proof iterations for 256-bit security (since $(3/5)^{347} \approx 2^{-256}$). Relative to an SIDH iteration, each zero-knowledge proof iteration is also larger (since there is more commitment data) and slower (since multiple isogenies potentially need to be verified) by a small constant factor. Comparing our two methods, the public keys for the second method are larger, and the computational cost of the two is approximately the same at the 256-bit security level. Our second scheme scales better in computational cost with increasing security, since the computational cost grows only linearly in the security level instead of quadratically. However, our first scheme has smaller public keys, and validates both keys at once, whereas the second scheme needs to be repeated by each party in order to validate both keys.
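The iteration count follows from the per-iteration soundness error of 3/5. As a rough check, the sketch below computes the security level reached by 347 repetitions and the smallest repetition count that reaches a given target; the function name `iterations` is hypothetical.

```python
from math import ceil, log2

def iterations(target_bits: int, soundness_error: float) -> int:
    """Smallest n with soundness_error**n <= 2**(-target_bits)."""
    return ceil(target_bits / -log2(soundness_error))

bits_at_347 = 347 * -log2(3 / 5)      # security level reached by 347 iterations
n_exact = iterations(256, 3 / 5)
```

Under this computation 347 iterations give about 255.7 bits, consistent with the approximation $(3/5)^{347} \approx 2^{-256}$; one further iteration is needed to reach 256 bits exactly. The count is linear in the target, which is the linear scaling referred to above.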

Implementation
We implemented the automorphism-based multi-secret SIDH protocol described in this paper, using Doliskani's publicly available SIDH reference implementation [12] as a base. Our implementation uses $p = 2 \cdot 13^{102} \cdot 11^{111} + 1$ and $E : y^2 = x^3 + (32 + \sqrt{-1})$. It can be found at [20]. Our implementation is intended as a proof-of-concept to validate the correctness of the construction, and as an aid to non-specialists who may benefit more from working code than a detailed technical description.
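A few basic properties of the stated prime can be inspected directly with Python integers. The sketch below assumes that the extracted expression is to be read as $p = 2 \cdot 13^{102} \cdot 11^{111} + 1$; it does not test primality, and it is not taken from the implementation at [20].

```python
# Prime as read from the text (ASSUMED reading): p = 2 * 13^102 * 11^111 + 1.
p = 2 * 13**102 * 11**111 + 1

bit_length = p.bit_length()          # overall size of the field characteristic
bits_13 = (13**102).bit_length()     # size of the 13-power torsion
bits_11 = (11**111).bit_length()     # size of the 11-power torsion
p_mod_3 = p % 3                      # 2 is the classical condition for j = 0 to be supersingular
p_mod_4 = p % 4                      # 3 means -1 is a non-residue, so sqrt(-1) lies in F_{p^2} only
```

With this reading $p$ has 763 bits, the two torsion subgroups are of comparable size, and $p \equiv 2 \pmod 3$ and $p \equiv 3 \pmod 4$, which is consistent with a $j = 0$ base curve whose coefficient $32 + \sqrt{-1}$ lies in $\mathbb{F}_{p^2}$ but not in $\mathbb{F}_p$.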