Single-query Quantum Hidden Shift Attacks

Quantum attacks using superposition queries are known to break many classically secure modes of operation. While these attacks do not necessarily threaten the security of the modes themselves, since they rely on a strong adversary model, they help us to draw limits on the provable security of these modes. Typically these attacks use the structure of the mode (stream cipher, MAC or authenticated encryption scheme) to embed a period-finding problem, which can be solved with a dedicated quantum algorithm. The hidden period can be recovered with a few superposition queries (e.g., $O(n)$ for Simon's algorithm), leading to state or key-recovery attacks. However, this strategy breaks down if the period changes at each query, e.g., if it depends on a nonce. In this paper, we focus on this case and give dedicated state-recovery attacks on the authenticated encryption schemes Rocca, Rocca-S, Tiaoxin-346 and AEGIS-128L. These attacks rely on a procedure to find a Boolean hidden shift with a single superposition query, which overcomes the change of nonce at each query. As they crucially depend on such queries, we stress that they do not break any security claim of the authors, and do not threaten the schemes if the adversary only makes classical queries.


Introduction
Since Shor's algorithm [Sho94], the enhanced computational power of quantum devices has been known to impact the security of public-key cryptosystems. Nowadays, post-quantum (public-key) cryptography is structured around several computational problems (e.g., lattice sieving, decoding random codes...) which are believed to remain intractable.
The situation is more favorable in symmetric (secret-key) cryptography, since most of it is expected to remain secure. Generic attacks on primitives are now well understood, for example Grover's quantum search [Gro96], which accelerates the recovery of a secret key from time $O(2^{\kappa})$ to $O(2^{\kappa/2})$, or the BHT algorithm [BHT98], which accelerates $n$-bit collision search from $O(2^{n/2})$ to $O(2^{n/3})$. Many dedicated quantum attacks have also been introduced, whether on block ciphers [BNS19, KLLN16b] or hash functions [HS20]. Most of the time, these attacks reach at most a quadratic speedup (like Grover's search). In this paper, we focus on superposition attacks on modes of operation, which are known to allow super-quadratic speedups or sometimes total breaks of classically secure schemes.
Superposition Queries. The literature separates quantum attacks on symmetric schemes into two categories. In the Q1 setting, the adversary has only classical access to the attacked function, typically an encryption scheme or MAC which contains secret information (the key or internal states). Such attacks follow the main threat model of post-quantum cryptography, where the adversary records computations to decrypt them later. In the Q2 setting, also named the superposition query model, the adversary can query the function as a quantum oracle, i.e., from within a quantum computation. Obviously, this cannot model a "store now, decrypt later" scenario anymore. Despite this lack of practical applications, Q2 attacks are still a relevant source of information on the quantum security of these schemes, as they are known to break many classically secure modes of operation [KM10, KM12, KLLN16a, LM17]. On the one hand, they can be used as a starting point or motivation for improved Q1 attacks [BHN+19, BSS22]. On the other hand, they can be seen as impossibility results, showing that any security proof must consider an adversary making classical queries to the scheme [ABKM22].
Principle of Q2 Breaks. Consider a symmetric scheme $E_K : \{0,1\}^n \to \{0,1\}^m$ with a secret key $K$, to which we have quantum access.
Typically, Q2 attacks will combine some pre-processing function $f$ and post-processing function $g$ so that the function $g \circ E_K \circ f$ has some property that can be exploited. For example, the Even-Mansour cipher $E_{k_1,k_2} : x \mapsto k_1 \oplus \Pi(k_2 \oplus x)$, where $\Pi$ is a public permutation, can be attacked by noticing that $E_{k_1,k_2} \oplus \Pi$ is a periodic function on $\mathbb{F}_2^n$, of period $k_2$ [KM12]. Simon's algorithm [Sim97] can recover this period in $O(n)$ quantum queries. Other attacks (for example in [BLNS21], using a non-trivial $f$) may target an internal state instead. In MACs and authenticated encryption (AE) schemes, this can lead to forgeries.
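As a sanity check of the periodicity claim, the following minimal Python sketch builds a toy 8-bit Even-Mansour instance and verifies classically that $E_{k_1,k_2} \oplus \Pi$ takes the same value on $x$ and $x \oplus k_2$. The block size, the random permutation and the helper names (`even_mansour`, `periodic_function`, `make_toy_instance`) are illustrative assumptions; Simon's algorithm itself is not simulated here.

```python
import random


def even_mansour(x, k1, k2, perm):
    """Toy Even-Mansour cipher: E(x) = k1 XOR Pi(k2 XOR x)."""
    return k1 ^ perm[k2 ^ x]


def periodic_function(n, k1, k2, perm):
    """Truth table of x -> E(x) XOR Pi(x) over all n-bit inputs."""
    return [even_mansour(x, k1, k2, perm) ^ perm[x] for x in range(1 << n)]


def make_toy_instance(n=8, seed=1):
    """Hypothetical toy instance: random public permutation and random keys."""
    rng = random.Random(seed)
    perm = list(range(1 << n))
    rng.shuffle(perm)
    k1 = rng.randrange(1 << n)
    k2 = rng.randrange(1, 1 << n)  # nonzero, so that the period is non-trivial
    return perm, k1, k2


n = 8
perm, k1, k2 = make_toy_instance(n)
table = periodic_function(n, k1, k2, perm)

# f(x XOR k2) = k1 XOR Pi(x) XOR Pi(x XOR k2) = f(x): k2 is a period of f.
is_periodic = all(table[x] == table[x ^ k2] for x in range(1 << n))
print(is_periodic)
```

The same check fails for a nonce-dependent shift $s(K, N)$ that changes between queries, which is the situation addressed in the rest of the paper.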
A typical limitation of Q2 attacks arises when the construction $E_K$ admits a nonce $N$, like many MACs and AE schemes. It is indeed common [ATTU16] to assume that nonces remain classical values, and that they are not repeated from one Q2 query to another. While many attacks can also bypass the use of nonces [Bon17, BLNS21], they cannot apply in a situation where we would query $E_{K,N}(x) = f(x \oplus s(K, N))$, where the secret internal state $s$ depends on $K$ and $N$.
New Strategy. In this paper, we use a hidden shift algorithm with a single query from [ORR13]. It follows a well-known strategy in quantum computing which was previously applied in [vDHI06, Röt10] and requires, in our case, a combination with a state preparation technique [SLSB19].
We consider several AE schemes, where the recovery of the internal state leads to forgery or key-recovery attacks. Our strategy is to perform a superposition query with several message blocks which, with proper post-processing, can be turned into an oracle: where $g$ is a function to $\{-1, 1\}$, and $s$ and $s'$ are values which, together, allow us to determine a whole internal state. We measure $s$ immediately, but we cannot use Simon's algorithm to obtain $s'$ since it depends on the nonce, and will change at the next query.
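Instead, we use the hidden shift algorithm from [ORR13]. This algorithm performs a Hadamard transform: $\frac{1}{2^{n/2}} \sum_x g(x \oplus s') |x\rangle \mapsto \frac{1}{2^n} \sum_y (-1)^{s' \cdot y}\, \hat{g}(y) |y\rangle$, with $\hat{g}$ the Walsh-Hadamard transform of $g$. It then computes a multiplication by $1/\hat{g}(y)$ in the amplitudes of this state. Such a multiplication cannot succeed with probability 1.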
In fact, the attack will require many trials, each time using a new random nonce, and possibly even a new secret key. When the multiplication succeeds, we obtain $\sum_y (-1)^{s' \cdot y} |y\rangle$ (up to normalization) which, after another Hadamard transform, gives us $s'$. With $s$ and $s'$, we solve a system of equations which gives us the full internal state of the scheme.
Table 1: New quantum attacks and comparison with generic attacks ("Grover"). "Toffoli" is an approximate count of the total number of Toffoli gates applied during the attack, derived from the Toffoli count of AES. Approximately $10^3$ to $10^4$ qubits are required for all attacks, since the internal state of the schemes is of order $10^3$ bits.
The resulting attacks are summarized in Table 1. While we compare them with the gate and query counts of Grover search, one of their features is that the independent trials can be perfectly parallelized. It is well-known that reducing the depth of a Grover search by a factor $S$ increases the computational cost by the same factor $S$. Therefore, under a limitation in depth, the advantage of our attacks becomes more significant.
Outline. We detail the targeted authenticated encryption schemes in Section 2. In Section 3, we give and analyze the quantum building blocks of linear post-processing, amplitude transduction and single-query hidden shift. In Section 4, we present our attacks. The SageMath [The24] and Python scripts that we used to write down formulas and compute the complexities in our applications are available at: gitlab.inria.fr/capsule/single-query-hidden-shift.

Description of the Schemes
In this section, we recall the Authenticated Encryption with Associated Data (AEAD) schemes AEGIS-128L [WP13b], Tiaoxin-346 [Nik16], Rocca [SLN+21] and Rocca-S [NFI]. Some details which are not relevant to our analysis will be omitted. In particular, we omit the processing of Associated Data and the padding of input messages.
The levels of security against key-recovery and forgery are set according to the generic attacks:
• Key-recovery: using a single classical known-plaintext query, an adversary can always find the $\kappa$-bit key in $O(2^{\kappa})$ computations ($O(2^{\kappa/2})$ in the quantum setting using Grover's algorithm [Gro96]);
• Forgery: with a $t$-bit tag, an adversary that can make decryption queries can create a forgery in $O(2^t)$ queries classically. This attack can be accelerated quantumly if one has access to a quantum decryption oracle. This would cost $O(2^{t/2})$ quantum

AEGIS-128L
AEGIS was originally published at SAC [WP13a], and later submitted to the CAESAR competition [WP16]. We will focus here on the variant AEGIS-128L, which can be found in [WP13b]. In the CAESAR competition, AEGIS-128 appeared in the final portfolio for use case 2 (high-performance applications), and AEGIS-128L was a finalist for this use case.
All variants of AEGIS use a large internal state, made of several 128-bit registers, and a simple round function which updates this state and mixes it with additional registers of input (e.g., the message blocks). This round function is based on the block cipher standard AES [Nat01].
The AES Round. We denote the AES round function as $A = \mathrm{MC} \circ \mathrm{SR} \circ \mathrm{SB}$. It applies to a state of 128 bits, represented as a $4 \times 4$ matrix of bytes, where the bytes are numbered from 0 to 15, top to bottom and left to right. SB (SubBytes) applies the AES S-Box (denoted SBox) in parallel to all bytes. SR (ShiftRows) shifts row number $i$ in the matrix by $i$ positions left. MC (MixColumns) multiplies each column by the AES MDS matrix.
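To make the definition of $A$ concrete, here is a minimal classical Python sketch of the keyless AES round, with the state stored as a list of 16 bytes in the column-major order described above. The function names are ours, and the S-Box is recomputed from its algebraic description (inversion in $\mathrm{GF}(2^8)$ followed by the AES affine map) rather than hard-coded; this is an illustration and not an optimized or quantum implementation.

```python
def xtime(b):
    """Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def gf_mul(a, b):
    """Product of two field elements (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def _rotl8(v, k):
    return ((v << k) | (v >> (8 - k))) & 0xFF


def _build_sbox():
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):  # x^254 is the inverse of x (and maps 0 to 0)
            inv = gf_mul(inv, x)
        sbox.append(inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2)
                    ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = _build_sbox()


def sub_bytes(state):
    return [SBOX[b] for b in state]


def shift_rows(state):
    # Byte r + 4*c sits in row r, column c; row r is rotated left by r.
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state):
    out = []
    for c in range(4):
        a = state[4 * c: 4 * c + 4]
        for r in range(4):
            out.append(gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def aes_round(state):
    """A = MC o SR o SB, without any round-key addition."""
    return mix_columns(shift_rows(sub_bytes(state)))
```

For example, `aes_round([0] * 16)` maps the all-zero state to sixteen copies of the byte `0x63`, since MixColumns fixes any column whose four bytes are equal.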
AEGIS-128L Algorithm. AEGIS-128L accepts a key and a nonce of 128 bits each. The internal state is made of eight 128-bit registers denoted $S[i]$, $0 \le i \le 7$. The round function $R$ takes two additional 128-bit inputs $X_0, X_1$ and outputs $S' = R(S, X_0, X_1)$ as:
Without AD, the algorithm has the following phases:
• Initialization: after loading the key $K$ and nonce $N$ into the state, we run 10 round updates $R(S, N, K)$.
• Encryption: each round of encryption takes two plaintext blocks $M_i, M'_i$ and returns two ciphertext blocks where AND denotes the bit-wise Boolean AND.
• Finalization: the state update function is called 6 times with $X_0, X_1$ depending on the AD length and message length. The authentication tag is obtained by XORing the first 7 registers.
Security. Third-party cryptanalysis has shown that AEGIS is insecure under nonce misuse [KEM17] and that it exhibits linear keystream biases [ENP19]. However, these attacks did not contradict its security claims. To the best of our knowledge, there has been no quantum security analysis of AEGIS.

Tiaoxin-346
Tiaoxin-346 was submitted to the CAESAR competition [Nik16] where it reached the third round. It accepts 128-bit keys and 128-bit tags. The internal state $T$ is made of thirteen 128-bit registers separated into substates $T_3, T_4, T_6$ with 3, 4 and 6 registers respectively, denoted as $T_j[i]$. The round function $R(T, X_0, X_1, X_2)$ takes a 3-register input $X_0, X_1, X_2$ and updates the state as shown in Figure 1. In particular, it can be noted that the round function processes the substates $T_j$ independently.
In the initialization phase, the key and nonce are loaded in $T$; then, 15 rounds of the round function $R(T, Z_0, Z_1, Z_0)$ are applied, where $Z_0$ and $Z_1$ are constants. In the encryption phase, message blocks are also encrypted by pairs:
It can be noted that the state update is performed before outputting the ciphertexts, and not after like the other designs in this section. Finally, the finalization performs 20 unkeyed rounds $R(T, Z_1, Z_0, Z_1)$ and outputs the tag as the XOR of all registers $T_j[i]$.
Security. An important difference between Tiaoxin and AEGIS is that the round function of Tiaoxin is invertible, as well as the initialization phase. Thus, recovering the internal state at any point of the ciphering process leads to a key-recovery. Furthermore, it is enough to recover a single substate $T_j$. A few third-party works have studied the security: a key-recovery attack in a nonce-misuse scenario has been proposed [KEM17], and Tiaoxin reduced to 8 rounds of initialization has been shown to have weak keys [LIMS21]. To the best of our knowledge, there has been no quantum security analysis of Tiaoxin.

Rocca
Rocca is an AEAD for beyond-5G applications. As such, it also aims at quantum security and uses keys of 256 bits. The internal state $S$ is made of eight 128-bit registers denoted $S[i]$, $0 \le i \le 7$. The round function $R$ (Figure 2) takes two additional 128-bit inputs $X_0, X_1$ and outputs $S' = R(S, X_0, X_1)$ defined as:
Algorithm. The specification that we give here is from the latest version (2023-03-16) of the ePrint report [SLN+22]. After the publication of the conference version [SLN+21] and subsequent third-party cryptanalysis [HII+22], the authors added a key feedforward in the initialization phase to make it non-invertible, which was not present in the conference version.
The key is divided into two 128-bit key blocks $K_0, K_1$. The scheme also uses a pair of constants $Z_0, Z_1$. Rocca (without AD) runs as follows:
• Initialization phase: the state $S$ is initialized using the nonce and key. Then, 20 rounds $R(S, Z_0, Z_1)$ are applied. Then, $K_0, K_1$ are XORed to $S[0], S[4]$ respectively.
• Encryption: message blocks are encrypted by pairs. For all $i$ from 0 to $m - 1$:
• Finalization: the state is updated 20 times using $R(S, |AD|, |M|)$, where $|AD|$ and $|M|$ are the respective lengths of the AD and message, and the 128-bit tag is computed as the XOR of all state registers.
Classical Security. The authors of Rocca claimed 128-bit security against forgery attacks and 256-bit security against key-recovery attacks. Importantly, they did not make any claims in the nonce-misuse setting. In [HII+22], Hosoyamada et al. introduced a nonce-misuse attack that could recover the internal state using only one nonce-repeated pair. It follows a strategy of introducing a difference in certain message blocks, in order to observe some output differences, and solving the obtained equations to recover state values. Since the finalization function is key-less, recovering the state allows the adversary to create forgeries.
They then observed that one could turn this attack into a nonce-respecting one, by making decryption queries (which are authorized to repeat the nonces). After making a first nonce-respecting query to the encryption oracle, the adversary introduces a difference in the obtained ciphertext and tries to decrypt by trying all possible tags. If the number of decryption queries is not limited, this will eventually succeed after $2^{128}$ such queries, leading to a recovery of the state. In the first version of Rocca, where the initialization phase was invertible, the state recovery led to a key-recovery attack, breaking the claims. However, with the modified initialization, a recovery of the state does not lead to a recovery of the key.
Quantum Security. The authors of Rocca made no claim against Q2 attacks. Anand and Isobe specifically studied the quantum security of Rocca [AI23] and found a forgery attack that requires $2^{75}$ superposition queries. This attack is nonce-respecting and makes Q2 decryption queries.

Rocca-S
Rocca-S is a new version of Rocca which was proposed for standardization by the IETF [NFI]. We refer to the version of the draft standard which is the latest one at the time of writing (published March 2nd, 2023).
Round Function. The internal state of Rocca-S is made of 7 registers of 128 bits. The round function $S' = R(S, X_0, X_1)$ (Figure 3) updates this state as follows:
Algorithm. The algorithm (without AD) runs as follows:
• Initialization: after loading the key $K_0, K_1$ and nonce, 16 rounds of $R(S, Z_0, Z_1)$ are applied, followed by a key addition in all state registers.
• Encryption: for all $i = 0$ to $m - 1$:
• Finalization: the round function $R(S, |AD|, |M|)$ is iterated 16 times. Then, the 256-bit tag is computed as:
Security. The increased tag size allows the authors of Rocca-S to claim 256 bits of security against forgery, state and key-recovery attacks (nonce-respecting). In the quantum setting, they claim 128 bits of security against nonce-respecting forgery and key-recovery attacks. However, like Rocca, they did not consider attacks in the Q2 setting and did not make security claims in this model. To the best of our knowledge, Rocca-S remains secure in the Q1 setting.

Tools
In this section we give the main algorithmic tools of our attacks. These tools are gathered from previous works in quantum cryptanalysis [BBC+21] and quantum computing [ORR13] and adapted here to our setting. To the best of our knowledge, the case of "smaller correlation" (Theorem 2) is new.
We assume basic knowledge of the quantum circuit model [NC02] (Toffoli / CNOT / Hadamard gates, ket $|\cdot\rangle$ notations). As is commonly done in previous works [Bon17, CHLS20, KLLN16a], we query AE schemes using a standard oracle. However, in our main quantum algorithm, we need a phase oracle.

Definition 1 (Standard oracle). For
Both oracles are equivalent by composing with a Hadamard transform. Also, if one knows a classical circuit that implements $f$, both oracles are easy to construct.
These AE schemes are nonce-based. While the nonce can be chosen by the adversary, it cannot be repeated between two queries. Since Q2 queries are merely an extension of classical queries, the same can be said in the quantum setting. Therefore, we impose that each of the Q2 queries is answered using a different, classical nonce. Using a classical nonce or randomness is common in proofs of quantum security for encryption and AE modes [ATTU16, BBC+21].
That is, the adversary has access to a family of oracles $O_{N,m}$ for different nonces $N$ and message lengths $m$ (we assume that the AD is empty), and they cannot make two queries with the same nonce.
Each oracle encrypts several (pairs of) message blocks $(M_i, M'_i)$, depending on the selected length, and returns the corresponding (pairs of) ciphertexts $(C_i, C'_i)$, and the tag:
Quantum Search. Grover's exhaustive search algorithm [Gro96] is a procedure to find a "good" element in a search space of size $2^n$ in $\frac{\pi}{4} 2^{n/2}$ iterates; each iterate queries a phase oracle that flips only the phase of this good element. Amplitude amplification [BHMT02] generalizes this to any algorithm $A$ (even a quantum algorithm) that outputs a good element with probability $p$. It then makes about $\frac{\pi}{4} \frac{1}{\sqrt{p}}$ iterates, with two calls to $A$ and one query to the oracle per iterate, to succeed with overwhelming probability.
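The iterate count above can be checked numerically with the standard amplitude amplification formula: after $k$ iterates, the success probability is $\sin^2((2k+1)\theta)$ with $\sin^2\theta = p$. The short Python sketch below is only an illustration; the function names are ours, and rounding the iterate count down is one possible convention among several.

```python
import math


def amplification_iterates(p):
    """About (pi/4) / sqrt(p) iterates, rounded down."""
    return math.floor(math.pi / (4 * math.sqrt(p)))


def success_probability(p, k):
    """Success probability after k iterates, starting from probability p."""
    theta = math.asin(math.sqrt(p))
    return math.sin((2 * k + 1) * theta) ** 2


p = 2.0 ** -20
k = amplification_iterates(p)
print(k, success_probability(p, k))
```

With $p = 2^{-20}$, this gives 804 iterates and a success probability very close to 1, whereas a classical repetition strategy would need about $2^{20}$ trials.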
Grover Search Cost Estimates. All AE schemes studied in this paper are based on the AES round function. Quantum attacks on them require implementing AES components. Since the scope of this paper is only to demonstrate the existence of attacks, we will use approximate quantum gate and query counts (accurate up to a factor 2 at best). For example, Table 4 in [JBS+22] gives a count of $12240 = 2^{13.58}$ Toffoli gates for a full (10-round) AES-128. We use this to assume that a single round of AES can be implemented with $2^{10}$ Toffoli gates (we focus only on Toffoli counts for simplicity).
For all four schemes, implementing Grover's exhaustive key search requires recomputing the initialization of the scheme. The number of iterates depends on the key size (128 or 256 bits), and the cost of the Grover iterate is dominated by this initialization function which, in turn, can be estimated using the number of AES rounds it contains. These estimates are summarized in Table 3.
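The estimate described above amounts to multiplying three quantities: the number of Grover iterates, the number of AES rounds in the initialization, and the $2^{10}$ Toffoli gates assumed per AES round. The helper below is a rough sketch of this product. The number of AES rounds is left as a parameter, and the value 80 used in the example is a hypothetical placeholder, not a figure taken from Table 3; constant factors such as uncomputation are ignored.

```python
import math

# From the text: one AES round is assumed to cost about 2^10 Toffoli gates.
LOG2_TOFFOLI_PER_AES_ROUND = 10


def grover_key_search_log2_toffoli(key_bits, aes_rounds_in_init):
    """log2 of a rough Toffoli count for Grover key search.

    key_bits: size of the key (number of iterates is about (pi/4) * 2^(key_bits/2)).
    aes_rounds_in_init: AES rounds evaluated by one initialization (user-supplied).
    """
    log2_iterates = key_bits / 2 + math.log2(math.pi / 4)
    log2_iterate_cost = math.log2(aes_rounds_in_init) + LOG2_TOFFOLI_PER_AES_ROUND
    return log2_iterates + log2_iterate_cost


# Hypothetical example: a 128-bit key and 80 AES rounds per initialization.
print(round(grover_key_search_log2_toffoli(128, 80), 2))
```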
At some point in our algorithms, we also need to solve AES S-Box differential equations of the form $S(x \oplus \Delta) \oplus S(x) = \Delta'$. This can be done using a small Grover search on $x$, costing $2^6$ S-Boxes at most, i.e., 4 rounds of AES, or $2^{12}$ Toffoli gates.
Toffoli Counts of Arithmetic Operations. Since the complexities of our attacks will be clearly below those of exhaustive search, we give only imprecise upper bounds on the cost of quantum circuits for arithmetic operations. Using the addition circuit of [CDKM04], an $n$-bit addition costs $2n$ Toffoli gates, and a controlled variant can be implemented with $4n$ Toffoli gates. Using a simple implementation as a series of controlled additions, an $n$-bit product can be implemented with $4n^2$ Toffoli gates. A table lookup circuit, implementing $|i\rangle |0\rangle \mapsto |i\rangle |c_i\rangle$ where the $c_i$ are classically stored values, takes $4 \times 2^n \times m$ Toffoli and CNOT gates when $c_i$ is on $m$ bits and $i$ is on $n$ bits. Finally, a Euclidean division of an $n$-bit integer by an $m$-bit integer costs about $4nm$ Toffoli gates using a sequence of $n$ conditional subtractions.
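For convenience, these upper bounds can be collected in a few helper functions. This is only a transcription of the formulas stated above into Python; the function names are ours and the values are the same imprecise upper bounds, not exact circuit costs.

```python
def toffoli_addition(n):
    """n-bit addition with the circuit of [CDKM04]: 2n Toffoli gates."""
    return 2 * n


def toffoli_controlled_addition(n):
    """Controlled n-bit addition: 4n Toffoli gates."""
    return 4 * n


def toffoli_product(n):
    """n-bit product as a series of n controlled additions: 4n^2 Toffoli gates."""
    return n * toffoli_controlled_addition(n)


def gates_table_lookup(n, m):
    """Lookup of 2^n classical m-bit values: 4 * 2^n * m Toffoli and CNOT gates."""
    return 4 * (2 ** n) * m


def toffoli_division(n, m):
    """Euclidean division of an n-bit integer by an m-bit integer: about 4nm."""
    return 4 * n * m


# Example with 384-bit operands (an illustrative size).
print(toffoli_product(384), toffoli_division(384, 384))
```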

Linear Post-processing
The generic approach to post-process the output of an oracle requires two identical calls, due to the reversibility of quantum computations. This is not doable in our case since we only query the oracle once. Fortunately, truncations [HS18] and more generally linear functions [BBC+21] can be computed from a single call.
For our purposes, we need to separate the linear function in two parts, one of which goes directly into the phase. This can be obtained using [BBC+21, Lemma 2] as a black-box, but we give the whole proof (adapted directly from [BBC+21]) to be self-contained.
Proof. On input $|x\rangle |y\rangle$, create the uniform superposition over outputs $z$ and append a qubit in the state $H|1\rangle$. Compute $O_h$ with register $z$ as input and the last qubit as output; compute $O_g$ with register $z$ as input and $y$ as output: Notice that the result of $h$ appears in the phase now, because we used $H|1\rangle$ as its output register. Now, apply $O_f$ with register $x$ as input and $z$ as output: Redo the computations of $O_h$ and $O_g$: Erase the qubit ($H|1\rangle$) and use the linearity of $g$, $h$ to rewrite: The last register becomes disentangled and always contains a uniform superposition over $\{0,1\}^m$, which we can erase, leading to the result.
In particular, we can truncate the output of a stream cipher and separate it in two parts, one that remains in the computational basis state, and one that goes into the phase.

Properties of the Walsh Transform
Let $f : \{0,1\}^n \to \{-1,1\}$ be a function. The Walsh-Hadamard transform of $f$ is defined as $\hat{f}(y) = \sum_{x \in \{0,1\}^n} (-1)^{x \cdot y} f(x)$. It corresponds to the Fourier transform in the group $\mathbb{F}_2^n$. The quantum Hadamard transform $H^{\otimes n}$ computes a (normalized) Walsh-Hadamard transform on the amplitudes of its $n$-qubit input state. That is: $H^{\otimes n} \left( \frac{1}{2^{n/2}} \sum_x f(x) |x\rangle \right) = \frac{1}{2^n} \sum_y \hat{f}(y) |y\rangle$. We will need the following important properties of the Walsh transform.
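The definition above, and the effect of an XOR shift on the Walsh coefficients (the coefficients of $x \mapsto g(x \oplus s)$ are those of $g$ multiplied by $(-1)^{s \cdot y}$, as used later in the description of Algorithm 1), can be checked on small examples. The sketch below uses a textbook in-place fast Walsh-Hadamard transform on truth tables; the function names and the example function are ours.

```python
def walsh_hadamard(f):
    """Unnormalized Walsh-Hadamard transform of a truth table of length 2^n.

    Returns the list of sum_x (-1)^(x.y) f[x], indexed by y.
    """
    a = list(f)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def dot(x, y):
    """Inner product of two bit-vectors (encoded as integers) modulo 2."""
    return bin(x & y).count("1") & 1


# An arbitrary 3-bit function with values in {-1, 1}, and an arbitrary shift.
g = [1, -1, -1, 1, 1, 1, -1, 1]
s = 0b101
f = [g[x ^ s] for x in range(len(g))]

g_hat = walsh_hadamard(g)
f_hat = walsh_hadamard(f)
shift_property_holds = all(
    f_hat[y] == (-1) ** dot(s, y) * g_hat[y] for y in range(len(g))
)
print(g_hat, shift_property_holds)
```

The transform runs in $O(n 2^n)$ operations, which is enough to experiment with the small building blocks whose Walsh spectra are needed later.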

Amplitude Transduction
Quantum rejection sampling is the process of transforming a quantum state into another one by modifying its amplitudes, in a way similar to classical rejection sampling, which transforms probability distributions. Suppose that we have a quantum state of the form $\sum_x u_x |x\rangle |\alpha_x\rangle$, where $0 \le \alpha_x < 2^n$ is an integer (therefore $|\alpha_x\rangle$ is indeed a basis state). We want to transform this into a state $\sum_x u_x \frac{\alpha_x}{2^n} |x\rangle |\alpha_x\rangle$, i.e., move $\alpha_x$ into the amplitude (up to renormalization). A typical way to do this is to append a qubit register starting in state $|0\rangle$, which is transformed into a superposition of the form $\frac{\alpha_x}{2^n} |0\rangle + |\psi_{\alpha_x}\rangle$, where $|\psi_{\alpha_x}\rangle$ is a superposition of non-zero basis states. This step is called amplitude transduction. Then, the state becomes $\sum_x u_x |x\rangle |\alpha_x\rangle \left( \frac{\alpha_x}{2^n} |0\rangle + |\psi_{\alpha_x}\rangle \right)$. Measuring $|0\rangle$ in the last register collapses the state on the wanted superposition.
We use the amplitude transduction algorithm of [SLSB19].
Proof. The algorithm runs as follows. First, we apply a Hadamard transform on $n$ qubits: We perform a comparison between $y$ and $\alpha$, which costs $O(n)$ gates, and write the result in the last qubit: We apply a Hadamard transform on the register holding $y$, obtaining: where we recover a state with the form claimed. The exact form of $|\psi_\alpha\rangle$ depends only on the value of $\alpha$ and is not relevant for the rest of our study.
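The three steps of this proof can be simulated classically for a small register. The sketch below tracks the amplitudes of the $n$-qubit register holding $y$ together with the comparison qubit, for a fixed classical value $\alpha$. It assumes that the comparison qubit is set to 0 exactly when $y < \alpha$; this convention and the function names are our own choices for illustration.

```python
import math


def hadamard_all(vec):
    """Normalized Hadamard transform H^{(x)n} on a list of 2^n amplitudes."""
    a = list(vec)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    norm = math.sqrt(len(a))
    return [v / norm for v in a]


def transduce(alpha, n):
    """Amplitudes of (y, flag) after the three steps, as two lists (flag 0, flag 1)."""
    size = 1 << n
    # Step 1: Hadamard transform on |0^n>, giving the uniform superposition over y.
    start = [0.0] * size
    start[0] = 1.0
    uniform = hadamard_all(start)
    # Step 2: comparison between y and alpha, written in the flag qubit
    # (assumed convention: flag = 0 iff y < alpha).
    flag0 = [uniform[y] if y < alpha else 0.0 for y in range(size)]
    flag1 = [uniform[y] if y >= alpha else 0.0 for y in range(size)]
    # Step 3: Hadamard transform on the register holding y.
    return hadamard_all(flag0), hadamard_all(flag1)


n, alpha = 4, 5
flag0, flag1 = transduce(alpha, n)
print(flag0[0], alpha / 2 ** n)  # amplitude on |0^n>|0> versus alpha / 2^n
```

The amplitude on the all-zero component is $\frac{1}{2^n} \sum_{y < \alpha} 1 = \frac{\alpha}{2^n}$, while the remaining amplitudes form the state $|\psi_\alpha\rangle$.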
Approximation. In the context of this paper, the value $\frac{\alpha_x}{2^n}$ will be a fixed-point approximation of the amplitude that we actually want. Since the approximation error will be at most $2^{-n}$, if $n$ is large enough, the resulting quantum state will be close to our target state, and the algorithm will run without failure.
In particular, let $\alpha'_x$ be the "exact" amplitude and assume that our approximation satisfies: Let $|\psi_i\rangle$ be the "ideal" state after transduction success and $|\psi_r\rangle$ the "real" one, respectively: The Euclidean distance between them can be bounded as follows: Let $p = \sum_x |\alpha'_x u_x|^2$ be the "ideal" probability to succeed in the transduction. We have: Consider an algorithm (e.g., Algorithm 1, that we will define later) that uses transduction once, succeeds here with probability $p$, does further operations, measures and succeeds with probability $p'_r$ (resp. $p'_i$). By Lemma 3.6 in [BV97], the total variation distance between the two probability distributions resulting from the "ideal" and "real" states is at most $4 \| |\psi_i\rangle - |\psi_r\rangle \|$. Consequently: If we ensure $p \gg 2^{-n}$, it is enough to study the approximated version. This will be the case in the attacks studied in this paper, as we typically use more than 300 bits of precision to approximate the amplitudes, while the success probability is bigger than $2^{-50}$. More generally, while increasing this precision may require more costly arithmetic circuits, we have not encountered a case where this limits the attacks.

Quantum Hidden Shift Algorithm with a Single Query
We want to solve the following problem.
Problem 1 (Hidden shift). Let $g : \{0,1\}^n \to \{-1,1\}$ be a function that can be computed in polynomial time. Given access to a quantum oracle for $f : x \mapsto g(x \oplus s)$, where $s$ is a secret value, find $s$.
The algorithm that we present here (Algorithm 1) is from [ORR13], and uses quantum rejection sampling. Several special cases have appeared before in cryptanalysis: for example, shifted multiplicative characters [vDHI06] and bent functions [Röt10]. In both cases, the algorithm avoids the rejection sampling by considering a situation in which the Fourier transform of the shifted function is easy to compute: in the former case, it is a multiplicative character, and in the latter, a constant.
In our case, we are interested in the probability to succeed after making a single phase query to the function f .
Ideas of Algorithm 1. The first step is to query $f$ and to perform a Hadamard transform. This places the Walsh coefficients of $f$ into the amplitudes of the state. Next, we remark that by Proposition 1, these coefficients are actually those of $g$, multiplied by $(-1)^{x \cdot s}$. If we had the state $\frac{1}{2^{n/2}} \sum_x (-1)^{x \cdot s} |x\rangle$, we could immediately do a Hadamard transform and obtain $|s\rangle$. The Walsh coefficients of $g$ prevent us from doing that.
Thus, the next step is to correct the amplitudes by multiplying them by $1/\hat{g}(x)$, using amplitude transduction (Subsection 3.3). Ideally, we would obtain the wanted state $\sum_x (-1)^{x \cdot s} |x\rangle$. However, the product operation is not possible if $\hat{g}(x) = 0$. Furthermore, if the smallest values of $\hat{g}(x)$ are very small compared to the average, the probability to measure 0 (and succeed) gets smaller. Thus, the best strategy, as suggested in [ORR13], is to dismiss the small Walsh coefficients of $g$. We introduce a bound $M$ in the algorithm and only multiply by
Proof. Following Algorithm 1 until Step 5, we obtain the state: Here $|\hat{g}(x)|$ is, by definition, an integer between 0 and $2^n$. We first compute it, then compare the result with $M$, and perform a Euclidean division of $2^n M$ (a constant) by $|\hat{g}(x)|$. This costs $O(n^2)$ gates.
Since we have computed $\hat{g}(x)$, we know its sign, and we can handle it immediately. We perform a controlled phase flip by $\mathrm{sgn}(\hat{g}(x))$, which will cancel the sign of $\hat{g}(x)$ in the phase, obtaining the state: The next steps realize amplitude transduction following Lemma 2. In the $(n+1)$-qubit ancillary register, the amplitude on $0^{n+1}$ is equal to $\frac{\alpha_x}{2^n}$, for all $x$. This includes the cases where $\alpha_x = 0$, where there is simply no amplitude on $0^{n+1}$. Therefore, the probability to measure $0^{n+1}$ at Step 11 is equal to: While the first term is equal to $\frac{G M^2}{2^{2n}} = p$, the second can be shown to be negligible. First, we have $|\hat{g}(x)| \le 2^n$ by definition, so the sum can be bounded as: Furthermore, we know that $G \le 2^n$ and: which bounds the second term by $2^{-n/2+1}$. Assuming that we succeeded at Step 11 (i.e., we measure $0^{n+1}$), the state collapses and becomes proportional to: Following the discussion in Subsection 3.3, the state is close to: as long as $p \gg 2^{-n}$ (which is the case here since $p > 2^{-n/2}$). We then apply $H$: Afterwards, the probability to measure $y = s$ is: All in all, the total probability to succeed is:
Remark 1. The condition $p \gg 2^{-n/2}$ might appear as a strong limitation of Theorem 1. However, the values of $n$ encountered in this paper range from 384 to 640, since we are recovering large hidden shifts, while the mere condition of having a valid attack imposes $p > 2^{-128}$.
In order to use Theorem 1, we need an efficient algorithm to compute $\hat{g}$. In order to maximize the success probability, we need to know the distribution of the Walsh coefficients to choose $M$ appropriately. Both will be possible in the cases we are interested in, because $g$ will be the product of many small-range independent functions. Then $\hat{g}$ is easy to compute by taking the product of Walsh coefficients (see Proposition 1).
Remark 2 (Global phase). If we have access to $\pm g(x \oplus s)$, where the leading sign is not known, it turns into a global phase that is irrelevant for the algorithm. At the final step, we will still measure $s$.
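The following Python sketch illustrates both points on a toy example. When $g$ is a product of functions acting on disjoint blocks of variables, the multiset of values $|\hat{g}(x)|$ is obtained by multiplying the spectra of the factors, without ever building the full truth table. Given this histogram, one can scan the candidate bounds $M$ and keep the one that maximizes $p \cdot p' = G^2 M^2 / 2^{3n}$, where $G$ is the number of $x$ with $|\hat{g}(x)| \ge M$. The function names, the two small factors and the choice of $p \cdot p'$ as the objective are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def walsh_hadamard(f):
    """Unnormalized Walsh-Hadamard transform of a truth table of length 2^n."""
    a = list(f)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def spectrum_histogram(factors):
    """Histogram of |g_hat| for g = product of the given independent factors."""
    hist = Counter({1: 1})
    for factor in factors:
        factor_hist = Counter(abs(v) for v in walsh_hadamard(factor))
        new_hist = Counter()
        for (a, ca), (b, cb) in product(hist.items(), factor_hist.items()):
            new_hist[a * b] += ca * cb
        hist = new_hist
    return hist


def best_bound(hist, n):
    """Bound M maximizing G^2 M^2 / 2^(3n); returns (M, score, G)."""
    best = None
    for M in sorted(v for v in hist if v > 0):
        G = sum(count for v, count in hist.items() if v >= M)
        score = (G * M) ** 2 / 2 ** (3 * n)
        if best is None or score > best[1]:
            best = (M, score, G)
    return best


# Toy factors: the 2-bit and 3-bit AND functions, written with values in {-1, 1}.
g1 = [1, 1, 1, -1]
g2 = [1, 1, 1, 1, 1, 1, 1, -1]
hist = spectrum_histogram([g1, g2])
print(dict(hist), best_bound(hist, n=5))
```

In this toy case the histogram contains 4 coefficients of absolute value 12 and 28 of absolute value 4, and keeping all of them ($M = 4$) gives the best trade-off.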
Algorithm 1 Quantum hidden shift with rejection sampling and the technique of [SLSB19].
Input: Quantum access to $f(x) = g(x \oplus s)$ for a known $g$, a bound $M$
Output: $s$, with probability $pp'$
1: Start from $n$ qubits initialized to 0
$\sum_x (-1)^{x \cdot s} \hat{g}(x) |x\rangle$
5: Compute the amplitude multiplier: where $0 \le \alpha_x \le 1$ in an additional register
8: Perform a comparison between $y$ and $\alpha_x$ and store the result in a new ancilla qubit
9: Apply $H^{\otimes n}$ on the register holding $y$: the amplitude on the $|0^{n+1}\rangle$ component is $\frac{\alpha_x}{2^n}$
10: Erase $|\alpha_x\rangle$
11: Measure the last register. If the obtained value is different from $0^{n+1}$, abort
12: Otherwise, the state has collapsed and is close to:
13: Apply $H^{\otimes n}$, measure and return the result.
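For small $n$, the behaviour of Algorithm 1 can be reproduced with a classical simulation of the amplitudes. The sketch below is an idealized version: it assumes that the amplitude multiplier is exactly $M / \hat{g}(x)$ when $|\hat{g}(x)| \ge M$ and 0 otherwise (so the fixed-point approximation is ignored), and it only tracks the component of the state that survives the measurement of Step 11. The function names and the test functions are ours.

```python
import math


def walsh_hadamard(f):
    """Unnormalized Walsh-Hadamard transform of a truth table of length 2^n."""
    a = list(f)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def simulate_hidden_shift(g, s, M):
    """Return (probability to pass Step 11, probability to pass and measure s)."""
    size = len(g)
    g_hat = walsh_hadamard(g)
    f = [g[x ^ s] for x in range(size)]
    # Phase query between two Hadamard layers: amplitudes f_hat(x) / 2^n.
    amps = [v / size for v in walsh_hadamard(f)]
    # Idealized rejection step: multiply by M / g_hat(x) on the large coefficients,
    # and drop the others (they never lead to the all-zero measurement outcome).
    kept = [a * M / gh if abs(gh) >= M else 0.0 for a, gh in zip(amps, g_hat)]
    p_accept = sum(a * a for a in kept)
    # Final Hadamard transform on the surviving (unnormalized) component.
    final = [v / math.sqrt(size) for v in walsh_hadamard(kept)]
    return p_accept, final[s] ** 2


# A 4-bit bent function (x0*x1 XOR x2*x3): all Walsh coefficients are +-4.
g_bent = [(-1) ** (((x & 1) * ((x >> 1) & 1)) ^ (((x >> 2) & 1) * ((x >> 3) & 1)))
          for x in range(16)]
print(simulate_hidden_shift(g_bent, s=0b1011, M=4))

# The 3-bit AND function: coefficients 6 (once) and +-2 (seven times).
g_and = [1, 1, 1, 1, 1, 1, 1, -1]
print(simulate_hidden_shift(g_and, s=0b110, M=2))
```

For the bent function, no rejection is needed and the shift is found with certainty, which corresponds to the special case of [Röt10]. For the AND function with $M = 2$, all $G = 8$ coefficients are kept, and the simulation returns $p = G M^2 / 2^{2n} = 0.5$ and a total success probability $G^2 M^2 / 2^{3n} = 0.5$.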
Remark 3 (Self-correlation). A technique similar to this algorithm also appeared in [Sch23], where instead of dividing by the Walsh coefficient, one multiplies by it. This would compute the discrete convolution $(f * g)(y) = \sum_x f(x \oplus y) g(x)$ in the amplitudes of the state, and lead to a similar result since $(f * g)(y)$ is greater for $y = s$. However, the analysis when cutting off the small Walsh coefficients is more difficult, so we settled for the easier method.
Related Quantum Algorithms. Problem 1 is very similar to Simon's problem [Sim97]. Indeed, it would be possible to solve it with Simon's algorithm, which is now a fairly standard approach in quantum cryptanalysis. The main issue is that Simon's algorithm requires $O(n)$ queries to the function, while we can only afford one query to it. Hidden shift attacks can also refer to the approach of [BN18]. The problem there is slightly different, as the shift is with a modular addition and not an XOR. Moreover, as with Simon's approach, multiple queries must be performed.

Hidden Shift with Smaller Correlation
For the attack on AEGIS-128L (Subsection 4.5), we need to solve a more difficult variant of Problem 1, in which the function that we query is multiplied by a highly biased function $h$, which is unknown. We model this function as selected uniformly at random among Boolean functions of the same Hamming weight.
Problem 2 (Correlated hidden shift). Let $g : \{0,1\}^n \to \{-1,1\}$ be a function that can be computed in polynomial time. Let $h : \{0,1\}^n \to \{-1,1\}$ be a function selected uniformly at random in $H_c = \{ h : \{0,1\}^n \to \{-1,1\} \mid \sum_x h(x) = 2^n c \}$. Given access to a quantum oracle for $f : x \mapsto h(x) g(x \oplus s)$, where $s$ is a secret value, find $s$.
It can be noticed that Problem 1 corresponds to the case $c = 1$. When $c$ is 0, we cannot hope to recover the secret $s$ as the function queried will be completely random. However, if $c$ is large enough, we can still use Algorithm 1.

Theorem 2. Consider the setting of Problem 2. On average over $h$, applying Algorithm 1 on $f$ with $g$ as the known function will recover $s$ with probability greater than $p c^2 p' - 2^{-n/2}$, where $p c^2 = \frac{G M^2 c^2}{2^{2n}}$ is the probability to measure 0 at Step 11 and $p' = \frac{G}{2^n}$ is the probability to succeed in the second step.
Proof. By similar bounds as in the proof of Theorem 1, we can assume in the following that the quantum rejection sampling works exactly, at the cost of subtracting a term $2^{-n/2}$ from the probability of success.
Following Algorithm 1, the state after Step 10 is:
If we postpone the measurement of Step 11 to the end of the algorithm, we have the state:
We will now estimate the amplitude of $|s\rangle|0\rangle$. We start by rewriting $\hat f(x)$ using the convolution theorem: $\hat f(x) = \frac{1}{2^n} \sum_z \hat h(x \oplus z) (-1)^{z \cdot s} \hat g(z)$. Note that if $h$ is constant and equal to 1, it has a single nonzero Walsh coefficient in 0 ($z = x$), equal to $2^n$, and we recover the equality $\hat f(x) = (-1)^{x \cdot s} \hat g(x)$ (and the rest of the proof of Theorem 1). The amplitude is
We separate the term $z = x$ from the rest, noticing that $\hat h(0) = \sum_x h(x) = 2^n c$ by our definition of $c$:
where
Now, we can use the fact that the terms that depend on $h$ in this amplitude (and the Walsh-Hadamard transform) are linear in $h$, meaning that the average over $h$ of this amplitude is the amplitude for the average function $h^*(x) = \frac{1}{|H_c|} \sum_{h \in H_c} h(x)$. As $H_c$ is a symmetric distribution over the input values, $h^*$ is a constant function. This means that for all $x \neq 0$, $\hat{h^*}(x) = 0$. Thus, the average amplitude is simply the isolated part, $\frac{MGc}{2^{3n/2}}$. Note that we need to estimate the probability, that is, the average of the square of the amplitude. We use the well-known fact that this is always at least the square of the average (the gap between the two being the variance), and obtain that the probability to measure $|s\rangle|0\rangle$ is, on average over $h$, at least $\frac{(MGc)^2}{2^{3n}}$.
For the probability to measure 0 at Step 11, it has the expression:
We can use the same argument: the average over $h$ is at least $\frac{GM^2c^2}{2^{2n}}$. Finally, taking into account the failure probability of rejection sampling, we obtain the desired probabilities.
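The convolution step of the proof can be checked numerically on small instances. The sketch below assumes the unnormalized Walsh-Hadamard transform $\hat u(x) = \sum_y (-1)^{x \cdot y} u(y)$ and tests the identity $2^n \hat f(x) = \sum_z \hat h(x \oplus z) (-1)^{z \cdot s} \hat g(z)$ for $f(x) = h(x) g(x \oplus s)$, with random $\pm 1$ functions. The normalization and all helper names are our assumptions, not taken from the paper.

```python
import random

def dot(a: int, b: int) -> int:
    """Inner product of two bit-strings over GF(2)."""
    return bin(a & b).count("1") & 1

def walsh(table: list) -> list:
    """Unnormalized Walsh-Hadamard transform (assumed convention)."""
    return [sum(-v if dot(x, y) else v for y, v in enumerate(table))
            for x in range(len(table))]

def check_convolution(n: int, s: int, seed: int = 0) -> bool:
    """Check 2^n * f^(x) == sum_z h^(x ^ z) * (-1)^(z.s) * g^(z) for all x."""
    rng = random.Random(seed)
    size = 1 << n
    g = [rng.choice((-1, 1)) for _ in range(size)]
    h = [rng.choice((-1, 1)) for _ in range(size)]
    f = [h[x] * g[x ^ s] for x in range(size)]
    f_hat, g_hat, h_hat = walsh(f), walsh(g), walsh(h)
    for x in range(size):
        rhs = sum(h_hat[x ^ z] * (-1 if dot(z, s) else 1) * g_hat[z]
                  for z in range(size))
        if size * f_hat[x] != rhs:
            return False
    return True

def check_trivial_h(n: int, s: int, seed: int = 0) -> bool:
    """With h = 1, the identity collapses to f^(x) = (-1)^(x.s) * g^(x)."""
    rng = random.Random(seed)
    size = 1 << n
    g = [rng.choice((-1, 1)) for _ in range(size)]
    f = [g[x ^ s] for x in range(size)]
    f_hat, g_hat = walsh(f), walsh(g)
    return all(f_hat[x] == (-1 if dot(x, s) else 1) * g_hat[x]
               for x in range(size))
```

Both checks hold for any choice of $g$, $h$ and $s$, since they are exact identities; the random seed only selects the instance.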

Applications
Our attack combines Algorithm 1 with linear post-processing to recover the internal state. Recall that the nonce and key are fixed classical values, which means that after initialization, in all targeted designs, the internal state is a fixed value. We want to recover it (or part of it).

State-recovery on Rocca: Hidden Shifts
If we encrypt a couple of fixed message blocks (e.g., 0), the internal state $S$ remains a fixed value. Our goal is to recover this $S$. We encrypt 5 pairs of message blocks in superposition and unroll several of the corresponding ciphertexts. Some ciphertexts are linear in the message, and thus directly give a constant that depends on the initial state (denoted by $E_i$). The important parts of these ciphertexts are the places that contain both a linear combination of the input messages (denoted by $X_i$) and a function of some state values (denoted by $V_i$).
As $M_2$ is a free variable (it is not involved in any $X_i$), we can ensure that $M_0 \oplus M_2 = 0$, so that $V_4$ only depends on initial state variables.
From these formulas, we can see that accessing $C_0$, $C'_0$, $C_1$ directly gives $E_0$, $E_1$ and $E_2$, while the other ciphertexts are, up to constants and a plaintext block, sums of functions of the form $A(X_i \oplus V_i)$.
We now describe the quantum oracle that we construct from the query oracle. The inputs are the 128-bit variables $X_0, \dots, X_4$, and the message blocks for the query depend on them as follows (the others are simply set to 0):
From these equations, it is easy to see that we have
which is what we need.
Because the message blocks are either constant or linear functions of the $X_i$ variables, we can add
Next, we use a linear post-processing (Lemma 1) in order to construct the following oracle:
where $F$ is a linear function.
Remark 4. The $E_i$ are values that are also available in classical attack scenarios. The quantum advantage comes from the ability to retrieve the $V_i$ to recover the state.
Hidden Shift Problem. Now we define the function $F$. Recall that $A$ is a single AES round, of the form $A = MC \circ SR \circ SB$. In order to transform the output into a single bit, we take the dot-product with an appropriate 128-bit mask; we construct this mask with 16 copies of a single 8-bit mask $\beta$.
From now on, we choose an arbitrary $\beta$. While there is no particular constraint in the case of Rocca, the choice of the mask is more important in the cases of Tiaoxin and AEGIS (it will also be different for them).
On input a 128-bit AES state $Z = (z_0, \dots, z_{15})$, we define the function:
In other words, it removes the last MC layer, uses a linear mask on each byte and XORs them all. By definition:
since $L$ is invariant by permutation of the bytes. Next, we define the functions $g$ and $f$:
$g(X_0, \dots, X_4) := (-1)^{\bigoplus_{i<5} L(SB(X_i))}$
$f(X_0, \dots, X_4) := (-1)^{b \oplus \bigoplus_{i<5} L(SB(X_i \oplus V_i))}$
where $f$ has a leading unknown bit $b$ depending on the constant terms.
We will now retrieve the hidden shift $V_0, \dots, V_4$ using Algorithm 1. We rename the individual bytes of $X_0, \dots, X_4$ as $x_0, \dots, x_{79}$ and rewrite $g$ as:
In particular, $f$ is still a shifted version of this function. Now, to bound the runtime and success probability of Algorithm 1, we need to analyze the Walsh coefficients of $g$.

Analysis of g.
Since $g$ is the product of 80 individual functions of one byte, $g_\beta(x) := (-1)^{\beta \cdot SB(x)}$, we can use Proposition 1 and compute $\hat g$ as a product of $\hat g_\beta$. By definition, $\hat g_\beta(x)$ is, up to a constant, the coefficient at column $\beta$ and row $x$ in the Linear Approximation Table (LAT) of the S-Box. Thus, $\hat g_\beta$ corresponds to one column of the LAT. Moreover, we are interested only in the distribution of Walsh coefficients, and for the AES S-Box, all non-zero columns are equivalent. Thus, any non-zero mask $\beta$ gives the same result. The distribution is given in Table 4.
It could be a priori difficult to compute the Walsh spectrum of $g$, since it has a 640-bit input. However, by representing the distribution of Walsh coefficients as a table, as in Table 4, we can compute the exact distribution, which is actually quite sparse. For 80 S-Boxes, the table contains approximately 7.5 million non-zero coefficients.
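The table-based computation can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: the AES S-Box is regenerated from its algebraic definition, the Walsh transform is taken unnormalized, and the distribution of $|\hat g|$ for a product of independent one-byte functions is obtained by multiplying values and counts. All function names are hypothetical.

```python
from collections import Counter

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def aes_sbox() -> list:
    """AES S-Box: field inversion followed by the affine map."""
    inverse = [0] * 256
    for a in range(1, 256):
        inverse[a] = next(b for b in range(1, 256) if gf_mul(a, b) == 1)
    sbox = []
    for x in range(256):
        y = inverse[x]
        out = y
        for k in range(1, 5):  # XOR of the four left-rotations of y
            out ^= ((y << k) | (y >> (8 - k))) & 0xFF
        sbox.append(out ^ 0x63)
    return sbox

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def column_distribution(sbox: list, beta: int) -> Counter:
    """Counter {|coefficient|: count} for u -> sum_x (-1)^(u.x ^ beta.S(x))."""
    dist = Counter()
    for u in range(256):
        coeff = sum(-1 if parity(u & x) ^ parity(beta & sbox[x]) else 1
                    for x in range(256))
        dist[abs(coeff)] += 1
    return dist

def product_distribution(dist: Counter, copies: int) -> Counter:
    """Distribution of |coefficients| for a product of `copies` independent factors."""
    total = Counter({1: 1})
    for _ in range(copies):
        nxt = Counter()
        for v1, c1 in total.items():
            for v2, c2 in dist.items():
                nxt[v1 * v2] += c1 * c2
        total = nxt
    return total
```

Calling `product_distribution(column_distribution(aes_sbox(), 1), 80)` would correspond to the 80 S-Boxes of the text; we have only exercised the sketch on a few copies, where Parseval's identity provides a consistency check.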
To run Algorithm 1, we need to select the threshold $M$ maximizing the success probability. Recall that it is the product $pp'$, where $p = \frac{M^2 G}{2^{2n}}$ is the success probability of the first step (which we can detect) and $p' = \frac{G}{2^n}$ is the success probability of the second step. Since we know the entire Walsh spectrum of $g$, we select $M$ to maximize $pp' = M^2 G^2 2^{-3n}$: $M := 2^{326.23}$, $G = 2^{610.60}$, $p = 2^{-16.94}$, $p' = 2^{-29.40}$, $pp' = 2^{-46.34}$. These parameters minimize the query complexity of the attack; however, they might not be the best if we want to minimize the time complexity, as we will see below.
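The selection of $M$ amounts to a scan over the distinct absolute values of the spectrum. The sketch below is an illustration under our assumptions: the spectrum is given as a dictionary mapping $|\hat g|$ to its multiplicity, $G$ is the number of coefficients with $|\hat g(x)| \geq M$, and all quantities are handled in $\log_2$. The second function only re-derives $p$, $p'$ and $pp'$ from the values of $M$ and $G$ quoted in the text with $n = 640$; the function names are hypothetical.

```python
from math import log2

def best_threshold(dist: dict, n: int) -> tuple:
    """Return (log2 M, log2 G, log2 p, log2 p') maximizing p*p' = M^2 G^2 / 2^(3n).

    dist maps each absolute Walsh coefficient value to its number of occurrences.
    """
    best = None
    count_at_least = 0
    for value in sorted((v for v in dist if v > 0), reverse=True):
        count_at_least += dist[value]  # G = #{x : |g^(x)| >= value}
        log_m, log_g = log2(value), log2(count_at_least)
        score = 2 * log_m + 2 * log_g - 3 * n  # log2(p * p')
        if best is None or score > best[0]:
            best = (score, log_m, log_g)
    _, log_m, log_g = best
    return log_m, log_g, log_g + 2 * log_m - 2 * n, log_g - n

def reported_parameters(n: int = 640, log_m: float = 326.23,
                        log_g: float = 610.60) -> tuple:
    """Recompute (log2 p, log2 p', log2 pp') from the quoted M and G."""
    log_p = log_g + 2 * log_m - 2 * n
    log_p_prime = log_g - n
    return log_p, log_p_prime, log_p + log_p_prime
```

With the quoted values, `reported_parameters()` gives exponents matching $p = 2^{-16.94}$, $p' = 2^{-29.40}$ and $pp' = 2^{-46.34}$ up to rounding.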
Quantum Arithmetic. Finally, we must design a quantum circuit that computes $\hat g(x)$, compares $|\hat g(x)|$ with $M$ and computes $\lfloor 2^n M / |\hat g(x)| \rfloor$. First, we notice that we can compute $|\hat g(x)|$ exactly. The computation of each $\hat g_\beta$ requires a circuit with approximately $2^8 \times 8 \times 4$ Toffoli and CNOT gates that, for each $i$, compares its input with $i$ and writes the corresponding output value. We do this 80 times in parallel. Overall, this costs $\leq 2^{20}$ Toffoli gates. Afterwards, we take the product of all coefficients, two by two: the bit-length of the numbers that we multiply increases at each product. This step costs $\leq 2^{17}$ Toffoli gates.
In the end, $|\hat g(x)|$ is an integer between 0 and $2^{80 \times 5} = 2^{400}$. The comparison with $M$ is done with about $2 \times 400$ Toffoli gates (see Section 3). The computation of $\lfloor 2^n M / |\hat g(x)| \rfloor$, which is required for amplitude transduction, is a Euclidean division of a 980-bit constant by a 400-bit number, which is done with about $4 \times 980 \times 400 = 2^{20.6}$ Toffoli gates. In total, the overhead in Algorithm 1 with respect to the query of $f$ can be upper bounded by $2^{22}$ Toffoli gates (we did not count the additional CNOT gates, but their number is of the same order).
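The gate counts above can be re-added mechanically. The following sketch simply reproduces the arithmetic of the text; the $2^{20}$ and $2^{17}$ bounds are taken as given rather than derived, and the function and key names are hypothetical.

```python
from math import log2

def overhead_estimate(sboxes: int = 80, coeff_bits: int = 400,
                      dividend_bits: int = 980) -> dict:
    """Re-add the Toffoli estimates quoted in the text (bounds taken as given)."""
    lookup = sboxes * (2**8 * 8 * 4)      # table lookups for each one-byte factor
    lookup_bound = 2**20                  # bound stated in the text
    products_bound = 2**17                # bound stated in the text
    comparison = 2 * coeff_bits           # comparison of |g^(x)| with M
    division = 4 * dividend_bits * coeff_bits  # Euclidean division
    total = lookup_bound + products_bound + comparison + division
    return {
        "lookup": lookup,
        "comparison": comparison,
        "division": division,
        "total": total,
        "log2_total": log2(total),
    }
```

The total stays below $2^{22}$, consistent with the upper bound used in the rest of the analysis.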

State-recovery on Rocca: Recovering the State
When Algorithm 1 succeeds in both steps (rejection sampling and final measurement), we obtain the values of all hidden shifts $V_0, \dots, V_4$, byte by byte, which we combine with the $E_i$ that we can directly measure. We have the knowledge of:

Preliminaries on AES-like Equation Systems.
It is important to recall here that $A = MC \circ SR \circ SB$ is a keyless AES round, where the SR operation shifts the bytes in row $i$ by $i$ positions to the left. This is represented in Figure 4. In particular, if we know a diagonal of $S$, then we can deduce a column of $A(S)$ (and conversely). However, knowing a column of $S$ only allows us to deduce an antidiagonal of $SR \circ SB(S)$.
Solving the System: Step 1. First, we obtain directly $S[3]$ and $S[7]$. We then consider the smaller system:
Focusing on (E2) to (E5), we deduce the following, where $*$ are known values:
In particular, the third equality is obtained by replacing $S[1] \oplus S[6]$ in (E5) by $A(S[2]) \oplus S[0] \oplus *$ from (E4). This implies:
• Otherwise, we have found $2^{16}$ possibilities for $S[0]$. We use the first equation to compute $S[4]$ and we check that all equations are satisfied.
Though we need to examine $2^{16}$ solutions for $S[0]$, this is done only $2^{96-16}$ times, so overall the time to solve the sub-system is $2^{96}$. For each guess, there are a few AES rounds to compute and 16 S-Box differential equations to solve.
We solve this remaining sub-system as follows. We guess two diagonals of $S[5]$ (i.e., two columns of $A(S[5])$) and two diagonals of $A(S[5])$, for a total of 12 bytes, which are represented in Figure 6.
Next, we solve a linear system in $Y := SR \circ SB(S[1])$. Indeed, by the first line in Equation 19, we have two diagonals of $A(S[1]) = MC(Y)$, e.g., the bytes (0, 5, 10, 15, 4, 9, 14, 3) if we follow the pattern of the figure. By the second line, we have two columns of $S[1]$, e.g., the bytes (0, 1, 2, 3, 4, 5, 6, 7), which give the bytes (0, 7, 10, 13, 1, 4, 11, 14) of $Y$. It appears that for each column of $Y$, we know two bytes before and after the MC operation. Though the positions of these bytes differ for each column, thanks to the MDS property of the MC matrix, we can always express the two unknown bytes of $Y$ as a linear combination of the four known ones.
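The byte bookkeeping of this step can be checked with a short script. The sketch assumes the usual column-major AES byte numbering (byte index $= 4 \cdot \text{column} + \text{row}$) and the standard MixColumns matrix; both conventions and all names are our assumptions. It verifies where ShiftRows sends the known bytes, and that every $2 \times 2$ minor of the MixColumns matrix is nonzero, which is what allows solving for two unknown input bytes of a column from two known input bytes and two known output bytes.

```python
from itertools import combinations

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

# Standard AES MixColumns matrix (rows act on one state column).
MC = [[2, 3, 1, 1],
      [1, 2, 3, 1],
      [1, 1, 2, 3],
      [3, 1, 1, 2]]

def sr_image(indices) -> list:
    """Positions after ShiftRows of the given byte indices (index = 4*col + row)."""
    out = []
    for i in indices:
        row, col = i % 4, i // 4
        out.append(4 * ((col - row) % 4) + row)  # row r moves r positions left
    return out

def all_2x2_minors_nonzero() -> bool:
    """True if every 2x2 submatrix of MC is invertible over GF(2^8)."""
    for r0, r1 in combinations(range(4), 2):
        for c0, c1 in combinations(range(4), 2):
            det = gf_mul(MC[r0][c0], MC[r1][c1]) ^ gf_mul(MC[r0][c1], MC[r1][c0])
            if det == 0:
                return False
    return True
```

Under these conventions, the first two columns of $S[1]$ land on bytes $\{0, 1, 4, 7, 10, 11, 13, 14\}$ of $Y$, i.e., two bytes per column, matching the positions listed in the text.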
Having obtained $Y$, we deduce $S[1]$ and check whether both equations are satisfied. The time complexity of this step is therefore slightly smaller than that of the first one, since it does not require solving S-Box differential equations.

Summary: Hybrid Attack. So far, we are using a classical algorithm for the state-recovery part. With the selection of $M$ that minimizes the number of superposition queries, the adversary queries its oracle for Rocca a total of $2^{46.34}$ times on average. After each query, they perform the amplitude product, costing $2^{22}$ Toffoli gates, and succeed in the first step $2^{29.40}$ times on average. For each of these successes, they retrieve a candidate value for the hidden shift and solve the equation system. Once the system is solved, the candidate internal state can be tested by computing backwards a few rounds and checking the ciphertexts.
Quantum Attack. We can accelerate the attack by using quantum search to speed up the system solving. To solve the first subsystem in $S[0], S[2], S[4]$, we proceed as follows: we create a quantum algorithm that samples a valid $S[0]$, i.e., a value of $S[0]$ that passes the S-Box differential equation, then tests if one of the $2^{16}$ possibilities solves the entire system. As seen in Section 3, the S-Box differential equation can be solved in $2^{12}$ Toffoli gates, and we have 16 of them to solve. Computing the remaining AES rounds costs less than $2^{16}$.
Then this algorithm is a sequence of two Grover searches with Toffoli count:
On the output of this algorithm, we use amplitude amplification [BHMT02]. By design, the probability that one of the possibilities for $S[0]$ solves the system is $2^{16-96}$, so there are around $\frac{\pi}{4} 2^{(96-16)/2}$ iterates to make, and the total time is:
At this point, our quantum attack requires $2^{46.34}$ superposition queries and $2^{67+29.4} = 2^{96.4}$ Toffoli gates. We can optimize this by noticing how the (average) Toffoli count depends on the probabilities $p$ and $p'$ to succeed in both steps of Algorithm 1:
Since we know the entire distribution of the Walsh coefficients, we can solve this minimization problem on $M$ and $G$, and we adopt:
which gives a complexity of $1/(pp') = 2^{59.11}$ Q2 encryption queries and $2^{14.11} \cdot 2^{22+45} + 2^{67} \simeq 2^{81}$ Toffoli gates. If we count that Q2 queries should have at least the same Toffoli cost as quantum implementations of Rocca, the gate count is comparable to the generic forgery attack in $2^{64}$ Q2 queries, though we do not require decryption queries anymore.

State-recovery on Rocca-S
The attack on Rocca-S is very similar to the one on Rocca. We follow the same strategy: combine Algorithm 1 with linear post-processing to recover enough information on the internal state with a single query, and obtain this internal state by solving a simple system of equations.
Starting from an internal state $S$, we encrypt several message blocks and focus on the following outputs:
As before, we set 3 input variables $X_0, X_1, X_2$ such that:
and the other plaintext blocks are fixed to 0. We also define the three corresponding hidden shifts:
1 becomes constant, so we can immediately retrieve 4 values depending on the state $S$:
which inverts MixColumns, multiplies each S-Box by an arbitrary mask $\beta$, and XORs the results. The situation is similar to Rocca, except that we only combine $3 \times 16 = 48$ S-Boxes instead of 80. This somewhat reduces the gate count overhead for arithmetic operations, which we can still upper bound by $2^{22}$ Toffoli gates. More importantly, it modifies the values of $M$, $G$, $p$ and $p'$.

Key-recovery on Tiaoxin
Our method allows us to recover the state $T_3$ at some point of the encryption phase. Afterwards, we can invert the round function on $T_3$ and the initialization phase, and recover the key that was loaded into the initial state.
Let us fix the state $T_3[0, 1, 2]$ at the beginning of the encryption phase and unroll a few ciphertexts:
We set the following as variables:
The rest is fixed. We focus on $C'_1$ and $C'_3$, and define the shift values:
We then observe the XOR of $C'_1$ and $C'_3$. More precisely, let $L$ be the function that selects one bit in each column of the state and XORs them. We assume that on these 4 bits, $T_6[1] = 1$.
Remark 5. Notice that the outputs of the S-Boxes are processed with different masks than before. While the choice of mask was inconsequential for Rocca and Rocca-S, here it becomes quite important, as we have to ensure that $T_6[1] = 1$ at each bit position selected by the mask. A similar constraint occurs for AEGIS-128L in the next section.
Then we have:
and we define the function:
where $b$ is an unknown bit depending on $T$. We have $L \circ A = (L \circ MC \circ SR) \circ SB$, so column by column, we have a one-bit function of the S-Box outputs, which can be rewritten as $(x_0, x_1, x_2, x_3) \mapsto \bigoplus_i \alpha_i \cdot SB(x_i)$ for well-chosen masks $\alpha_0, \dots, \alpha_3$.
The situation is thus the same as before, since the distribution of Walsh coefficients is independent of the mask $\alpha_i$ (as long as it is nonzero). Recovering the entire state $T_3[0, 1, 2]$ from the shifts $V_0, V_1, V_2$ is trivial and costs only a few AES rounds. Afterwards, we compute backwards through the 15 rounds of initialization on the state $T_3$ (30 AES rounds), and obtain a candidate key $K$ that we can immediately check. All of this can be done classically.
Since there are $16 \times 3 = 48$ S-Boxes in the function, we optimize the probability similarly to Rocca-S and obtain $pp' \simeq 2^{-30}$. The Toffoli cost of the entire attack is roughly $2^{30} \times 2^{22} = 2^{52}$, with $2^{30}$ Q2 queries. Note that we need to multiply these numbers by $2^4$, since we assumed that 4 bits of $T_6[1]$ were guessed correctly.

State-recovery on AEGIS-128L
Contrary to the rest of this section, the attack on AEGIS-128L uses a function of smaller correlation: we use Theorem 2.
Starting from an initial state $S$, we encrypt pairs of message blocks $(M_i, M'_i)$ with $M'_i = 0$. To simplify the notation, we express the ciphertext blocks as functions of $T$, the state after one update. Our aim is to recover $T$. As they cannot be expressed from $T$, we ignore the first pair of ciphertext blocks and focus on:
where $h'$ and $h''$ are two functions whose exact expression is irrelevant here, and $Y$ is a constant (an expression in which only $T$ and $M'_i$ intervene). Similarly to Tiaoxin, we use a linear post-processing which truncates $C_6$ to only 4 bits. Therefore, though it is completely unknown (and depends on the unknown state $T$), the AND term becomes a function $h$ with correlation $2^{-4}$. Heuristically, we model this function (which changes for each query) as a random one, and we use Theorem 2.
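The value $c = 2^{-4}$ can be recovered by direct enumeration if we model the two operands of the AND as independent uniform 4-bit values. This modeling is our assumption for the illustration (the text only states the resulting correlation), and the function name is hypothetical.

```python
from fractions import Fraction

def and_mask_correlation(bits: int) -> Fraction:
    """Average of (-1)^parity(a & b) over all pairs (a, b) of `bits`-bit values.

    Assumes both AND operands are uniform and independent (modeling choice).
    """
    total = 0
    for a in range(1 << bits):
        for b in range(1 << bits):
            total += -1 if bin(a & b).count("1") & 1 else 1
    return Fraction(total, 1 << (2 * bits))
```

Each selected bit contributes a factor $1/2$, since the AND of two uniform bits equals 1 with probability $1/4$; four bits give $(1/2)^4 = 2^{-4}$.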
By making $M_0$ to $M_4$ vary, we obtain 5 shifts which give us:
State-recovery Attack. After performing this partial state-recovery, we can do a Grover search on the remaining 64 bits of the state. However, checking if we have obtained a valid internal state is not trivial. Indeed, the round function of AEGIS is not invertible, so we cannot compute backwards and check with previous ciphertexts. In fact, there do not seem to be other ciphertext equations that we can exploit (either we have already used them, or they depend on the varying $M_i$).
Consequently, we do a Grover search using superposition decryption queries to test our guess of the state. That is, starting from the recovered internal state, we compute the tag (approximately $6 \times 8$ AES rounds, i.e., $\leq 2^{16}$ Toffoli gates) and we try to decipher with an oracle. If the internal state is guessed correctly, the oracle will accept. This operation requires approximately $\frac{\pi}{2} 2^{32} \times 2^{16} \leq 2^{49}$ gates and $2^{33}$ decryption queries. Since the number of S-Boxes is the same as in Rocca, the hidden shift algorithm is actually the same. We keep the same $p$ and $p'$, but introduce the correlation $c = 2^{-4}$. The Toffoli count and the number of queries are respectively:
$\leq 2^{62}$ decryption queries, which is also smaller than a forgery attack using a Grover search. We also use $\frac{2}{pp'c^2} \leq 2^{56}$ encryption queries.
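Two of the figures in this paragraph can be re-derived with a few lines of arithmetic. The sketch below assumes that the encryption query count reads $2/(pp'c^2)$ with $pp' = 2^{-46.34}$ (the Rocca parameters) and $c = 2^{-4}$; this reading of the expression, like the function name, is our assumption.

```python
from math import log2, pi

def aegis_counts(log_pp: float = -46.34, log_c: float = -4.0) -> tuple:
    """Return log2 of (Grover iterations, Grover gate count, encryption queries)."""
    grover_iterations = (pi / 2) * 2**32        # search over 64 remaining bits
    grover_gates = grover_iterations * 2**16    # tag computation per iteration
    log_enc_queries = 1 - log_pp - 2 * log_c    # log2 of 2 / (p p' c^2)
    return log2(grover_iterations), log2(grover_gates), log_enc_queries
```

Under these assumptions the three exponents stay below 33, 49 and 56 respectively, in line with the bounds stated above.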

Discussion
In all instances of our attack, the AE scheme (Rocca, Rocca-S, Tiaoxin, AEGIS) is believed to be secure regarding guess-and-determine attacks that aim at recovering the state. Indeed, when one only observes the ciphertext blocks, the obtained system of equations is intractable.
Our quantum attack works because we can observe hidden shifts in addition to the ciphertexts. This allows us to reduce the state-recovery to a simpler system of equations (the simplest being Tiaoxin-346, which only relies on three shifts). However, there are limitations to this approach. Notably, if we have a ciphertext $C = S_0 \oplus A(S_1 \oplus M_1)$, we have only two choices: either make $M_1 = 0$ a constant, and observe $S_0 \oplus A(S_1)$, or make $M_1$ a variable, and observe $S_1$. In the latter case, $S_0$ is lost. Besides, we can only use one variable for one shift, i.e., if we have $C = A(S_1 \oplus M_1)$ and $C' = A(S_2 \oplus M_1)$, we must drop one of the ciphertext blocks. Another problematic case is when we observe $A(S_0 \oplus A(S_1 \oplus M_1))$. Though we do have a shifted function, the function is now unknown (it depends on $S_0$) and more complex (two rounds of AES instead of one). The attack can proceed by guessing enough bits of $S_0$, but becomes more difficult.
In our examples, the choice of the shifts was done by hand, trying to obtain the simplest equation system. More clever choices might still exist. Conversely, making the schemes secure against this attack means ensuring that none of the equation systems resulting from a combination of ciphertexts and shifts is tractable.
Previously, some Q2 quantum attacks have been linked to more efficient Q1 attacks [BHN+19]. However, such methods do not appear to work in this scenario, as classical queries will have different nonces, and cannot be brought together to emulate a single quantum query. To the best of our knowledge, all the schemes studied in this paper remain secure against Q1 attacks.
As a final remark, we note that the attacks presented in this paper have time and query complexities below those of exhaustive search, without taking parallelization into account. However, the generic attacks are instances of quantum search, while the trials in our attacks can be parallelized perfectly. As a consequence, there might exist other targets than those given in this paper, on which an advantage against exhaustive search is reached under some depth constraint.
$\frac{M}{\hat g(x)}$ if $|\hat g(x)| \geq M$, and otherwise, by 0 (meaning that we eliminate the coordinate).
Theorem 1. Let $M$ be a bound and $G = \#\{x, |\hat g(x)| \geq M\}$. Define $p := \frac{GM^2}{2^{2n}}$ and $p' := \frac{G}{2^n}$. Then:
• the probability to measure 0 in Step 11 of Algorithm 1 is bigger than $p - 2^{-n/2+1}$;
• the probability to measure $s$ at the end of the algorithm is bigger than $pp' - 2^{-n/2+1}$.
The algorithm makes one phase query to $f$, two computations of $g$, and uses $O(n^2)$ additional Toffoli gates.

Figure 4: Mapping of known bytes by A.
This sub-system in $S[0], S[2], S[4]$ admits on average one solution, and we can solve it in time $2^{96}$ by the following strategy:
• Guess two columns and two diagonals of $S[0]$. Obtain two columns of $S[4]$ by the first equation (see Figure 5).
• Deduce two columns of $S[0] \oplus S[4]$.
• Obtain two columns and two diagonals of $A(S[2])$ by the third equation (see Figure 5).
• Deduce two diagonals of $S[2]$.
• Using the second equation, solve the obtained linear system in the 2 remaining diagonals of $S[2]$; obtain the whole $S[2]$ (on average one solution).
• Using the third equation, obtain $S[0]$. Each S-Box equation of the form $SBox(* \oplus x) \oplus SBox(* \oplus x) = *$ has on average one solution; half of the time it has zero solutions and half of the time it has two solutions. So, if one of these equations has no solution, we backtrack.
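The claim on the number of solutions of an S-Box differential equation can be checked exhaustively. After the change of variable $x \mapsto x \oplus a$, the equation $SBox(a \oplus x) \oplus SBox(b \oplus x) = c$ becomes $SBox(x) \oplus SBox(x \oplus \delta) = c$ with $\delta = a \oplus b$. The sketch below regenerates the AES S-Box from its algebraic definition and counts, for a fixed nonzero $\delta$, how many targets $c$ admit 0, 2 or 4 solutions. Function names are hypothetical.

```python
from collections import Counter

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def aes_sbox() -> list:
    """AES S-Box: field inversion followed by the affine map."""
    inverse = [0] * 256
    for a in range(1, 256):
        inverse[a] = next(b for b in range(1, 256) if gf_mul(a, b) == 1)
    sbox = []
    for x in range(256):
        y = inverse[x]
        out = y
        for k in range(1, 5):  # XOR of the four left-rotations of y
            out ^= ((y << k) | (y >> (8 - k))) & 0xFF
        sbox.append(out ^ 0x63)
    return sbox

def solution_counts(sbox: list, delta: int) -> Counter:
    """Counter {number of solutions x: number of targets c} for S(x)^S(x^delta)=c."""
    per_target = Counter(sbox[x] ^ sbox[x ^ delta] for x in range(256))
    return Counter(per_target[c] for c in range(256))
```

For the AES S-Box and a nonzero $\delta$, 129 targets have no solution, 126 have two and a single one has four, so the average is exactly one solution and roughly half of the equations are unsatisfiable.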

Figure 5: Representation of the first (top) and third (bottom) equations in Equation 18, and the bytes that we guessed. (*) denotes a known state.

Figure 6: Representation of Equation 19, and the bytes that we guessed. (*) denotes a known state.

• $T[0]$ from the $M_0$ shift
• $T[7]$ from the $M_1$ shift, knowing $T[0]$
• $T[6]$ from the $M_2$ shift, knowing $T[0, 7]$
• $T[5]$ from the $M_3$ shift, knowing $T[0, 6, 7]$
• $T[4]$ from the $M_4$ shift, knowing $T[0, 5, 6, 7]$
Next, we focus on the ciphertext blocks $C_1$ to $C'_2$, which we have also obtained. Thanks to $C'_1$ and all the state registers that we know, we obtain $T[2]$. Next, thanks to $C'_2$, we obtain $T[1]$. The only register of $T$ that we are missing is $T[3]$. We can find half of it using the expression of $C_1$: indeed, from the known $T[i]$ we can compute $AND(T[2], T[3])$. We can expect half the bits of $T[2]$ to be one, which gives us the corresponding bits of $T[3]$.

Table 3: Toffoli count of Grover's key search for the studied schemes. As the exponent is rounded to the nearest integer, the Toffoli gate counts for some schemes can appear identical even though the number of AES rounds necessary to compute an output differs between them.