Novel Side-Channel Attacks on Quasi-Cyclic Code-Based Cryptography

Chou suggested a constant-time implementation for quasi-cyclic moderate-density parity-check (QC-MDPC) code-based cryptography to mitigate timing attacks at CHES 2016. This countermeasure was later found to be vulnerable to a differential power analysis (DPA) on the private syndrome computation, as described by Rossi et al. at CHES 2017. The proposed DPA, however, could not completely recover the exact secret indices; linear equations had to be solved afterwards to obtain the entire secret information. In this paper, we propose a multiple-trace attack that completely recovers the exact secret indices. We further propose a single-trace attack that works even when ephemeral keys are used or Rossi et al.'s DPA countermeasures are applied. Our experiments show that BIKE and LEDAcrypt may be vulnerable to the proposed attacks. The experiments are conducted using power consumption traces measured from ChipWhisperer-Lite XMEGA (8-bit processor) and ChipWhisperer UFO STM32F3 (32-bit processor) target boards.


Introduction
The security of public key cryptosystems (PKCs) is primarily based on the difficulty of number-theoretic problems, such as factoring large integers or finding discrete logarithms. Shor, however, proposed an algorithm that can solve such problems in polynomial time, given a practical large-scale quantum computer [Sho94]. Since quantum computers pose a critical threat to current PKCs, such as Rivest-Shamir-Adleman (RSA) and elliptic curve cryptography (ECC) [RSA78, Mil85, Kob87], there is an increasing need for post-quantum cryptography (PQC) that is secure against both quantum and classical computers.
The National Security Agency (NSA) thus announced that the list of Suite B cryptographic algorithms would be updated to PQC algorithms [Age15]. The National Institute of Standards and Technology (NIST) also released NIST Internal Report (NISTIR) 8105, Report on Post-Quantum Cryptography [CCJ+16], giving an analysis of the current state of quantum computing and discussing the need for PQC standardization. In December 2016, NIST announced a call for proposals for PQC standardization [NIS16]. In contrast to the Advanced Encryption Standard (AES) and Secure Hash Algorithm version 3 (SHA-3) competitions, which each selected a single algorithm, NIST aims to recommend several PQC algorithms [NIS97, NIS07]. In the first round, sixty-nine proposals for public key encryption, key establishment, and digital signature algorithms were accepted. In the second round, twenty-six candidates survived [NIS19], seven of which are code-based cryptographic algorithms.
Code-based cryptography is based on coding theory, which aims to detect and correct errors in data transmitted through a noisy channel. McEliece proposed the first code-based PKC, based on binary Goppa codes [McE78]. The security of the cryptosystem rests on the difficulty of the Syndrome Decoding (SD) problem and the Goppa Code Distinguishing (GCD) problem [BMvT78, FGO+11]. The main drawback of the original McEliece cryptosystem is the large size of its public key. For the 80-bit security level, the public key of the McEliece cryptosystem requires 460,647 bits, approximately 450 times larger than that of the RSA cryptosystem. To reduce the public key size, several variants of the McEliece cryptosystem have been proposed that replace the Goppa codes with other efficient codes, for example, generalized Reed-Solomon (GRS), low-density parity-check (LDPC), and moderate-density parity-check (MDPC) codes [Nie86, BS08, MTSB13, BCS13, Cho16, Cho17].
Niederreiter [Nie86] proposed a variation of the McEliece cryptosystem that uses a parity-check matrix as the secret key instead of a generator matrix, and its security has been proven to be equivalent to that of the McEliece cryptosystem [LDW94]. Niederreiter used the GRS code, but it can leak information much more easily than the binary Goppa code [SS92]. Biswas and Sendrier proposed a hybrid McEliece encryption scheme [BS08], reducing the size of the public key by using a row echelon form of a generator matrix. Bernstein et al. proposed a key encapsulation mechanism (KEM)/data encapsulation mechanism (DEM) called McBits [BCS13], using the Niederreiter cryptosystem as the underlying scheme. McBits provides an additive fast Fourier transform (FFT) decoding algorithm for fast root and syndrome computations, and also a countermeasure against cache-timing attacks by means of a constant-time implementation. Chou further improved the performance of McBits [Cho17] with a constant-time implementation for key generation and encryption and internal parallelism for decryption.
The MDPC and quasi-cyclic MDPC (QC-MDPC) codes, in particular, have recently received extensive attention due to their smaller key sizes and efficiency in terms of computational complexity. Misoczki et al. [MTSB12] proposed two variants of the McEliece cryptosystem called MDPC-McEliece, employing the MDPC and QC-MDPC codes to realize a smaller key size. For the 80-bit security level, the public key of QC-MDPC McEliece requires only 4801 bits. Chou proposed a variant of the hybrid (KEM/DEM) Niederreiter encryption scheme using QC-MDPC codes called QcBits [Cho16].
Kocher first presented side-channel attacks (SCAs) [Koc96], which recover secret credentials, i.e. cryptographic keys, by analyzing side-channel information such as execution time, power consumption, electromagnetic emission, and photonic emission while cryptographic algorithms are running on devices. Such side-channel attacks include the timing attack (TA), simple power analysis (SPA), differential power analysis (DPA), correlation power analysis (CPA), and profiling attack [MOP07]. Strenzke et al. [STM+08] first proposed an SCA against the McEliece cryptosystem. They presented a TA on the degree of the error locator polynomial in the Patterson algorithm [Pat75], exploiting the fact that the computation time depends on the polynomial degree. Other types of TAs against the McEliece cryptosystem followed in [SSMS09, Str10, Str11, Str13, BCDR17]. Strenzke et al. [STM+08] also described a power analysis against the parity-check matrix in McEliece key generation. Various SPAs and DPAs against the McEliece cryptosystem can be found in [HMP10, MSSS11, vMG14b, CEvMS15a, PRD+15, PRD+16, FGH16], and fault injection attacks in [CD10, Str11].
There have been, in particular, several SCA results against cryptosystems based on QC-MDPC codes. Chou suggested a constant-time implementation for QC-MDPC code-based cryptography to mitigate TAs [Cho16]. This countermeasure was later found to be vulnerable to a DPA on the private syndrome computation, as described by Rossi et al. [RHHM17]. The proposed attack, however, could not completely recover the exact secret indices; linear equations had to be solved afterwards to obtain the entire secret information.
Our Contributions. The main contributions of this paper are summarized below.

Enhancing existing multiple-trace attack on TA countermeasure
We propose a multiple-trace attack on the constant-time multiplication introduced by Chou for secure syndrome computation [Cho16]. In contrast to the attack results of Rossi et al. [RHHM17], we demonstrate that our attack recovers the entire secret indices using multiple traces, eliminating the need to additionally solve linear equations. Previously, it was not even feasible to solve such equations for target cryptosystems running on 64-bit processors.

Proposing a novel single-trace attack
The proposed single-trace attack recovers secret indices even when ephemeral keys or DPA countermeasures [RHHM17, CEvMS15b] are used. In particular, if a processor only provides single-bit shift instructions, it is possible to find all bits of the secret indices. Furthermore, even if a processor does not provide single-bit shift instructions, we can extract substantial parts of the secret indices. The proposed attack exploits the fact that the rotation is always carried out, and that a mask value determined by the secret bit is used to obtain the correct result. Hence, our attack can make the latest countermeasures proposed for secure private syndrome computation obsolete.

Case study: NIST round 2 QC code-based cryptography
BIKE and LEDAcrypt are constructed using QC-MDPC and QC-LDPC codes, respectively, and both are second-round candidates of the NIST PQC standardization. Since the syndrome computations of these two schemes were not designed to resist SCAs, we assume that the countermeasures [Cho16, RHHM17, CEvMS15b] are applied to remove the TA and DPA vulnerabilities. Our experiment results show that these two schemes may be vulnerable to the proposed multiple-trace and single-trace attacks when they use long-term key pairs. They may be vulnerable to our single-trace attack even when using ephemeral keys.
The experiments make use of power consumption traces measured from ChipWhisperer-Lite XMEGA (8-bit processor) and ChipWhisperer UFO STM32F3 (32-bit processor) target boards [Inca, Incb]. The proposed multiple-trace attack recovers all bits of the secret indices with 50 traces collected at a 7.38 MS/s sampling rate. Using only a single trace, the experiments show how to recover all bits of the secret indices on a processor providing single-bit shift instructions, and how to extract a substantial part of the secret indices even on a processor not providing such instructions. An attack flow chart is given in Figure 1.
Organization. The rest of this paper is organized as follows. In Section 2, we briefly describe the basics of coding theory and review the literature on SCAs against QC-MDPC code-based cryptography. In Section 3, we then explain our target algorithm and recent DPA results. In Section 4 and Section 5, we propose a multiple-trace attack and a single-trace attack with experiment results. We then describe how the proposed attacks could be

Basics of Coding Theory
Code-based cryptography is based on the decoding problem of random error-correcting codes, i.e. Syndrome Decoding, which is known to be NP-hard [BMvT78]. In other words, it is based on the difficulty of finding the closest codeword $x$ to a given $y \in \mathbb{F}_q^n$, assuming that there is a unique closest codeword. Table 1 shows the definition of notations used in this paper.

Definition 4. [Hamming Weight]
The Hamming weight of a vector $x \in \mathbb{F}_q^n$ is the number of its non-zero components: $wt(x) = |\{i : x_i \neq 0\}|$.

Definition 5. [Hamming Distance]
The Hamming distance between two vectors $x, y \in \mathbb{F}_q^n$ is the number of components in which they differ: $d(x, y) = |\{i : x_i \neq y_i\}| = wt(x - y)$.

Definition 6. [Minimum Distance]
The minimum distance of a linear code $C$ is $d(C) = \min\{d(x, y) : x, y \in C, x \neq y\}$. The minimum distance gives the smallest number of errors needed to change one codeword into another. Therefore, it determines the error correction capability of the linear code $C$. If the code can correct up to $t$ errors, and changes are made at $t$ or fewer places in a codeword $c$, then the closest codeword is still $c$. In coding theory, errors can be corrected if fewer than $d(C)/2$ of them occur. Namely, the linear code $C$ can correct up to $t$ errors if $d(C) \geq 2t + 1$.
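Definitions 4 to 6 can be checked on a small example. The following is a minimal sketch, not taken from the paper: the function names and the binary repetition code used as input are illustrative assumptions.

```python
def hamming_weight(x):
    """Number of non-zero components of the vector x."""
    return sum(1 for xi in x if xi != 0)


def hamming_distance(x, y):
    """Number of positions in which the vectors x and y differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def minimum_distance(code):
    """Smallest Hamming distance between two distinct codewords."""
    return min(hamming_distance(a, b) for a in code for b in code if a != b)


# Toy binary repetition code of length 3 (illustrative choice).
code = [(0, 0, 0), (1, 1, 1)]
d_min = minimum_distance(code)
t = (d_min - 1) // 2  # largest t satisfying d(C) >= 2t + 1
```

For this toy code, a single flipped bit leaves the received word closer to the original codeword than to the other one, which is the situation described in Definition 6.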

Definition 7. [Circulant Matrix]
An $r \times r$ matrix is a circulant matrix if its rows are successive cyclic shifts of its first one. The top row (or the leftmost column) of a circulant matrix is the generator of the circulant matrix.

Definition 9. [Quasi-Cyclic Code]
An $(n, k)$-linear code $C$ of length $n = n_0 r$, dimension $k = k_0 r$, and co-dimension $r = n - k$ is a QC code if every cyclic shift of a codeword by $r$ positions results in another codeword in $C$. The value $n_0$ is called the index.
There is a ring isomorphism, denoted $\varphi$, between the $r \times r$ circulant matrices and the quotient polynomial ring $R = \mathbb{F}_q[x]/\langle x^r - 1 \rangle$. Thus, a circulant matrix $A$ whose first row is $(a_0, \cdots, a_{r-1})$ is mapped to the polynomial $\varphi(A) = a_0 + a_1 x + \cdots + a_{r-1} x^{r-1}$, and the $(n, k)$-QC code can be viewed as a cyclic code over the ring $R = \mathbb{F}_q[x]/\langle x^r - 1 \rangle$.
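The isomorphism between circulant matrices and polynomials can be illustrated in the binary case. The sketch below is an assumption-laden toy example (the helper names, the restriction to $q = 2$, and $r = 5$ are ours): it checks that multiplying two circulant matrices gives the circulant matrix of the product polynomial modulo $x^r - 1$.

```python
def circulant(first_row):
    """r x r circulant matrix: row i is the first row cyclically shifted by i."""
    r = len(first_row)
    return [[first_row[(j - i) % r] for j in range(r)] for i in range(r)]


def matmul_gf2(a, b):
    """Matrix product over F_2."""
    r = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(r)) % 2 for j in range(r)]
            for i in range(r)]


def polymul_gf2(a, b):
    """a(x) * b(x) in F_2[x]/(x^r - 1); coefficients listed from x^0 upward."""
    r = len(a)
    out = [0] * r
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[(i + j) % r] ^= bj  # x^i * x^j = x^((i + j) mod r)
    return out


a = [1, 1, 0, 0, 1]  # 1 + x + x^4
b = [0, 1, 1, 0, 0]  # x + x^2
lhs = matmul_gf2(circulant(a), circulant(b))
rhs = circulant(polymul_gf2(a, b))
```

Here `lhs` and `rhs` coincide, which is why a circulant block of a parity-check matrix can be stored and manipulated through its first row only.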
Definition 10. [QC-MDPC/LDPC Code]
An $(n, r, w)$-linear code $C$ of length $n = n_0 r$, dimension $k = k_0 r$, and co-dimension $r = n - k$ admitting a parity-check matrix $H$ with constant row weight $w$ is a QC-LDPC or QC-MDPC code. LDPC and MDPC codes differ only in the row weight $w$. If the code is defined by a parity-check matrix $H$ with constant row weight $w = O(\sqrt{n \log n})$, it is an MDPC code, whereas LDPC codes have small constant row weights, usually less than 10. The parity-check matrix $H$ of a QC-MDPC code is an $(n - k) \times n$ QC matrix.

Side-Channel Attacks on QC-MDPC Code-Based Cryptography
Maurich et al. [vMG14b] presented SPAs on the QC-MDPC McEliece cryptosystem, namely a message recovery attack and a private key recovery attack. For private key recovery, they exploited the fact that different power consumption patterns are observed depending on whether a conditional branch instruction is executed when generating the next row of the private key in the syndrome computation. They presented experiment results based on two types of software implementations, i.e. AVR and ARM. They also proposed a constant-time implementation as a countermeasure using the ARM Thumb-2 assembly language. More specifically, they adopted a mask value, which is either all zeros or all ones, and the logical AND instruction to choose which data to use. We classify the property used in the attacks as follows.

Property 1. If an algorithm behaves irregularly according to the secret value, then the algorithm is vulnerable to simple power analyses (or timing attacks).
Chen et al. [CEvMS15a] presented a horizontal DPA on the QC-MDPC McEliece cryptosystem, which is a private key recovery attack on the asymmetric decryption algorithm using chosen ciphertexts. They successfully recovered substantial parts of the private key by a DPA during syndrome computation and key rotation. They made use of the public key to recover the whole private key or to correct remaining errors using an algebraic step. Their attack target was the field programmable gate array (FPGA) implementation presented at DATE 2014 [vMG14a]. Since hardware implementations operate in parallel, they applied the chosen-ciphertext DPA. They also suggested a threshold implementation based on Boolean masking as a countermeasure [CEvMS15b]. Further analysis and countermeasures were also proposed in [CEvMS16]. We classify the property used in the attacks as follows.
Property 2. If an algorithm uses a fixed secret $k$, and if it is possible to calculate hypothetical intermediate states $v_{i,j} = f(d_i, k_j)$ for all $D$ known values $d_i$ and for all $K$ candidates $k_j$ of $k$, then the algorithm is vulnerable to differential power analyses (or correlation power analyses). Here, $K$ should be small enough that all hypotheses $v_{i,j}$ can be enumerated.
Chaulet et al. [CS16] observed that variable-time decoders, such as the bit flipping (BF) algorithm, may leak partial information. Since the number of iterations of the algorithm depends on the error pattern as well as the parity-check matrix, the algorithm may leak information about the private key and consequently allow a successful TA. They thus proposed minimizing the number of iterations by adapting threshold values as a function of the syndrome weight to obtain a constant-time decoder. This attack is based on Property 1.

QcBits: Constant-Time Implementation of QC-MDPC Decoding
QcBits, proposed by Chou [Cho16], is a constant-time implementation of QC-MDPC code-based cryptography that mitigates TAs.
and row weight of $w$. QcBits uses $r = 4801$, $w = 90$, and $t = 80$ for the 80-bit security.
Since $H_0$ and $H_1$ are circulant matrices, i.e.
, where $k = 0, 1$ and $0 \le i, j \le r - 1$; the first row of $H$ can represent the whole matrix. An array of indices in

is calculated by
can be considered to be a polynomial, and $R_i(c^{(k)})$ can be calculated by the multiplication $x^i c^{(k)}$. For the remainder of this paper, $c^{(k)}$ is considered as a polynomial element. Here, the parity-check matrix $H$ is the private key. Chou then suggested a constant-time multiplication $x^d c^{(k)}$, as shown in Algorithm 1. This allows the secure private syndrome computation $Hc$ as a countermeasure against a TA.
Since $0 \le d \le r - 1$, the index $d$ can be represented as $(d_{l-1}, d_{l-2}, \cdots, d_0)_2$, where $l = \lceil \log_2(r - 1) \rceil$. Then, one can calculate rotated intermediate values using $W$-bit word unit rotation for $d_j$ from $d_{l-1}$ to $d_{\log_2 W}$. A rotation by $2^j$ bits, i.e. a power-of-2 shift operation, can be calculated by a rotation by $2^{j - \log_2 W}$ words, where $\log_2 W \le j \le l - 1$. To make it run in constant time, the rotation is always carried out independently of the $d_j$ value. Then, either the unrotated vector or the rotated vector is chosen according to the $d_j$ value. The author implemented the algorithm using a mask value, which is zero when $d_i = 0$ and has all bits set to 1 when $d_i = 1$. The ¬mask value is also used, where ¬ refers to negation. Therefore, the result is obtained by combining the unrotated vector masked with ¬mask and the rotated vector masked with mask.
Algorithm 1 shows a simplified scheme; the detailed scheme and a toy example are shown in Appendix A. When the bit length of the word $W$ is 8, the mask value is 0x00 when $d_i = 0$ and 0xff when $d_i = 1$. The variable $us$ indicates how many words are rotated to the left. Thus, the result $w[j]$ is given by Equation (1), which selects between $v[j]$ and $v[j + us]$ through mask and ¬mask, for $j$ from 0 to $L - 1$, where $L = \lceil r/W \rceil$.

Algorithm 1 Constant-Time Multiplication in $\mathbb{F}_2[x]/\langle x^r - 1 \rangle$ (refer to [Cho16])
(Listing not recoverable from the source. Recoverable structure: the word unit rotation in steps 2 to 13, with the selection in steps 7 to 9 and steps 10 to 12; the bit rotation in steps 14 to 22; return $w$.)

Based on the published source code of [Cho16], Equation (1) is calculated in two parts: steps 7 to 9 and steps 10 to 12 of Algorithm 1. In those parts, one selects the rotated value $v[j + us]$ when $d_i = 1$ and the unrotated value $v[j]$ when $d_i = 0$.
A sequence of logical instructions is utilized for $j$ from $\log_2 W - 1$ to 0, i.e. the shifts inside the units, as shown in steps 14 to 22 of Algorithm 1. In this paper, we define the power-of-2 shift operation for $(d_{l-1}, d_{l-2}, \cdots, d_{\log_2 W})$ as a word unit rotation and the shift operation inside the units for $(d_{\log_2 W - 1}, \cdots, d_1, d_0)$ as a bit rotation.
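The data flow of the constant-time multiplication can be sketched as follows. This is an illustrative sketch only, not Chou's code: it assumes the toy case $r = L \cdot W = 256$ with $W = 8$, a left rotation with the most significant bit of each word first, and an OR to combine the two masked terms (the names `select`, `ct_rotate`, and `naive_rotate` are ours). Python itself gives no timing guarantees; the point is only that the rotation and the masked selection are executed for every bit of $d$.

```python
import random

W = 8                              # word size in bits (8-bit target)
L = 32                             # number of words; toy case r = L * W = 256
WORD_MASK = (1 << W) - 1
LOG_W = W.bit_length() - 1         # log2(W) = 3
ELL = (L * W - 1).bit_length()     # bit length l of the index d, here 8


def select(unrotated, rotated, bit):
    """Branch-free choice: rotated if bit == 1, unrotated if bit == 0."""
    mask = (0 - bit) & WORD_MASK   # 0x00 or 0xff
    not_mask = ~mask & WORD_MASK
    # OR is an assumption; XOR is equivalent because one term is always zero.
    return (unrotated & not_mask) | (rotated & mask)


def ct_rotate(v, d):
    """Rotate the L-word vector v left by d bits, doing the same work for any d."""
    # Word unit rotation: bits d_{l-1} down to d_{log2 W}.
    for i in range(ELL - 1, LOG_W - 1, -1):
        us = 1 << (i - LOG_W)      # rotation amount in words
        bit = (d >> i) & 1
        v = [select(v[j], v[(j + us) % L], bit) for j in range(L)]
    # Bit rotation: bits d_{log2 W - 1} down to d_0.
    for i in range(LOG_W - 1, -1, -1):
        s = 1 << i                 # rotation amount in bits
        bit = (d >> i) & 1
        rot = [((v[j] << s) | (v[(j + 1) % L] >> (W - s))) & WORD_MASK
               for j in range(L)]
        v = [select(v[j], rot[j], bit) for j in range(L)]
    return v


def naive_rotate(v, d):
    """Reference used only for checking: rotate the whole bit string left by d."""
    bits = "".join(f"{word:0{W}b}" for word in v)
    bits = bits[d:] + bits[:d]
    return [int(bits[k * W:(k + 1) * W], 2) for k in range(L)]


rng = random.Random(0)
v0 = [rng.randrange(1 << W) for _ in range(L)]
```

Calling `ct_rotate(v0, d)` and `naive_rotate(v0, d)` with the same `d` gives the same vector, while only the former loads both `v[j]` and `v[j + us]` in every iteration, which is the behaviour the attacks in the following sections exploit.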

A Side-Channel Assisted Cryptanalytic Attack on QcBits
Rossi et al. [RHHM17] proposed a DPA on QcBits based on Property 2 described in Section 2, targeting the private syndrome computation, i.e. the constant-time multiplication $x^d c^{(k)}$. The authors analyzed power consumption traces acquired while the results of $x^d c^{(k)}$ were stored in memory. In software implementations, the power consumption is affected by the Hamming weight of the intermediate value. They thus applied the DPA to the leftmost bit of each rotated result $x^d c^{(k)}$ calculated for a guessed $d$. Since the $(i + 1)$-th to $(i + W)$-th bits are saved into the same register, there are $W$ candidates for $d$. Hence, the attack cannot find the exact secret indices; it merely reduces the number of candidates. For each secret index $d$, there are 8 candidates on an 8-bit processor and 64 candidates on a 64-bit processor. For the full recovery of the secret indices, it is thus required to solve linear equations. As shown in Table 2, for 128-bit security, the linear equations can be solved within a reasonable time on 8-bit, 16-bit, and 32-bit processors. It is, however, not feasible on 64-bit processors. The authors also proposed codeword masking, i.e. adding a random codeword prior to the syndrome computation (Equation (2)). The result of the syndrome calculation remains unchanged because QC-MDPC codes are linear, and they argued that the proposed countermeasure effectively removes the information leak exploited by DPAs. Similarly, as a DPA countermeasure, Chen et al. [CEvMS15b] proposed a masked syndrome computation (Equation (3)), splitting $H$ into two shares $H_m$ and $M$, where $M$ is a matrix for masking.
In the subsequent sections, we propose multiple-trace and single-trace attacks on the constant-time multiplication $x^d c^{(k)}$ that entirely recover the secret index $d$, eliminating the need to solve linear equations. If a processor only provides single-bit shift instructions, it is possible to find the exact secret index $d$ using a single trace even when DPA countermeasures, e.g. Equation (2) and Equation (3), are applied. If a processor provides multiple-bit shift operations, $W$ candidates for $d$ will be derived, similarly to the results obtained in [RHHM17].
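The linearity argument behind the codeword masking countermeasure described above can be checked on a toy code. The sketch below is our own illustration, not the construction of [RHHM17]: it uses the parity-check matrix of the binary [7, 4] Hamming code as a stand-in for $H$, and the helper names are hypothetical.

```python
# Parity-check matrix of the binary [7, 4] Hamming code (toy stand-in for H).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(h, c):
    """Syndrome h * c over F_2."""
    return [sum(hij & cj for hij, cj in zip(row, c)) % 2 for row in h]


def add_gf2(a, b):
    """Component-wise addition over F_2."""
    return [x ^ y for x, y in zip(a, b)]


codeword = [1, 1, 1, 0, 0, 0, 0]  # H * codeword = 0, so it belongs to the code
received = [0, 0, 0, 0, 1, 0, 0]  # arbitrary input word
masked = add_gf2(received, codeword)
```

Since `syndrome(H, codeword)` is zero, `syndrome(H, masked)` equals `syndrome(H, received)`, although the words actually processed differ.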

Proposed Multiple-Trace Attack on Constant-Time Multiplication for Syndrome Computation
In this section, we propose a multiple-trace attack on the constant-time multiplication $x^d c^{(k)}$. We show that it is possible to completely recover secret indices using multiple traces.
In contrast to the attack presented in Subsection 3.2, which leaves $W$ candidates for each $d$, our attack extracts the entire secret index $d = (d_{l-1}, d_{l-2}, \cdots, d_0)_2$; solving linear equations is no longer required. Based on the structure of the constant-time multiplication shown in Algorithm 1, we divide the attack position into two parts to find $d$: the word unit rotation to find $(d_{l-1}, d_{l-2}, \cdots, d_{\log_2 W})$, and the bit rotation to find $(d_{\log_2 W - 1}, \cdots, d_1, d_0)$. Since a software implementation is considered here, the power consumption is assumed to be affected by the Hamming weight of the intermediate value [MOP07].

Multiple-Trace Attack on the Word Unit Rotation
We here describe the multiple-trace attack methodology on the word unit rotation and present our experiment results. To construct the attack methodology, we first categorize properties of the word unit rotation. We then describe how to find $(d_{l-1}, d_{l-2}, \cdots, d_{\log_2 W})$ based on these properties.
Attack Methodology. To mitigate TAs, the selection of Equation (1) is always executed, as described in Subsection 3.1. Accordingly, the $v[j + us]$ and $v[j]$ values are always loaded. In this step, one selects the rotated value $v[j + us]$ when $d_i = 1$ and the unrotated value $v[j]$ when $d_i = 0$. Therefore, when the bit length of the word $W$ is 8, $w[j] = v[j]$ if $d_i = 0$ (mask = 0x00) and $w[j] = v[j + us]$ if $d_i = 1$ (mask = 0xff). This is shown in steps 7 to 9 of Algorithm 1. The index of the array $v[*]$ to be saved in $w[j]$ is determined by the $d_i$ value. We thus define the following two properties.
New Property 1. The mask value is $0 - d_i$; therefore, it is 0x00 when $d_i = 0$. Consequently, in steps 7 to 9 of Algorithm 1 on an 8-bit processor, $v[j]$ is saved to $w[j]$. Thus, $v[j]$ is loaded and saved, but $v[j + us]$ is only loaded. Contrariwise, when $d_i = 1$, the mask value is 0xff on an 8-bit processor. Consequently, in steps 7 to 9 of Algorithm 1, $v[j + us]$ is saved to $w[j]$. Thus, $v[j + us]$ is loaded and saved, but $v[j]$ is only loaded.
New Property 2. When $d_i = 0$, the unrotated value is chosen, i.e. $v[j]$ is saved to $w[j]$, which has the same index. Contrariwise, when $d_i = 1$, the rotated value is chosen, i.e. $v[j + us]$ is saved to $w[j]$, which has a different index.
During the algorithm execution, a specific power consumption pattern can be observed depending on the intermediate values. Thus, the power consumption $P_{total}$ at each point can be modeled as the sum of a data-dependent component $P_{data}$ and Gaussian noise $P_{noise}$, i.e. $P_{total} = P_{data} + P_{noise}$ [MOP07]. Since we assume that the Hamming weight of the intermediate value contributes to the power consumption, we can remodel $P_{total}$ as $\alpha \cdot wt(data) + P_{noise}$, where $\alpha$ is a constant, i.e. there is a linear relationship between $P_{total}$ and $wt(data)$. It is thus possible to locate the positions where the $v[j]$ value was used by calculating the Pearson correlation coefficient between the Hamming weight of the $v[j]$ values and the power consumption traces. That is, the positions with high correlations are related to operations using the intermediate value $v[j]$.
If $d_i = 0$, then the mask value is 0x00; therefore, the power consumption with respect to the $v[j]$ value occurs sequentially twice in steps 7 to 9 of Algorithm 1 according to New Property 1. Contrariwise, when $d_i = 1$, the mask value is 0xff on an 8-bit processor; therefore, the power consumption with respect to the $v[j]$ value occurs once in steps 7 to 9 of Algorithm 1. Thus, one can find $d_i$ by identifying whether a high correlation occurs sequentially twice in steps 7 to 9 of Algorithm 1.
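The decision rule derived from New Property 1 can be illustrated with a small simulation. The sketch below does not reproduce the measurement setup of this paper: it assumes an idealized Hamming weight leakage with Gaussian noise and three samples per iteration (load $v[j]$, load $v[j + us]$, store $w[j]$); the function names, the noise level, the number of traces, and the 0.5 threshold are hypothetical choices.

```python
import random


def hw(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_iteration(d_i, n_traces, rng, sigma=0.5):
    """Simulated traces with 3 samples: load v[j], load v[j+us], store w[j]."""
    inputs, traces = [], []
    for _ in range(n_traces):
        vj, vjus = rng.randrange(256), rng.randrange(256)
        wj = vjus if d_i else vj  # value selected by the mask
        traces.append([hw(x) + rng.gauss(0, sigma) for x in (vj, vjus, wj)])
        inputs.append(vj)
    return inputs, traces


def recover_bit(inputs, traces, threshold=0.5):
    """d_i = 0 if the leakage of v[j] shows up twice, otherwise d_i = 1."""
    hws = [hw(v) for v in inputs]
    peaks = sum(
        1 for s in range(3)
        if abs(pearson(hws, [t[s] for t in traces])) > threshold
    )
    return 0 if peaks >= 2 else 1


rng = random.Random(1)
recovered = [recover_bit(*simulate_iteration(bit, 200, rng)) for bit in (0, 1)]
```

In this idealized model the store sample correlates with $wt(v[j])$ only when the unrotated value is kept, so counting the correlation peaks separates the two cases.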
The value $w[j]$ based on $d_{i+1}$ is the same as the value $v[j]$ based on $d_i$ in steps 2 to 13 of Algorithm 1, where $\log_2 W \le i < l - 1$. Thus, the power consumption related to the $v[j]$ value occurs sequentially twice in the $(j + 1)$-th iteration of steps 7 to 9 for $d_{i+1}$. Besides, based on New Property 1 and New Property 2, if $d_i = 0$, then the power consumption related to the $v[j]$ value occurs sequentially twice in the $(j + 1)$-th iteration of steps 7 to 9 for $d_i$. Consequently, when $d_i = 0$, the power consumption associated with the $v[j]$ value occurs sequentially twice in the same iteration in which the load and save operations are executed for the prior key bit $d_{i+1}$.

Otherwise, when $d_i = 1$, the power consumption related to the $v[j]$ value occurs sequentially twice in a different iteration from the one in which the load and save operations are executed for the prior key bit $d_{i+1}$. In other words, two peaks will occur at the position rotated left by $2^{i - \log_2 W}$ words, i.e. the $((j - 2^{i - \log_2 W}) \bmod L + 1)$-th iteration. Thus, one can find $d_i$ by identifying whether a high correlation occurs sequentially twice in the same iteration as for $d_{i+1}$. In other words, one can find $d_i$ by identifying the position where the high correlation with the intermediate value $v[j]$ occurs sequentially twice.

Algorithm 2 Multiple-Trace Attack on the Word Unit Rotation
1: Calculate the correlation coefficient between $T$ and $C_0$
2: if the high correlation occurs twice at the 1st iteration then (finding $d_{l-1}$)
3:   $d_{l-1} \leftarrow 0$
4: else
5:   $d_{l-1} \leftarrow 1$
6: end if
7: for $i = l - 2$ down to $\log_2 W$ do
8:   if the high correlation occurs twice at the same position (iteration) as for $d_{i+1}$ then
9:     $d_i \leftarrow 0$
10:  else
11:    $d_i \leftarrow 1$
12:  end if
13: end for
14: Return $(d_{l-1}, d_{l-2}, \cdots, d_{\log_2 W})$
It is possible to find $(d_{l-1}, d_{l-2}, \cdots, d_{\log_2 W})$ using only New Property 1. However, one then has to track the intermediate value determined by $d_{i+1}$ in order to find $d_i$, whereas there is no need to track the intermediate value if one makes use of New Property 2. Since the most significant bit can only be recovered based on New Property 1, we combine New Property 1 and New Property 2 to construct an attack methodology.
Therefore, we recover the most significant bit $d_{l-1}$ based on New Property 1. Afterward, we recover the following bits $(d_{l-2}, \cdots, d_{\log_2 W})$ based on New Property 2. Hence, we construct the attack flow as in Algorithm 2. In step 1 of Algorithm 2, $T$ is a set of $N$ traces and $C_0$ is a set of $N$ input words $c_i[0]$, where $c_i[0]$ is the first word of each multiplication input value $c_i$. Consequently, it is possible to identify the position where the power consumption related to the set $C_0$ of input words occurs sequentially twice by calculating the correlation coefficient between $T$ and $C_0$; the correlation coefficient represents the similarity between the two sets. Subsequently, the most significant bit $d_{l-1}$ can be recovered. Similarly, by finding whether the power consumption related to the set $C_0$ occurs sequentially twice at the same position as for $d_{i+1}$ or not, we can recover $(d_{l-2}, \cdots, d_{\log_2 W})$. In Algorithm 2, steps 2 to 6 for finding $d_{l-1}$ are based on New Property 1, and steps 7 to 13 for finding $(d_{l-2}, \cdots, d_{\log_2 W})$ are based on New Property 2. Thus, $(d_{l-1}, d_{l-2}, \cdots, d_{\log_2 W})$ is extracted by the multiple-trace attack on the word unit rotation. We describe the attack methodology to find the last $\log_2 W$ bits, $(d_{\log_2 W - 1}, \cdots, d_1, d_0)$, in the following subsection.
Experiment results on an 8-bit processor. Our experiment shows that the position with high correlation with the intermediate value $v[0]$, i.e. each multiplication input, is different depending on the value of the secret bit $d_i$. Since the target board is equipped with an 8-bit processor, we set $r = 256$ as a toy example. Thus, when $W$ is 8, $L = \lceil r/W \rceil = 32$, $l = \lceil \log_2(r - 1) \rceil = 8$, and $d = (d_7, \cdots, d_1, d_0)_2$. We measured 500 power consumption traces at a 7.38 MS/s sampling rate with Algorithm 1 operating on a ChipWhisperer-Lite XMEGA target board (Figure 2). Figure 3 shows the experimental proof of the attack methodology to find the most significant bit $d_{l-1}$ based on New Property 1. The power consumption with respect to $C_0$ occurs sequentially twice in the 1st iteration of steps 7 to 9 of Algorithm 1 when $d_7 = 0$, as shown in Figure 3(a). Since $L = 32$, $l = 8$, and $\log_2 W = 3$, steps 7 to 9 of Algorithm 1 operate 16 times for $d_7$. Thus, the first 16 patterns in Figure 3
Experiment results on a 32-bit processor. The target board is equipped with a 32-bit processor, and we set the parameter $r = 4801$ for the 80-bit security. Since r is 4801 and

Multiple-Trace Attack on the Bit Rotation
We now describe the multiple-trace attack methodology on the bit rotation and present our experiment results. The remaining bits to recover are $(d_{\log_2 W - 1}, \cdots, d_1, d_0)$, since $(d_{l-1}, d_{l-2}, \cdots, d_{\log_2 W})$ can be found as described in Subsection 4.1. Thus, we only guess the low value from 0 to $W - 1$ when we guess the leftmost word of the result of the bit rotation $x^d c^{(k)}$ of Algorithm 1. Here, the low value and the last $\log_2 W$ bits of $d$ are the same. After the bit rotation is performed, the result of Equation (4) is saved. Besides, after the execution of Algorithm 1, the result is accumulated to calculate $H_0 c^{(0)}$ or $H_1 c^{(1)}$. Therefore, the result of Equation (4) is loaded. Accordingly, the power consumption associated with this intermediate value occurs in two places. We refer to these points as points of interest (PoI). Further, we mount a CPA using these two PoIs and find the last $\log_2 W$ bits of $d$.
Experiment results on an 8-bit processor. Here, we demonstrate that the last 3 bits of $d$ can be found by an 8-bit CPA. Using the 8-bit CPA implies that we only use 8 bits of intermediate data; it does not denote the attack complexity. The attack complexity is $2^3$ when we target an 8-bit processor, since we only need to find the last $\log_2 8 = 3$ bits of $d$. It is $2^6$ when we target a 64-bit processor, since one has to find the last $\log_2 64 = 6$ bits of $d$. Thus, the attack is feasible. The measurement setup for power consumption traces is as described in Subsection 4.1, and two PoIs can be identified, as shown in Figure 5. An interested reader may refer to Appendix B.2 for a more detailed explanation.
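The CPA on the bit rotation can be sketched on simulated data. The following is an illustrative model only, under several assumptions of ours: a single PoI instead of the two used in the experiments, a left rotation inside two adjacent 8-bit words, an idealized Hamming weight leakage with Gaussian noise, and hypothetical function names and parameters.

```python
import random

W = 8  # word size in bits


def hw(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def leftmost_word(c0, c1, s):
    """Leftmost word after rotating the word pair (c0, c1) left by s bits."""
    return ((c0 << s) | (c1 >> (W - s))) & ((1 << W) - 1)


def cpa_low_bits(inputs, leakage):
    """Return the guess in 0..W-1 whose hypothesis correlates best."""
    scores = []
    for guess in range(W):
        hyp = [hw(leftmost_word(c0, c1, guess)) for c0, c1 in inputs]
        scores.append(abs(pearson(hyp, leakage)))
    return max(range(W), key=scores.__getitem__)


rng = random.Random(2)
secret_low = 5  # last log2(W) = 3 bits of d (illustrative value)
inputs = [(rng.randrange(256), rng.randrange(256)) for _ in range(500)]
leakage = [hw(leftmost_word(c0, c1, secret_low)) + rng.gauss(0, 0.5)
           for c0, c1 in inputs]
guess = cpa_low_bits(inputs, leakage)
```

Only $W = 8$ hypotheses have to be tested here, which matches the attack complexity of $2^3$ stated above for an 8-bit processor.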

Comparison with the Previous Attack
As described in Subsection 3.2, the attack suggested by Rossi et al. reduces each $d$ to a certain number of candidates but still requires solving linear equations. Such a computation is not even feasible for 64-bit processors. Our multiple-trace attack, however, recovers all secret indices regardless of word size and security level. This becomes possible because we divide the attack position into two parts based on the structure of the constant-time multiplication. In particular, we categorize new properties of the word unit rotation and only need to guess the last $\log_2 W$ bits of $d$ when we attack the bit rotation. Moreover, to the best of our knowledge, the attack presented in Subsection 3.2 requires approximately 200 power traces sampled at 96 MS/s, whereas our attack only requires 50 traces sampled at 7.38 MS/s. We measured the power consumption traces from the same target board as used in Subsection 3.2, i.e. ChipWhisperer-Lite XMEGA.

Proposed Single-Trace Attack on Constant-Time Multiplication for Syndrome Computation
In this section, we propose a single-trace attack on the constant-time multiplication $x^d c^{(k)}$. We demonstrate that the proposed single-trace attack allows to extract the secret

Single-Trace Attack on the Word Unit Rotation
We here describe the single-trace attack methodology on the word unit rotation and present the experiment results. To construct the attack methodology, we first categorize properties of the word unit rotation. We then propose how to find $(d_{l-1}, d_{l-2}, \cdots, d_{\log_2 W})$ based on these properties.
Attack Methodology. As described in Subsection 3.1, the mask value determined by the value $d_i$ is used to select whether the rotated value is saved or not. Therefore, there is a phase in which $d_i$ is extracted and saved before the word unit rotation is performed, such as in step 3 of Algorithm 1. Then, the values mask and ¬mask are computed and saved. Besides, when steps 7 to 9 of Algorithm 1 are executed, the values mask and ¬mask are loaded. Since, in software implementations, the power consumption depends on the Hamming weight of the intermediate value, it is possible to distinguish among the steps mentioned above and classify the power consumption properties of the target Algorithm 1 as follows.
New Property 3. The secret bit $d_i$ is 0 or 1. Thus, if $d_i = 0$, the power consumption is associated with 0 when extracting and saving the $d_i$ value. Likewise, if $d_i = 1$, the power consumption is associated with 1.

New Property 4. The mask value is $0 - d_i$; therefore, it is 0x00 when $d_i = 0$, and the power consumption is related to 0. Conversely, when $d_i = 1$, the mask value is 0xff on an 8-bit processor, and the power consumption is related to 8, which is the Hamming weight of the mask value.

New Property 5. The ¬mask value is the one's complement of the mask value, i.e. its bitwise inversion. Consequently, in contrast to the New Property 4, it is 0xff on an 8-bit processor when $d_i = 0$, and the power consumption is related to 8. Conversely, the power consumption is related to 0 when $d_i = 1$.
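The reference values behind the New Properties 3 to 5 can be tabulated with a few lines of Python. This is only an illustrative sketch: the helper names `mask_values`, `hamming_weight`, and `select` are hypothetical, and the select expression `(rotated & mask) | (unrotated & ¬mask)` is an assumed form of the conditional save, not a transcription of Algorithm 1.

```python
def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def mask_values(d_i: int, word_bits: int = 8):
    """Return (mask, not_mask) for a secret bit d_i on a word_bits-bit processor."""
    full = (1 << word_bits) - 1
    mask = (0 - d_i) & full      # New Property 4: 0x00 if d_i = 0, all ones if d_i = 1
    not_mask = ~mask & full      # New Property 5: one's complement of mask
    return mask, not_mask


def select(rotated: int, unrotated: int, d_i: int, word_bits: int = 8) -> int:
    """Assumed branch-free save: keep the rotated word only when d_i = 1."""
    mask, not_mask = mask_values(d_i, word_bits)
    return (rotated & mask) | (unrotated & not_mask)


for d_i in (0, 1):
    mask, not_mask = mask_values(d_i)
    print(d_i, hex(mask), hamming_weight(mask), hex(not_mask), hamming_weight(not_mask))
```

For each value of $d_i$, the three reference values ($d_i$, mask, ¬mask) have Hamming weights that differ between the two cases, which is what makes a two-group classification of the power samples possible.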
We define $d_i$, mask, and ¬mask as the reference values for each property. Additionally, we define the New Property 3, the New Property 4, and the New Property 5 as key bit-dependent properties. Based on these key bit-dependent properties, power consumption traces can be classified into two groups, $G_1$ and $G_2$, depending on the $d_i$ value. Clustering algorithms, such as k-means, fuzzy k-means, or EM algorithms, can then be applied [Anz92].

Algorithm 3 Single-Trace Attack on the Word Unit Rotation
2: Select points of interest $p_i$ of word unit rotation operation associated with $d_i$
3: end for
4: Classify $p_i$ into two groups, $G_1$ and $G_2$, using the k-means clustering algorithm
5: Calculate the average values $AVG_1$ and $AVG_2$, respectively, of $G_1$ and $G_2$
6: for $i = l - 1$ down to $\log_2 W$ do
7:
After clustering, the average values $AVG_1$ and $AVG_2$ of the groups $G_1$ and $G_2$, respectively, are calculated. Assuming that a larger Hamming weight corresponds to lower power consumption, if $AVG_1$ is lower than $AVG_2$, then $d_i$ belonging to $G_1$ is 1 and $d_i$ belonging to $G_2$ is 0, based on the New Property 3. The same results are obtained when we classify based on the New Property 4. In contrast, $d_i$ belonging to $G_1$ is 0, and $d_i$ belonging to $G_2$ is 1 when we classify based on the New Property 5. Hence, $(d_{l-1}, d_{l-2}, \cdots, d_{\log_2 W})$ can be recovered by identifying the group that each $d_i$ belongs to. We further describe how to find the last $\log_2 W$ bits, i.e. $(d_{\log_2 W - 1}, \cdots, d_1, d_0)$, in the following subsection.

Algorithm 3 describes the attack flow. In steps 1 to 3 of Algorithm 3, we choose the PoIs associated with one of s.1, s.2, s.3, and s.4. Step 4 is the classification of the PoIs into two groups using the k-means clustering algorithm. Since $G_1$ and $G_2$ differ in the intermediate values that affect the power consumption, the distribution of $G_1$ is different from that of $G_2$. We can thus distinguish which group is associated with a certain intermediate value using the average values of each group, as shown in steps 7 to 11 of Algorithm 3. The reader may refer to the experiment results for a detailed explanation.
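The classification and labelling steps can be sketched on synthetic data as follows. Everything here is an assumption made for illustration: the leakage model (power decreasing linearly with the Hamming weight of mask, plus Gaussian noise), the constants, the use of one scalar PoI value per bit, and the helper names `two_means`, `recover_bits`, and `simulate_poi`. It is a minimal two-cluster k-means, not the scripts used in the experiments.

```python
import random


def two_means(values, iterations=20):
    """Minimal 1-D k-means with k = 2; returns a 0/1 label per value."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, g in zip(values, labels) if g == 0]
        g1 = [v for v, g in zip(values, labels) if g == 1]
        if g0:
            lo = sum(g0) / len(g0)
        if g1:
            hi = sum(g1) / len(g1)
    return labels


def recover_bits(poi_values, low_power_bit=1):
    """Label each PoI value; the group with the lower average gets low_power_bit.

    low_power_bit=1 matches New Properties 3 and 4 under the assumption that a
    larger Hamming weight means lower power; use 0 for New Property 5.
    """
    labels = two_means(poi_values)
    groups = {0: [], 1: []}
    for v, g in zip(poi_values, labels):
        groups[g].append(v)
    averages = {g: sum(vs) / len(vs) for g, vs in groups.items() if vs}
    low_group = min(averages, key=averages.get)
    return [low_power_bit if g == low_group else 1 - low_power_bit for g in labels]


def simulate_poi(bit, rng, word_bits=8):
    """Hypothetical leakage: power drops with the Hamming weight of mask."""
    mask_hw = word_bits if bit else 0
    return 10.0 - 0.4 * mask_hw + rng.gauss(0.0, 0.2)


rng = random.Random(1)
secret_high_bits = [1, 1, 1, 0, 1]          # (d_7, ..., d_3) of d = (11101010)_2
pois = [simulate_poi(b, rng) for b in secret_high_bits]
recovered = recover_bits(pois)
print(recovered)
```

Note that a two-cluster split always produces two groups when the values differ, so an index whose targeted bits are all equal would need separate handling in a real attack.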
Experiment results on an 8-bit processor. The experiment results demonstrate that the key bit-dependent properties are enough to extract the secret bit $d_i$ using a single trace. The measurement setup for power consumption traces can be found in Section 4.1.
The PoIs can be identified by calculating the sum of squared pairwise t-differences (SOST) [GLP06] of the traces and then identifying the location of the information-leaking point, as shown in Figure 6. The SOST of two groups, $G_1$ and $G_2$, is calculated as below.

$$\mathrm{SOST} = \left( \frac{\overline{G_1} - \overline{G_2}}{\sqrt{\dfrac{\sigma_{G_1}^2}{\#G_1} + \dfrac{\sigma_{G_2}^2}{\#G_2}}} \right)^2$$

Here $\overline{G_i}$, $\sigma_{G_i}$, and $\#G_i$ denote the mean, standard deviation, and number of elements of $G_i$, respectively. In Figure 6, the five points with the high SOST values are where the com operation, which yields a one's complement to calculate the ¬mask value, is performed. Figure 7 shows the distribution of the points that have the highest SOST value, near point 685 of Figure 6. Two distributions are clearly distinguished: one is when $d_i = 0$, and the other is when $d_i = 1$.

We use the five points with the high SOST values as PoIs and select these points in steps 1 to 3 of Algorithm 3. The average value for $d_i = 0$ is higher than the average value for $d_i = 1$. Therefore, if $AVG_2$ is higher than $AVG_1$ and $d_i$ belongs to $G_2$, then $d_i$ is 0. In contrast, if $d_i$ belongs to $G_1$, it is 1. Figure 8 shows the attack results. Hence, the accurate secret bits $(d_7, d_6, d_5, d_4, d_3)$ of indices can be found using Algorithm 3.
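As a rough sketch of how PoIs could be located, the two-group SOST can be computed per sample point as the squared difference of the group means divided by the sum of each group's variance over its size, which is the usual form in [GLP06]. The function name `sost` and the toy traces below are made up for illustration, and the sample variance from the standard library is an arbitrary choice.

```python
from statistics import mean, variance


def sost(group1, group2):
    """Two-group SOST per sample point.

    group1, group2: lists of traces, each trace a list of samples of equal length.
    """
    n_points = len(group1[0])
    scores = []
    for t in range(n_points):
        a = [trace[t] for trace in group1]
        b = [trace[t] for trace in group2]
        denom = variance(a) / len(a) + variance(b) / len(b)
        scores.append((mean(a) - mean(b)) ** 2 / denom if denom > 0 else 0.0)
    return scores


# Toy data: sample point 1 depends on the group, sample point 0 does not.
g1 = [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9]]
g2 = [[1.1, 2.0], [0.9, 2.1], [1.0, 1.9]]
scores = sost(g1, g2)
poi = max(range(len(scores)), key=scores.__getitem__)
print(scores, poi)
```

Sample points whose score stands out, like the five high peaks mentioned above, are the natural candidates to feed into the clustering step.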
Experiment results on a 32-bit processor. Since $W = 32$, the mask value is 0x00000000 or 0xffffffff. One can find $(d_{12}, d_{11}, d_{10}, d_9, d_8, d_7, d_6, d_5)$ because $\log_2 W = 5$. The reader may refer to Appendix B.3 for the details.

Single-Trace Attack on the Bit Rotation
In this subsection, we describe the single-trace attack methodology on the bit rotation and discuss the results of the experiment. To recover the remaining bits, $(d_{\log_2 W - 1}, \cdots, d_1, d_0)$, we apply an SPA based on the Property 1 presented in Section 2.

Attack Methodology. The most commonly used 8-bit AVR and 16-bit MSP430 processors only provide single-bit shift instructions. Thus, a 1-bit right shift operation is repeated low times, and a 1-bit left shift operation is repeated high times in steps 17 to 22 of Algorithm 1. An SPA thus makes it possible to identify the number of 1-bit left shift operations. Since the low value and the last $\log_2 W$ bits of $d$ are the same, the remaining bits $(d_{\log_2 W - 1}, \cdots, d_1, d_0)$ can be identified. Since the most commonly used 32-bit and 64-bit processors support a barrel shifter, i.e. multiple bit shifts are performed within a single clock cycle, it is difficult to identify the last $\log_2 W$ bits of $d$ on such processors. Thus, $W$ candidates remain, and recovering the accurate indices requires additional algebraic computations, similar to those discussed in [RHHM17]. It is still possible to extract a substantial part of the secret indices using only a single trace.
Experiment results on an 8-bit processor. The experiment shows that the low value can be recovered when a processor provides 1-bit shift operations. The measurement setup for power consumption traces is as described in Section 4.1. The vertical dotted lines in Figure 9¹ indicate the endpoint of each bit rotation of one word, and the endpoint is the same regardless of the index value. This is because the total number of 1-bit right and left shift operations is always the same. As a result, the bit rotation is performed in constant time. Through assembly analysis, we verified that the compiler handles the variable shift using an iterative procedure with the number of repetitions as a variable when the processor provides only a 1-bit shift operation. Thus, the 1-bit right shift operation does not occur when the last 3 bits of $d$ are 0, and the 1-bit right shift operation is performed 7 times when the last 3 bits of $d$ are 7, as shown in Figure 9. Hence, even if ephemeral keys are used or randomization countermeasures against DPAs [RHHM17,CEvMS15b] are applied, it is possible to recover the secret bits $(d_2, d_1, d_0)$ of $d$ using a single trace.
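The following toy model illustrates why the shift pattern is constant in total length yet still reveals the low bits. It assumes that low equals $d \bmod W$ and that high equals $W - \text{low}$, which is consistent with the constant total described above but is not taken from the listing of Algorithm 1; the function names are hypothetical, and "R"/"L" stand for the right-shift and left-shift patterns an attacker would read off a trace.

```python
def shift_op_sequence(d: int, word_bits: int = 8):
    """Sequence of 1-bit shift operations for one word (assumed model)."""
    low = d & (word_bits - 1)     # last log2(W) bits of d
    high = word_bits - low        # assumed complement, so the total stays W
    return ["R"] * low + ["L"] * high


def recover_low_bits(op_sequence):
    """SPA view: counting the right-shift patterns yields the low bits of d."""
    return op_sequence.count("R")


d = 0b11101010
ops = shift_op_sequence(d)
print("".join(ops), recover_low_bits(ops))
```

In this model every index produces exactly $W$ shift operations, so the running time is constant, but the position at which the pattern switches from right shifts to left shifts encodes $(d_2, d_1, d_0)$.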
Experiment results on a 32-bit processor. Since $L = 151$, steps 17 to 19 of Algorithm 4 in Appendix A operate from $j = 0$ to $j = 148$. Therefore, we can identify 149 patterns indexed from 0 to 148 in the bit rotation, as shown in Figure 10¹. Unlike the results on an 8-bit processor, it is impossible to distinguish how many single shift operations are performed. This is because our target 32-bit processor, the STM32F3, provides a barrel shifter. Thus, in this case, $W$ candidates are left, and we need to solve some linear equations to find the accurate indices, similar to the approach discussed in [RHHM17].
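The size of the remaining search space can be made concrete with a short sketch. Given the upper bits recovered from the word unit rotation, each index has exactly $W$ candidates, one per value of the unknown low bits. The function name `candidates` is hypothetical, and $W$ is assumed to be a power of two.

```python
def candidates(high_bits, word_bits=32):
    """All indices d consistent with the recovered upper bits.

    high_bits: recovered (d_{l-1}, ..., d_{log2 W}), most significant bit first.
    word_bits: W, assumed to be a power of two.
    """
    low_len = word_bits.bit_length() - 1      # log2(W)
    prefix = 0
    for bit in high_bits:
        prefix = (prefix << 1) | bit
    return [(prefix << low_len) | low for low in range(word_bits)]


# Upper bits (d_12, ..., d_5) of d = (0101011001101)_2
cands = candidates([0, 1, 0, 1, 0, 1, 1, 0])
print(len(cands), 0b0101011001101 in cands)
```

The algebraic step mentioned above then only has to pick the right element from such a list for every index.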
Remark. On our targeted platforms, the constant-time condition was met. Furthermore, in this paper, we do not consider possible problems depending on compile options. We posted scripts for the proposed attacks online².

Using Ephemeral Key Pairs. LEDAcrypt KEM using ephemeral key pairs inherently provides resistance against multiple-trace attacks. Nevertheless, it would still be vulnerable to TAs in private syndrome computation, as shown in [vMG14b]. Thus, adopting the constant-time multiplication as a countermeasure might be considered. However, this countermeasure is still vulnerable to our proposed single-trace attack. Therefore, using our single-trace attack, we can derive not only the secret $L = HQ$ but also the secret message.

Case Study: BIKE
Using Long-Term Key Pairs.All the IND-CPA variants of BIKE using long-term key pairs cannot guarantee resistance against SCAs in private syndrome computations, as mentioned in Subsection 6.1.Therefore, similar to LEDAcrypt, our proposed multipleand single-trace attacks could be applied.In the case of BIKE-1, we can find H with our proposed attacks during syndrome computation, whereas in the case of BIKE-2 and 3, we can find H 0 ; then it is possible to calculate H 1 using the recovered H 0 and the public key F (see Table 4).The secret message from the received vector can also be extracted using BF decoding and the recovered H.
Using Ephemeral Key Pairs.All the IND-CCA variants of BIKE using ephemeral key pairs also inherently provide resistance against multiple-trace attacks.However, as described in Subsection 6.1, our single-trace attack can be applied.Accordingly, not only H but also the secret message can be retrieved.

Table 4: Keys and syndromes of BIKE
Public key | Private key | Syndrome

Conclusion
We proposed a multiple-trace attack that enables the complete recovery of accurate secret indices, and also a single-trace attack that can work even when ephemeral keys are used or existing DPA countermeasures are applied. We also discussed how BIKE and LEDAcrypt may become vulnerable to our proposed attacks.

The proposed multiple-trace attack can be prevented by applying randomization countermeasures, such as intermediate data masking [RHHM17,CEvMS15b], prior to syndrome computations. As for the single-trace attack, hiding methods, such as random noise and dummy operations, can be applied to increase the attack complexity. Constructing a theoretically sound countermeasure against the single-trace attack proposed in this paper is an interesting topic for future research.

A Constant-Time Multiplication
Algorithm 4 is a detailed version of Algorithm 1.

Thirdly, the multiplication with $x^{2^3}$ can be obtained by 1-byte left rotations. However, $d_3$ is 0, so the unrotated value is saved. Lastly, the multiplication with $x^{(011)_2}$ can be acquired by the sequence of logical instructions which combines the most significant 5 bits of v[i] and the least significant 3 bits

B Experiment Results on a 32-bit Processor

B.1 Multiple-Trace Attack on the Word Unit Rotation
We measured 500 power consumption traces at a sampling rate of 7.38 MS/s while Algorithm 4 in Appendix A was operating on a ChipWhisperer UFO STM32F3 target board.
Figure 11 shows one of the power consumption traces.

select these points in steps 1 to 3 of Algorithm 3. Since we target the points where the ¬mask is loaded, the average value when $d_i = 0$ is less than when $d_i = 1$ (see Figure 17). Therefore, if $AVG_2$ is less than $AVG_1$ and $d_i$ belongs to $G_2$, then $d_i$ is 0. In contrast, if $d_i$ belongs to $G_1$, it is 1. Figure 8 shows the attack results. Hence, the accurate secret bits $(d_{12}, d_{11}, d_{10}, d_9, d_8, d_7, d_6, d_5)$ of indices can be found.

C Multiple-Trace Attack on the Word Unit Rotation Algorithm Version 2
We can construct an attack methodology such as the Algorithm 5 because the word unit rotation is not performed while

Figure 1: Flowchart of our proposed attack

the $k$ rows of the matrix $G$ span code $C$. Definition 3. [Parity-Check Matrix] An $(n - k) \times n$ matrix $H$ is a parity-check matrix for $C$ if $Hc = 0$ for all $c \in C$, that is, the codewords are all the vectors in the right null space of $H$.

shows one of the power consumption traces. Since $\log_2 W = 3$, we can find $(d_7, d_6, d_5, d_4, d_3)$, and 50 traces are sufficient for the attack.

Figure 2: Power consumption trace of the constant-time multiplication ($W = 8$)

Figure 3 shows the experimental proof of the attack methodology to find the most significant bit $d_{l-1}$ based on the New Property 1. The power consumption with respect to $C_0$ occurs twice in succession in the 1st iteration of steps 7 to 9 of Algorithm 1 when $d_7 = 0$, as shown in Figure 3(a). Since $L = 32$, $l = 8$, and $\log_2 W = 3$, steps 7 to 9 of Algorithm 1 operate 16 times for $d_7$. Thus, the first 16 patterns in Figure 3(a) and Figure 3(b), marked from 0 to 15, are the regions of interest. Conversely, the power consumption with respect to $C_0$ occurs once in the 1st iteration of steps 7 to 9 of Algorithm 1 when $d_7 = 1$, as shown in Figure 3(b).

Figure 3(b) and Figure 4 show the attack results for $d = (11101010)_2$. Each of the figures is a magnification of the computational portion of the corresponding bit of Figure 2. Figure 3(b) shows that the high correlation occurs only once in steps 7 to 9 of Algorithm 1 since $d_7$ is 1. Figure 4(a) shows that the high correlation occurs twice in succession at a different position from that of $d_7$, because $d_6$ is 1. The same results can be observed in Figure 4(b) and Figure 4(d). In contrast, Figure 4(c) shows that the high correlation occurs twice in succession at the same position as that of $d_5$, because $d_4$ is 0. Subsequently, one can find the accurate secret bits $(d_7, d_6, d_5, d_4, d_3)$ using Algorithm 2. The attack methodology for finding the remaining bits $(d_2, d_1, d_0)$ is described in Subsection 4.2.


Figure 4: Finding $d$ from $d_6$ to $d_3$ when $d = (11101010)_2$ (New Property 2)

(a). Even if the CPA uses only one of these PoIs, it can accurately derive the last 3 bits of $d$, i.e. $(d_2, d_1, d_0)$. Figure 5(b) confirms that 50 traces are sufficient for the attack.

Figure 5: Correlation power analysis results when $d = (11101010)_2$

even when cryptosystems use ephemeral keys, or the DPA countermeasures [RHHM17,CEvMS15b] are applied. Hence, the proposed attack can make the latest countermeasures proposed for secure private syndrome computation obsolete. As done in the proposed multiple-trace attack, we divide the attack position into two parts to find $d$: the word unit rotation to find $(d_{l-1}, d_{l-2}, \cdots, d_{\log_2 W})$ (Subsection 5.1), and the bit rotation to find $(d_{\log_2 W - 1}, \cdots, d_1, d_0)$ (Subsection 5.2).

Figure 6: The power consumption trace and the SOST values between two groups, $G_1$ and $G_2$, of each $d_i$ ($W = 8$)

Figure 9: Simple power analysis on the bit rotation ($W = 8$)

BIKE is a suite of KEM algorithms based on QC-MDPC codes [ABB+]. The authors of [ABB+] present three IND-CPA variants of BIKE, called BIKE-1, BIKE-2, and BIKE-3, which use ephemeral key pairs. BIKE-1, BIKE-2, and BIKE-3 follow the framework of the McEliece cryptosystem, the Niederreiter cryptosystem, and Ouroboros, respectively [DGZ17]. They also present three variants with indistinguishability under chosen ciphertext attack (IND-CCA), called BIKE-1-CCA, BIKE-2-CCA, and BIKE-3-CCA, designed to use long-term key pairs. To demonstrate the applicability of our proposed attacks, we describe the key pairs and syndromes of BIKE in Table 4.

Figure 12: Comparison of correlation coefficient values based on $d_{12}$ (New Property 1)

Figure 13 shows the attack results for $d = (0101011001101)_2$. Each of the figures is a magnification of the computational portion of the corresponding bit of Figure 11. Figure 13(a) shows that the high correlation occurs twice in succession in steps 7 to 9 of Algorithm 4 since $d_{12}$ is 0. Figure 13(b) shows that the high correlation occurs twice in succession at a different position³ from that of $d_{12}$, because $d_{11}$ is 1. The same results can be observed in Figure 13(d), Figure 13(f), and Figure 13(g). In contrast, Figure 13(c) shows that the high correlation occurs twice in succession at the same position as that of $d_{11}$, because $d_{10}$ is 0. The same results can be observed in Figure 13(e) and Figure 13(h). Further, one can find the accurate secret bits $(d_{12}, d_{11}, d_{10}, d_9, d_8, d_7, d_6, d_5)$ using Algorithm 2. The attack methodology for finding the remaining bits $(d_4, d_3, d_2, d_1, d_0)$ is described in Subsection 4.2.

³ Two peaks will occur at the position left-rotated by $2^{i - \log_2 W}$ words, i.e. at the $\left((j - 2^{i - \log_2 W}) \bmod L + \sum_{k=i+1}^{l-1} d_k\right)$th iteration, when $W$ cannot divide $r$.

(a) Points of interest (b) 32-bit CPA results

Table 2: Approximate solving times of linear equations according to the operation unit of the processors (in SAGE on one core)

1 when it follows the New Property 4