An Improved Differential Fault Analysis on Block Cipher KLEIN-64

Abstract: KLEIN-64 is a lightweight block cipher designed for resource-constrained environments, and it has advantages in both software performance and hardware implementation. Recent investigation shows that KLEIN-64 is vulnerable to differential fault analysis (DFA). In this paper, an improved DFA is performed on KLEIN-64. It is found that the differential propagation path and the difference distribution of the S-box can be fully utilized to distinguish correct keys from wrong ones when a half-byte fault is injected in the 10th round. By analyzing the difference matrix before the last-round S-box, the location of the fault injection can be limited to a small range. Thus, the improved analysis greatly increases the attack efficiency. In the best case, the scale of the brute-force attack is only 256. In the worst case, which occurs with probability 1/64, the scale of the brute-force attack is far less than $2^{32}$ with another half-byte fault injection. Furthermore, measures for hardening KLEIN-64 against the improved DFA are proposed.


Introduction
Information security is an important research topic in the information field. As an important security technology, encryption has been continually improved and has matured as a means of protecting data. Encryption is now closely tied to daily life through embedded systems, wireless sensors, RFID and other devices, which are widely used in bank cards, bus cards, access control cards, etc. These applications require fast information processing. Classical encryption algorithms can be processed in parallel to improve the real-time performance of encryption [Min, Yang and Wang (2019)]. Because of the strict requirements on the circuit size, computing power and storage space of cryptographic devices, lightweight block ciphers have emerged and developed. Common lightweight cryptographic algorithms include LBlock [Wu and Zhang (2011)], LED [Guo, Peyrin, Poschmann et al. (2011)], and PRINCE [Borghoff, Canteaut, Güneysu et al. (2012)]. More recently proposed designs include CSL [Lamkuche and Pramod (2020)] and SFN [Li, Liu, Zhou et al. (2018)]. A typical lightweight block cipher must address three challenges: minimal overhead, low power consumption and adequate security capability [Liu, Wang, Chaudhry et al. (2018)]. KLEIN-64 is a typical lightweight block cipher, and its round function consists of four parts: AddRoundKey, SubNibble, RotateNibble and MixNibble.

Figure 2: The key schedule algorithm of 64-bit key length

The key schedule algorithm is illustrated in Fig. 2. A 64-bit master key is denoted by $k_0^i, \ldots, k_7^i$ (i is the round counter), and it is divided into two parts.

The proposed DFA on KLEIN-64
The proposed DFA rests on three assumptions: (1) the attacker can specify a plaintext for encryption and obtain the corresponding ciphertext; (2) the attacker can induce a random nibble fault in the 10th round; (3) the fault location and the fault value are both unknown.

Design inspiration and related modified work
In KLEIN-64, only the nonlinear transformation operates on half bytes; the other steps operate on bytes. The attack of Gruber and Selmke [Gruber and Selmke (2019)] on KLEIN-64 is based on a one-byte fault injection, which is more convenient to analyze. However, when a fault is introduced in the intermediate state, the four-byte subkey of the last round (32 bits) needs to be exhausted. This paper instead injects a random half-byte fault in the 10th round and exhausts the remaining 8 half bytes in the last round.
In the worst case, the scale of the brute-force attack is $2^{32}$, the same as that of [Gruber and Selmke (2019)], and this case occurs with probability 1/64. In the best case, the scale of the brute-force attack is only $2^8$.

CIPHERTEXT ←MixNibble(STATE)
In encryption, the column mixing transformation is a linear matrix transformation, and its MDS matrix is invertible. Hence $A \oplus M(B) = M(M^{-1}(A) \oplus B)$ always holds, so the encryption process can be restructured without changing the encryption result.
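The identity above can be checked numerically. The following is a minimal sketch, assuming that the column mixing step uses the AES-style matrix with rows (2, 3, 1, 1) over GF(2^8) and the reduction constant 0x1B mentioned later in this section; the matrices `MIX` and `INV_MIX` and all function names are illustrative and are not taken from the paper.

```python
import random

def xtime(a):
    """Multiply a byte by 2 in GF(2^8), reducing with 0x1B on overflow."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a

def gmul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

# Assumption: AES-style MixColumns matrix and its inverse.
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
INV_MIX = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]

def apply_matrix(matrix, column):
    """Multiply a 4x4 matrix by a 4-byte column over GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gmul(coeff, byte)
        out.append(acc)
    return out

def xor_columns(a, b):
    return [x ^ y for x, y in zip(a, b)]

def identity_holds(a, b):
    """Check A xor M(B) == M(M^-1(A) xor B) for one pair of columns."""
    lhs = xor_columns(a, apply_matrix(MIX, b))
    rhs = apply_matrix(MIX, xor_columns(apply_matrix(INV_MIX, a), b))
    return lhs == rhs

rng = random.Random(0)
checks = [identity_holds([rng.randrange(256) for _ in range(4)],
                         [rng.randrange(256) for _ in range(4)])
          for _ in range(100)]
print(all(checks))
```

The check relies only on the linearity of the matrix and on `INV_MIX` being its inverse, so it would hold for any invertible linear layer, not only the one assumed here.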
As shown in Algorithm 2, the inverse column transformation is first applied to the key of the last round, and the result is XORed with the output of the row shift in the last round. The column mixing transformation is then applied. The final encryption result is identical to the result before the modification.

In this attack, it is supposed that a half-byte fault is injected at the register $I^{10}$ in the 10th round, with input difference f and output difference f*. The pair (C, C*) consists of the correct ciphertext and the faulty ciphertext, each 64 bits long. The diffusion process is shown in Fig. 3. The gray part denotes that the value of the difference is stable, and the blue part indicates that the difference is uncertain. If the XOR of the first bit of the state byte and the first bit of the fault is 0, the difference is 0000; otherwise, the difference is 1011. This difference comes from the column mixing transformation: when a value is multiplied by 2, the result is shifted one bit to the left, and if the leftmost bit of the value is 1, the result is XORed with 0x1B after the shift. According to the column mixing transformation of KLEIN-64, the following conclusions can be drawn.
(2) When the fault occurs only in the four lower bits, the whole fault byte is (1 || (3f*)) when multiplying by 3 or (1 || 2f*) when multiplying by 2, where the four higher bits cannot be determined.
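The 0000/1011 behaviour of the lower half byte can be illustrated with a short experiment. This is a sketch under assumptions: the fault is modeled as an XOR into the upper half byte of a state byte, the coefficients 2 and 3 are applied with the reduction constant 0x1B, and the helper names are hypothetical.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8): shift left, XOR 0x1B if the top bit was 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a

def times3(a):
    """Multiply a byte by 3 in GF(2^8), computed as 2*a xor a."""
    return xtime(a) ^ a

def low_nibble_diffs(state_byte, fault_nibble):
    """Low-nibble output differences when a fault is XORed into the high nibble."""
    faulty = state_byte ^ (fault_nibble << 4)
    d2 = (xtime(state_byte) ^ xtime(faulty)) & 0xF
    d3 = (times3(state_byte) ^ times3(faulty)) & 0xF
    return d2, d3

# Enumerate every state byte and every nonzero high-nibble fault.
observed = set()
for state_byte in range(256):
    for fault_nibble in range(1, 16):
        observed.update(low_nibble_diffs(state_byte, fault_nibble))
print(sorted(hex(d) for d in observed))
```

In this model the lower half byte of the difference takes the value 0xB exactly when the reduction by 0x1B is triggered in one of the two computations and not in the other, which matches the two cases 0000 and 1011 described above.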

Description of attack
According to Fig. 3, since RK13 can be calculated from K*, the goal of the attack is to recover K*. The attack is described as follows.
(1) Collect correct ciphertext. Select plaintext P for encryption and obtain the correct ciphertext C.
(2) Collect the faulty ciphertext. Encrypt the same plaintext P, and obtain the faulty ciphertext C* by injecting a random nibble fault in the 10th round.
(3) Get the results of the inverse column transformation. For each pair (C, C*), compute $X = MC^{-1}(C)$ and $X^* = MC^{-1}(C^*)$.

(5) Analyze the column mixing transformation in the 11th round. After the column mixing transformation in the 11th round, each of the differences A1, A3, A5, A7, A9, A11, A13, A15 is 0 or B. From Section 3.2, it can be determined whether A0, A2, A4, A6, A8, A10, A12, A14 need to be XORed with 1.

[...] (a) and (c). Half-byte keys that do not satisfy the equation can be discarded in (a) and (c), and the remaining keys can be exhausted. When another half-byte fault is injected, the half-byte keys that do not satisfy the equation are again discarded. In this way, the scale of the brute-force attack is far less than $2^{32}$.

(f) Calculate the candidates of nibble K* in (b) and (d). If the difference in (b) and (d) is 0, the half-byte keys must be guessed exhaustively, at a cost of $2^{32}$. However, this cost can be reduced by combining Eq. (16) (here, another nibble fault injection is needed), and this case happens with a probability of 1/64. If the difference in (b) and (d) is nonzero, the input of the S-box can be derived from the known input and output differences of the S-box; the set of possible inputs has 2 or 4 elements. XORing $MC^{-1}(C)$ with the results of the S-box and the row shift then yields the candidate values of the half-byte keys in (b) and (d). The exhaustive search in this case is at least $2^8$.

In the difference matrix before the last-round S-box, the following can be observed:
(i) If the entries in the 2nd and 4th columns of the matrix are 0 or B, the fault was possibly injected at $I_0^{10}$, $I_2^{10}$, $I_4^{10}$, $I_6^{10}$, $I_8^{10}$, $I_{10}^{10}$, $I_{12}^{10}$, or $I_{14}^{10}$.
(ii) If the entries in the 1st and 3rd columns of the matrix are 0 or B, the fault was possibly injected at $I_1^{10}$, $I_3^{10}$, $I_5^{10}$, $I_7^{10}$, $I_9^{10}$, $I_{11}^{10}$, $I_{13}^{10}$, or $I_{15}^{10}$.
(iii) In case (i), according to the coefficient relationship between F0 and F1 in (a) and between F3 and F5 in (c), the location of the fault injection is limited either to $I_0^{10}$, $I_2^{10}$, $I_{12}^{10}$, $I_{14}^{10}$ or to $I_4^{10}$, $I_6^{10}$, $I_8^{10}$, $I_{10}^{10}$. Finally, only 4 candidate locations need to be tried.
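The candidate-recovery idea in step (f) can be sketched with a difference distribution table. The S-box table below is an assumption: it is the 4-bit involutive table commonly given for KLEIN and is not reproduced in this section, so it should be checked against the specification before use. The function names are illustrative only.

```python
# Assumption: 4-bit KLEIN S-box as commonly listed; verify against the specification.
SBOX = [0x7, 0x4, 0xA, 0x9, 0x1, 0xF, 0xB, 0x0,
        0xC, 0x3, 0x2, 0x6, 0x8, 0xE, 0xD, 0x5]

def difference_table(sbox):
    """Count, for each input difference, how often each output difference occurs."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for din in range(n):
            table[din][sbox[x] ^ sbox[x ^ din]] += 1
    return table

def candidate_inputs(sbox, din, dout):
    """List the S-box inputs compatible with a known (input, output) difference pair."""
    return [x for x in range(len(sbox)) if sbox[x] ^ sbox[x ^ din] == dout]

DDT = difference_table(SBOX)

# Nonzero entry values over all nonzero input differences.
nonzero_counts = {count for row in DDT[1:] for count in row if count}
print(sorted(nonzero_counts))

# Example: inputs compatible with input difference 0x1 and output difference 0x3.
print(candidate_inputs(SBOX, 0x1, 0x3))
```

With the assumed table, every compatible difference pair leaves 2 or 4 candidate inputs, in line with the set sizes stated in step (f). Each candidate input would then be combined with the corresponding nibble of $MC^{-1}(C)$, as described above, to obtain a candidate half-byte key.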

The improvement of KLEIN-64
The proposed DFA uses the column mixing step of KLEIN-64 as the entry point for reducing the key search space, and the non-uniform difference distribution of the S-box also helps in finding candidate keys. In this section, improvements are made in two aspects to resist the proposed DFA. With these modifications, an attacker can no longer find a rule for reducing the key-search complexity through the second half byte of each state byte in the improved algorithm.

Improved S-box
KLEIN-64 adopts a 4-bit-in, 4-bit-out S-box, which is essentially a multi-output function from $GF(2)^4$ to $GF(2)^4$. The difference distribution of the S-box is listed in Tab. 2. From Tab. 2, it can be seen that the number of nonzero entries is not the same in every row. This non-uniformity in the distribution of input and output differences makes differential analysis possible. Here, a new S-box is constructed to improve the capability of resisting DFA.
(i) $x^4+x+1$, $x^4+x^3+1$, and $x^4+x^3+x^2+x+1$ are the irreducible polynomials of degree 4 over GF(2). Taking one of them, for example $x^4+x+1$, find the inverse of each state half byte [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F]: [0, 1, 9, E, D, B, 7, 6, F, 2, C, 5, A, 4, 3, 8].
(ii) An affine transformation is performed on the result of (i). The polynomials v(x), u(x), and m(x) of the affine transformation must be selected, where m(x) should be a polynomial of degree 4. $m(x)=x^4+1$ is a reducible polynomial over GF(2) with a simple form, and u(x) is chosen arbitrarily, subject to being coprime with m(x). Here, $u(x)=x^3+x+1$. v(x) is an affine constant that ensures there is no fixed point or inverse fixed point in the S-box; here, $v(x)=x$. Through the above steps, the resulting S-box is given in Tab. 3, and its difference distribution is listed in Tab. 4. For this S-box, the capability of resisting DFA essentially depends on the differential distribution and the differential uniformity. Since every row contains exactly 7 nonzero entries and the distribution is even, the new S-box improves the capability of resisting DFA.
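Step (i) can be reproduced with a few lines of field arithmetic. The sketch below assumes the modulus $x^4+x+1$ (binary 10011) and the usual convention that 0 maps to 0; it covers only the inversion step, because the exact form of the affine transformation is not available in this section. The function names are illustrative.

```python
def gf16_mul(a, b, modulus=0b10011):
    """Multiply two elements of GF(2^4) modulo x^4 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x10:
            a ^= modulus
        b >>= 1
    return result

def gf16_inverse(a):
    """Return the multiplicative inverse in GF(2^4); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    return next(b for b in range(1, 16) if gf16_mul(a, b) == 1)

INVERSE = [gf16_inverse(a) for a in range(16)]

def nonzero_per_row(sbox):
    """Number of distinct output differences for each nonzero input difference."""
    counts = []
    for din in range(1, 16):
        outputs = {sbox[x] ^ sbox[x ^ din] for x in range(16)}
        counts.append(len(outputs))
    return counts

print([format(v, "X") for v in INVERSE])
print(nonzero_per_row(INVERSE))
```

For the inversion map alone, each nonzero input difference reaches exactly 7 output differences. A bijective affine map applied afterwards only relabels the output differences within a row, so this count is consistent with the 7 nonzero entries per row reported for Tab. 4.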

Conclusion
In this paper, an improved DFA is successfully launched on KLEIN-64 by half-byte fault injection. In the proposed attack, the fault location can be limited to 4 positions, and half of the whole key can be exhausted with the help of another random half-byte fault, which significantly reduces the exhaustion scale. The other half bytes of the key can be guessed with a cost of $2^8$ in the best case. Even in the worst case, which happens with a probability of 1/64, the scale of the brute-force attack is far less than $2^{32}$ with another half-byte fault injection. To enhance the capability of resisting the differential fault attack, some measures are also proposed for the design of the MixNibble part and the S-box of the original algorithm.
Funding Statement: This work was supported in part by the National Natural Science Foundation of China (Grant Nos. U1936115, 61572182).

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.