Design of Low Complexity Fault Detection Scheme for AES using Composite Field Arithmetic

The Advanced Encryption Standard (AES) is the symmetric-key cryptographic standard used to protect electronic data. Natural and maliciously injected faults in an AES implementation may leak confidential information and reduce its reliability. In this study, we present low-complexity fault detection schemes for the AES encryption and decryption architectures using composite fields in polynomial basis. The schemes are independent of the way the S-box and the inverse S-box are constructed. We develop a new technique for the fault detection of SubBytes and inverse SubBytes that covers the multiplicative inversion and the affine transformation of the S-box and the inverse S-box. The scheme can therefore be used for S-boxes and inverse S-boxes implemented in composite fields as well as for those implemented using ROM. The proposed AES fault detection scheme is coded in VHDL (Very High Speed Integrated Circuits Hardware Description Language) and synthesized and simulated using the Xilinx ISE EDA (Electronic Design Automation) tool targeting a Virtex FPGA (http://www.xilinx.com/). Finally, the results are compared with the conventional ROM-based SubBytes and inverse SubBytes to show the improvement in path delay, speed and area.


INTRODUCTION
The Advanced Encryption Standard (AES) is the symmetric-key cryptographic standard used to encrypt and decrypt electronic data. In encryption, AES accepts a 128-bit plaintext block and a key and generates the ciphertext. The key can be specified to be 128, 192 or 256 bits. In AES-128, the ciphertext is generated after 10 rounds. For encryption, each round except the final one consists of four transformations: SubBytes (implemented by 16 S-boxes), ShiftRows, MixColumns and AddRoundKey; the final round omits MixColumns. The decryption transformations are the inverses of the encryption transformations and are used to recover the original plaintext from the ciphertext. Among the transformations, the nonlinear ones are the S-boxes in the encryption and the inverse S-boxes in the decryption. They occupy much of the total AES encryption or decryption area.
There exist many schemes for detecting faults in AES hardware implementations; see, for example, (Karri et al., 2002; Rijmen, 2000; Satoh et al., 2001; Satoh et al., 2008; Mozaffari-Kermani and Reyhani-Masoleh, 2008). Among them, the schemes presented in Karri et al. (2001) and Maistri and Laveugle (2008) are independent of the way the S-box and the inverse S-box are implemented in hardware. There are also fault detection schemes that use memories (ROMs) for the S-box and the inverse S-box. Furthermore, a fault-tolerant scheme that is resistant to fault attacks is presented in Moratelli et al. (2008).
Either the parity-based scheme proposed in Bertoni et al. (2002) or the duplication approach is implemented to protect the combinational logic blocks used in the four transformations of the AES. Moreover, either the Reed-Solomon error correcting code or the Hamming code is used to protect the memories that store the expanded key and the state matrix. Our proposed scheme is applied only to the S-box and the inverse S-box in composite field polynomial basis, whereas the schemes presented in Bertoni et al. (2003) and Wolkerstorfer et al. (2002) use memories. For high performance, however, ROMs are not preferable. Thus, for high-performance AES, the S-box and the inverse S-box are implemented using logic gates in composite fields (Canright, 2005; Yen and Wu, 2006).
Schemes suitable for the S-box and the inverse S-box in composite-field implementations are presented in Kermani and Reyhani-Masoleh (2006) and Mozaffari-Kermani and Reyhani-Masoleh (2008). The approach in Kermani and Reyhani-Masoleh (2006), Karpovsky et al. (2004) and Wu and Yen (2006) uses a parity-based fault detection method for a specific S-box in composite field and polynomial basis to cover all single malicious faults. For the multiplicative inversion of the S-box, two specific composite fields are treated; the transformation and affine matrices, however, are excluded in this approach. Furthermore, in Cohen (2007) and Parhi (2004, 2006), a systematic method including predicted parities is used for the fault detection of the multiplicative inversion of an S-box in composite field polynomial basis, and the transformation matrices are also considered. Finally, in the parity-based approach in Mozaffari-Kermani and Reyhani-Masoleh (2008), the most compact fault detection S-box using five predicted parities in polynomial basis is obtained through an exhaustive search. The main objective of this work is to obtain low-complexity fault detection schemes using composite fields and to compare the results with the conventional ROM-based implementation in terms of path delay, speed and area.

AES encryption:
In this section, we briefly explain the four transformations used in the AES encryption and decryption (National Institute of Standards and Technologies, 2001). In the AES-128 (128-bit key) implementations, the irreducible polynomial $P(x) = x^8 + x^4 + x^3 + x + 1$ is used for constructing the binary field $GF(2^8)$. Each transformation in every round acts on its 128-bit input, denoted as the state. The states are considered as 4×4 matrices whose entries are 8 bits. For example, the input state $S$ with its 8-bit entries $s_{r,c}$, $0 \le r, c \le 3$, is represented as follows:

$$S = \begin{bmatrix} s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\ s_{1,0} & s_{1,1} & s_{1,2} & s_{1,3} \\ s_{2,0} & s_{2,1} & s_{2,2} & s_{2,3} \\ s_{3,0} & s_{3,1} & s_{3,2} & s_{3,3} \end{bmatrix} \quad (1)$$

Considering (1) as the input state of an encryption round, the transformations in each round, except the final round, are as follows:

SubBytes: In each round, the first transformation is the byte substitution (SubBytes), which is implemented by 16 S-boxes. Let the 8-bit input and output of each S-box be $s_{r,c} \in GF(2^8)$ and $s'_{r,c} \in GF(2^8)$, respectively. The S-box consists of a multiplicative inversion, i.e., $s^{-1}_{r,c} \in GF(2^8)$, followed by an affine transformation consisting of the matrix $\Gamma$ and the vector $\gamma$ to generate the output as:

$$s'_{r,c} = \Gamma\, s^{-1}_{r,c} + \gamma \quad (2)$$

The 8-bit outputs of the 16 S-boxes are used to obtain the output state of the SubBytes transformation as:

$$S' = [s'_{r,c}]_{r,c=0}^{3} \quad (3)$$

Shift rows: The second transformation cyclically shifts the 4 bytes of each row of the input state to the left, leaving the first row unchanged, to obtain the output state, i.e., $SR(S')$, as:

$$SR(S') = \begin{bmatrix} s'_{0,0} & s'_{0,1} & s'_{0,2} & s'_{0,3} \\ s'_{1,1} & s'_{1,2} & s'_{1,3} & s'_{1,0} \\ s'_{2,2} & s'_{2,3} & s'_{2,0} & s'_{2,1} \\ s'_{3,3} & s'_{3,0} & s'_{3,1} & s'_{3,2} \end{bmatrix} \quad (4)$$
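As a rough illustration of the SubBytes and ShiftRows steps described above, the following Python sketch builds an S-box entry from a multiplicative inversion followed by the affine transformation, and rotates the rows of a 4×4 state. The helper names (`gf_mul`, `gf_inv`, `sbox`, `shift_rows`) are illustrative and are not taken from the VHDL design; the affine step is written in the bit-rotation form of the AES specification with the constant 0x63, which is assumed here to correspond to Γ and γ.

```python
POLY = 0x11B  # P(x) = x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    """Multiply two elements of GF(2^8) modulo P(x) (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse computed as a^254; maps 0 to 0 by convention."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def sbox(s):
    """S-box output: affine transformation applied to the inverse of s."""
    t = gf_inv(s)
    out = 0
    for i in range(8):
        bit = ((t >> i) ^ (t >> ((i + 4) % 8)) ^ (t >> ((i + 5) % 8))
               ^ (t >> ((i + 6) % 8)) ^ (t >> ((i + 7) % 8)) ^ (0x63 >> i))
        out |= (bit & 1) << i
    return out

def shift_rows(state):
    """Rotate row r of a 4x4 state (given as a list of rows) left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

if __name__ == "__main__":
    print(hex(sbox(0x53)))
    print(shift_rows([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]))
```

Computing the inverse as a 254th power is chosen only for brevity; a composite-field implementation would decompose the inversion into subfield operations instead.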

Mix columns:
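In the third transformation, a constant matrix is multiplied by the output state of ShiftRows, $SR(S')$ in (4), to obtain the output state of MixColumns, i.e., the matrix $S''$, as:

$$S'' = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \cdot SR(S')$$

where the entries of the constant matrix are elements of $GF(2^8)$ in hexadecimal form.

AddRoundKey: The final transformation is AddRoundKey, in which the input state is added (modulo 2) to the key of the round. Considering the round key as the matrix $K = [k_{r,c}]_{r,c=0}^{3}$ with entries $k_{r,c}$, $0 \le r, c \le 3$, the output state of the AddRoundKey transformation, i.e., $O$, is obtained as:

$$O = S'' + K \quad (8)$$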
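The two transformations above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the state is assumed to be a list of four rows, and the constant matrix is assumed to be the standard AES MixColumns matrix with first row (02, 03, 01, 01).

```python
def xtime(a):
    """Multiply by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def mix_column(col):
    """Multiply one 4-byte column by the constant MixColumns matrix."""
    a0, a1, a2, a3 = col
    mul3 = lambda x: xtime(x) ^ x  # {03}*x = {02}*x + x
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]

def mix_columns(state):
    """Apply mix_column to every column of a 4x4 state (list of rows)."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]

def add_round_key(state, key):
    """Modulo-2 addition (XOR) of the state and the round key, entry by entry."""
    return [[s ^ k for s, k in zip(srow, krow)] for srow, krow in zip(state, key)]

if __name__ == "__main__":
    # A commonly used MixColumns test column.
    print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Because AddRoundKey is a plain XOR, applying it twice with the same key returns the original state, which is why the same operation serves encryption and decryption.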

The systematic fault detection scheme for the multiplicative inversion of the S-box and inverse S-box:
In this scheme, the 8-bit input of the multiplicative inversion is multiplied by its 8-bit output, and the $n$-bit result ($1 \le n \le 8$) of the multiplication is compared with the expected $n$-bit result, i.e., $1 \in GF(2^8)$ if $s \ne 0$ and $0 \in GF(2^8)$ if $s = 0$. Because the multiplicative inversion is also used in the inverse S-box, the same scheme can be used for the inverse S-box. We present a systematic method for the fault detection of the multiplicative inversion by deriving matrix-based formulations for the multiplicative inversion in the S-box and the inverse S-box. We use the following theorem from Mentens et al. (2005) to obtain the multiplication of field elements.

Theorem 1: Let $A = \sum_{i=0}^{m-1} a_i \alpha^i$ and $B = \sum_{i=0}^{m-1} b_i \alpha^i$ be two field elements of $GF(2^m)$ and let $C = \sum_{i=0}^{m-1} c_i \alpha^i$ be the multiplication of $A$ and $B \in GF(2^m)$. Then, the coordinates of $C$ can be obtained from:

$$\mathbf{c} = (L + Q^T U)\,\mathbf{b} \quad (9)$$

where $\mathbf{b} = [b_0, \ldots, b_{m-1}]^T$ and $\mathbf{c} = [c_0, \ldots, c_{m-1}]^T$, $L$ is the $m \times m$ lower triangular matrix and $U$ the $(m-1) \times m$ upper triangular matrix formed from the coordinates of $A$:

$$L_{i,j} = a_{i-j} \ \text{for}\ 0 \le j \le i \le m-1, \quad L_{i,j} = 0 \ \text{otherwise} \quad (10)$$

$$U_{i,j} = a_{m+i-j} \ \text{for}\ 0 \le i < j \le m-1, \quad U_{i,j} = 0 \ \text{otherwise} \quad (11)$$

And the $(m-1) \times m$ binary matrix $Q$ is obtained as follows:

$$[\alpha^m, \alpha^{m+1}, \ldots, \alpha^{2m-2}]^T \equiv Q\,[1, \alpha, \ldots, \alpha^{m-1}]^T \pmod{P(\alpha)} \quad (12)$$

Let $s = s_7\alpha^7 + s_6\alpha^6 + s_5\alpha^5 + s_4\alpha^4 + s_3\alpha^3 + s_2\alpha^2 + s_1\alpha + s_0$ and $s^{-1} = s_7^{-1}\alpha^7 + s_6^{-1}\alpha^6 + s_5^{-1}\alpha^5 + s_4^{-1}\alpha^4 + s_3^{-1}\alpha^3 + s_2^{-1}\alpha^2 + s_1^{-1}\alpha + s_0^{-1}$ be the 8-bit input and output of the multiplicative inversion in the binary field $GF(2^8)$, respectively. Considering the fact that the result of the multiplication of the 8-bit input $s$, $s \ne 0$, and the output $s^{-1}$ of the multiplicative inversion is the unity polynomial $1 \in GF(2^8)$, the relation (13) between $s$ and $s^{-1}$ is derived from Theorem 1. In (13), $u'$ is obtained by logical OR operations of all inputs and outputs:

$$u' = (s_0 \vee s_1 \vee s_2 \vee s_3 \vee s_4 \vee s_5 \vee s_6 \vee s_7) \vee (s_7^{-1} \vee s_6^{-1} \vee s_5^{-1} \vee s_4^{-1} \vee s_3^{-1} \vee s_2^{-1} \vee s_1^{-1} \vee s_0^{-1})$$

Moreover, the modulo-2 additions (XOR operations) of the coordinates of $s$ are shown with commas in indices, e.g., $s_{7,0} = s_7 + s_0$.

Proof: We prove (13) for the two cases $s = 0$ and $s \ne 0$ separately. Let the input $s \ne 0$ be a nonzero field element in $GF(2^8)$ generated by $P(x) = x^8 + x^4 + x^3 + x + 1$. Then, the multiplicative inversion should generate $s^{-1}$. Using (12) in Theorem 1 and considering the irreducible polynomial $P(x)$, the $7 \times 8$ matrix $Q$ can be obtained as:

$$Q = \begin{bmatrix} 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}$$

This matrix is obtained by using the representations of $\alpha^8, \alpha^9, \ldots, \alpha^{14}$ with respect to the polynomial basis $\{1, \alpha, \ldots, \alpha^7\}$ for the different rows of $Q$. Considering $A = s \ne 0$ and $B = s^{-1}$ in Theorem 1, the matrices $L$ and $U$ in (10) and (11) are formed from the coordinates of $s$.

One can figure out that the implementation of (13) needs 64 AND, 15 OR and 143 XOR gates. The number of XOR gates can be reduced to 84 if subexpression sharing is used. If one implements the S-box using the composite field presented in Breveglieri et al. (2007), the original S-box implementation requires 36 AND gates and 123 XOR gates. Adding this fault detection scheme would then require approximately 91% area overhead, where the silicon area of an AND gate is taken as 0.6 that of an XOR gate, assuming that an XOR gate is implemented by 10 transistors.
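The overhead figure quoted above can be reproduced with a short calculation. The sketch below uses the gate counts from the text and the stated AND-to-XOR area ratio of 0.6; the area of an OR gate is not given in the text and is assumed here to equal that of an AND gate, and the names `gate_area`, `check_area` and `sbox_area` are illustrative.

```python
XOR_AREA = 1.0   # reference unit: one XOR gate
AND_AREA = 0.6   # stated in the text relative to an XOR gate
OR_AREA = 0.6    # assumption: not stated in the text, taken equal to AND

def gate_area(n_and, n_or, n_xor):
    """Total area in XOR-gate equivalents."""
    return n_and * AND_AREA + n_or * OR_AREA + n_xor * XOR_AREA

# Fault detection logic with subexpression sharing (84 XOR gates).
check_area = gate_area(n_and=64, n_or=15, n_xor=84)
# Composite-field S-box: 36 AND and 123 XOR gates.
sbox_area = gate_area(n_and=36, n_or=0, n_xor=123)

overhead = check_area / sbox_area

if __name__ == "__main__":
    print(f"check: {check_area:.1f}, S-box: {sbox_area:.1f}, overhead: {overhead:.1%}")
```

Under these assumptions the ratio comes out close to the 91% reported in the text; a different OR-gate area would shift the result slightly.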
Let us now consider the case $s = 0$ in (18). If the output is erroneously a nonzero element of $GF(2^8)$, then on the right-hand side of (18) we have $u' = 1$, whereas the left-hand side is zero; therefore, the wrong output is detected.
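A minimal Python sketch of this check is given below. It assumes that the matrix formulation of Theorem 1 amounts to splitting the polynomial product of the two operands into a low part (coefficients of $\alpha^0$ to $\alpha^7$) and a high part (coefficients of $\alpha^8$ to $\alpha^{14}$) that is folded back through $Q$; this decomposition, the coordinate ordering, and all function names are assumptions made for illustration, not the authors' VHDL.

```python
M = 8
POLY = 0x11B  # P(x) = x^8 + x^4 + x^3 + x + 1

def alpha_power(k):
    """Polynomial-basis coordinates of alpha^k mod P (bit i <-> alpha^i)."""
    v = 1
    for _ in range(k):
        v <<= 1
        if v >> M:
            v ^= POLY
    return v

# Row i of Q holds the coordinates of alpha^(8+i), i = 0..6.
Q = [[(alpha_power(M + i) >> j) & 1 for j in range(M)] for i in range(M - 1)]

def multiply(a, b):
    """Product in GF(2^8): low part plus Q-reduced high part."""
    av = [(a >> i) & 1 for i in range(M)]
    bv = [(b >> i) & 1 for i in range(M)]
    low = [0] * M            # coefficients of alpha^0 .. alpha^7
    for i in range(M):
        for j in range(i + 1):
            low[i] ^= av[i - j] & bv[j]
    high = [0] * (M - 1)     # coefficients of alpha^8 .. alpha^14
    for i in range(M - 1):
        for j in range(i + 1, M):
            high[i] ^= av[M + i - j] & bv[j]
    c = list(low)
    for i in range(M - 1):   # fold the high part back with Q
        for j in range(M):
            c[j] ^= Q[i][j] & high[i]
    return sum(bit << k for k, bit in enumerate(c))

def inverse(a):
    """a^254 by square-and-multiply; maps 0 to 0."""
    r, base, e = 1, a, 254
    while e:
        if e & 1:
            r = multiply(r, base)
        base = multiply(base, base)
        e >>= 1
    return r

def inversion_ok(s, s_inv):
    """Compare s * s_inv with u', the OR of all input and output bits."""
    u = 1 if (s | s_inv) else 0
    return multiply(s, s_inv) == u

if __name__ == "__main__":
    print(all(inversion_ok(s, inverse(s)) for s in range(256)))
    print(inversion_ok(0, 0x01))  # faulty nonzero output for s = 0
```

In this sketch a fault-free inversion passes for every input, while a nonzero output for $s = 0$ is flagged because $u'$ becomes 1 and the product stays 0, mirroring the argument above.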
Although checking the formulation of (18) detects all errors in the output of the S-box, its implementation is very costly (Proposition 1). To reduce the overhead of the fault detection scheme (Fig. 2), we obtain the single-bit parity of the formulation in (18). In Fig. 2, this allows only 1 bit to be compared for 8-bit data in order to detect any combination of an odd number of erroneous bits in the result of the left-hand side of (18). Thus, one can check the parity of the two sides of (18) to obtain a 1-bit equation for checking the S-box as follows.

Theorem 2: Let $s = s_7\alpha^7 + s_6\alpha^6 + s_5\alpha^5 + s_4\alpha^4 + s_3\alpha^3 + s_2\alpha^2 + s_1\alpha + s_0$ and $s' = s'_7\alpha^7 + s'_6\alpha^6 + s'_5\alpha^5 + s'_4\alpha^4 + s'_3\alpha^3 + s'_2\alpha^2 + s'_1\alpha + s'_0$ be the 8-bit input and output of the S-box. The following equation holds for all possible patterns of $s$ and $s'$:

$$P(Ms' + m) = s_0(s'_b + s'_c) + s_1 s'_b + s_2 s'_d + s_3 s'_4 + s_4(s'_c + s'_3) + s_5 s'_a + s_6(s'_d + \overline{s'_6}) + s_7(s'_5 + \overline{s'_4}) = u' \quad (21)$$

where $s'_a = s'_0 + s'_2 + s'_3 + s'_5$, $s'_b = s'_a + s'_7$, $s'_c = s'_1 + s'_4 + s'_6$ and $s'_d = s'_2 + s'_7$, and the overbar denotes the logical complement.

Proof: Taking the parity of the two sides of (18), we obtain a 1-bit relation in which $M$, $m$ and $u'$ are as presented in Theorem 2. Considering the fact that parity is a linear operation, $P(Ms' + m) = PMs' + Pm$. Then, using $M$ and $m$ defined in Theorem 2, one can obtain:

$$PMs' = s_a s'_0 + s_b s'_1 + s_c s'_2 + s'_3(s_a + s_4) + s'_4(s_b + s_3 + s_7) + s'_5(s_a + s_7) + s'_6(s_b + s_6) + s'_7(s_5 + s_c)$$

and $Pm = s_6 + s_7$, where $s_a = s_0 + s_1 + s_5$, $s_b = s_0 + s_4$ and $s_c = s_a + s_2 + s_6$. After rearranging, the proof is complete.
Corollary 2: For the fault detection of the inverse S-box, one can use (21) with the roles of the input and output exchanged, i.e., by swapping the coordinates of $s$ with $s'$.
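The parity relation of Theorem 2 and its use for the inverse S-box can be checked exhaustively in software. The sketch below makes three assumptions that should be read as such: the garbled term $s'_c$ is taken as $s'_1 + s'_4 + s'_6$; the $s_6$ and $s_7$ terms each carry one complemented bit ($\overline{s'_6}$ and $\overline{s'_4}$), which is what $Pm = s_6 + s_7$ in the proof requires; and $u'$ is 1 for every nonzero input and 0 for $s = 0$. The S-box is generated from the AES specification, and all function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def sbox(s):
    """AES S-box: inversion (as s^254) followed by the affine transformation."""
    t = 1
    for _ in range(254):
        t = gf_mul(t, s)
    out = 0
    for i in range(8):
        bit = ((t >> i) ^ (t >> ((i + 4) % 8)) ^ (t >> ((i + 5) % 8))
               ^ (t >> ((i + 6) % 8)) ^ (t >> ((i + 7) % 8)) ^ (0x63 >> i))
        out |= (bit & 1) << i
    return out

def predicted_parity(s, sp):
    """Left-hand side of the Theorem 2 relation for input s and output sp."""
    x = [(s >> i) & 1 for i in range(8)]
    y = [(sp >> i) & 1 for i in range(8)]
    ya = y[0] ^ y[2] ^ y[3] ^ y[5]
    yb = ya ^ y[7]
    yc = y[1] ^ y[4] ^ y[6]      # assumed reading of the garbled term
    yd = y[2] ^ y[7]
    return ((x[0] & (yb ^ yc)) ^ (x[1] & yb) ^ (x[2] & yd) ^ (x[3] & y[4])
            ^ (x[4] & (yc ^ y[3])) ^ (x[5] & ya)
            ^ (x[6] & (yd ^ y[6] ^ 1))       # complemented s'_6
            ^ (x[7] & (y[5] ^ y[4] ^ 1)))    # complemented s'_4 (assumed)

SBOX = [sbox(s) for s in range(256)]
INV_SBOX = [0] * 256
for _s, _sp in enumerate(SBOX):
    INV_SBOX[_sp] = _s

if __name__ == "__main__":
    # S-box: the relation should hold for every fault-free input/output pair.
    print(all(predicted_parity(s, SBOX[s]) == (1 if s else 0) for s in range(256)))
    # Inverse S-box: same relation with input and output swapped.
    print(all(predicted_parity(INV_SBOX[x], x) == (1 if INV_SBOX[x] else 0)
              for x in range(256)))
```

The inverse S-box check reuses the same function with its arguments exchanged, which is the swap described in the corollary.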

SIMULATION RESULTS
Here we have considered both single and multiple stuck-at errors for the proposed scheme; these models cover both natural faults and fault attacks. In the AES encryption or decryption rounds, if exactly one bit error appears at the output, the proposed scheme detects it and the error coverage is about 100%, because in this case one of the four 8-bit error indication flags raises the alarm. Multiple stuck-at errors are also considered, because technical constraints may prevent an attacker from flipping exactly one bit to gain more information, so that multiple bits are actually flipped.
The AES algorithm and the low-complexity fault detection scheme for the composite-field S-box were described in VHDL, and the Modelsim 6.3 g_b1 tool was used to simulate the code. We analyzed the area and the internal and external faults in AES. Table 1 compares the performance of the existing and proposed fault detection schemes for SubBytes. Figures 3 to 6 show the simulation results.
The traditional look-up table (LUT) implementation of the S-box requires a large number of gates. Moreover, the unbreakable delay of the LUT is longer than the total delay of the rest of the transformations in each round. It is also unsuitable for resource-constrained use because it costs a large area. Composite field arithmetic is therefore used to solve these problems. The LUT-based fault detection scheme was described in VHDL and synthesized using Xilinx ISE 9.1 to obtain the gate count report. The gate count of the LUT-based fault detection scheme is 262144 logic gates; this scheme requires 4 times as much as the proposed one. Our aim is to reduce the gate count to the maximum possible extent. The composite field implementation of the S-box needs fewer gates. It was likewise described in VHDL and synthesized using Xilinx ISE 9.1, and the synthesis report gives a comparatively small value of 28514 gates, almost one tenth of the look-up table implementation of the S-box.

Encryption output for composite field S-box without error:
Plain text: x"00112233445566778899aabbccddeeff"
Key: x"000102030405060708090a0b0c0d0e0f"
Cipher text: x"69c4e0d86a7b0430d8cdb78070b4c55a"
Figure 3 shows the output for the composite field S-box without any error for a particular input. The theta, eta, gamma and sigma values are obtained and the fault output value is zero.
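The fault-free plaintext and key listed above can be cross-checked against a software model. The following is a compact AES-128 encryption sketch in Python; it is a plain reference model with a generated S-box, not the composite-field VHDL design, and the key expansion routine and all names (`expand_key`, `encrypt_block`, and so on) are introduced here only for illustration.

```python
def xtime(a):
    """Multiply by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = xtime(a), b >> 1
    return r

def make_sbox():
    """S-box from inversion (s^254) followed by the affine transformation."""
    box = []
    for s in range(256):
        t = 1
        for _ in range(254):
            t = gf_mul(t, s)
        out = 0
        for i in range(8):
            bit = ((t >> i) ^ (t >> ((i + 4) % 8)) ^ (t >> ((i + 5) % 8))
                   ^ (t >> ((i + 6) % 8)) ^ (t >> ((i + 7) % 8)) ^ (0x63 >> i))
            out |= (bit & 1) << i
        box.append(out)
    return box

SBOX = make_sbox()

def expand_key(key):
    """AES-128 key schedule: returns 11 round keys of 16 bytes each."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = list(words[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord then SubWord
            t[0] ^= rcon
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], t)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]

def mix_columns(state):
    out = []
    for c in range(4):
        a0, a1, a2, a3 = state[4 * c:4 * c + 4]
        out += [xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3,
                a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3,
                a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3,
                xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3)]
    return out

def encrypt_block(plaintext, key):
    """Encrypt one 16-byte block; the state is stored column by column."""
    rk = expand_key(key)
    state = [p ^ k for p, k in zip(plaintext, rk[0])]      # initial AddRoundKey
    for rnd in range(1, 11):
        state = [SBOX[b] for b in state]                   # SubBytes
        state = [state[4 * ((c + r) % 4) + r]              # ShiftRows
                 for c in range(4) for r in range(4)]
        if rnd != 10:                                      # final round skips MixColumns
            state = mix_columns(state)
        state = [s ^ k for s, k in zip(state, rk[rnd])]    # AddRoundKey
    return bytes(state)

if __name__ == "__main__":
    pt = bytes.fromhex("00112233445566778899aabbccddeeff")
    key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
    print(encrypt_block(pt, key).hex())  # 69c4e0d86a7b0430d8cdb78070b4c55a
```

A model of this kind is mainly useful as a golden reference when comparing simulation waveforms: any mismatch between its output and the hardware output for the same input points to a fault or a design error.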
Encryption output for composite field S-box with error: Figure 4 shows the encryption output for the composite field S-box with an error, for a particular 128-bit input and 128-bit input key. The fault detection scheme implemented with the composite field S-box detects an internal fault. We take the initial-round output pre_out, replace it with another 128-bit value and simulate with Modelsim to obtain the faulty output. The theta, eta, gamma and sigma values are obtained and the fault output value is obtained. In this waveform, the fault occurs in the AddRoundKey transformation of the initial round.
Decryption output for composite field S-box without error (Fig. 5):
Decryption output for composite field S-box with error: Figure 6 shows the decryption output for the composite field S-box with an error, for a particular 128-bit cipher input and the 128-bit input cipher key of each round. The fault detection scheme implemented with the composite field S-box detects an internal fault. We take the internal inverse round key output, replace the output of the inverse AddRoundKey with another 128-bit value and simulate with Modelsim to obtain the faulty output. The theta, eta, gamma and sigma values are obtained and the fault output value is obtained. In this waveform, the fault occurs in the AddRoundKey transformation of the initial round.

CONCLUSION
We have presented a high-performance, low-complexity parity-based fault detection scheme for the AES. The schemes are constructed for the S-box and the inverse S-box using composite fields. We have obtained the least-complexity S-boxes and inverse S-boxes including their fault detection circuits. The new fault detection schemes are independent of the structures of the S-boxes and the inverse S-boxes, which is why the parity-based method is used in the S-box. This improves the fault coverage, because the error detection at the S-box is performed twice. The simulation results show that the proposed structure-independent schemes have the highest efficiencies with acceptable error coverage, as well as reasonable area and time complexity overheads.