Flexible and high-throughput structures of Camellia block cipher for security of the Internet of Things

The advancements in wireless communication have created exponential growth in Internet of Things (IoT) systems. Security and privacy of IoT systems are critical challenges in many data-sensitive applications. Herein, high-throughput and flexible hardware implementations of the Camellia block cipher for IoT applications are presented. In the proposed structures, the sub-blocks of the cipher are implemented based on optimised circuits. The proposed structures for Camellia are shared between the encryption process and the generation of some intermediate key values, which are computed at two separate times. The most complex block in the cipher is the substitution box (S-box). The S-boxes are implemented based on area-optimised logic circuits. The Camellia S-boxes consist of a field inversion over $F_{2^8}$ and two affine transformations over $F_2$. The inversion operation is implemented over the composite field $F_{(2^4)^2}$ instead of directly over $F_{2^8}$, which is an important factor in reducing area consumption. A large number of gates in the structure are implemented by 2-input NAND and 2-input NOR gates to reduce delay and area. Also, a flexible structure for Camellia is proposed that can be configured to support the variable key sizes of 128, 192 and 256 bits. Implementation results of the proposed architectures in 180 nm CMOS technology are reported for the different key sizes. The results show improvements in terms of execution time, throughput and throughput/area compared to other related works.


| INTRODUCTION
The increasing number of services provided by the Internet has generated a huge increase in the number of connected devices. There are many Internet of Things (IoT) applications such as e-health, e-commerce, smart home, smart city, smart hospital, etc. [1]. The IoT is a network used to interconnect embedded devices, such as sensors, which can generate, communicate and share data [2]. IoT systems raise many new security and privacy challenges [1,3]. With the exponential growth in IoT systems, security and privacy issues have emerged as critical challenges in many sensitive applications such as e-health and e-commerce. For example, the IoT in e-health improves the quality of healthcare services and reduces the healthcare cost. The healthcare field is being remodelled by the IoT to improve social benefits by offering continuous monitoring of patients and services as well as updating healthcare medical records [4]. Healthcare data contain sensitive information that should be protected [5]. Therefore, there is an urgent need to address these challenges. In recent years, many cryptographic algorithms have been used to secure transmitted data in the IoT [6]. Cryptographic algorithms proposed for IoT applications can be categorised into symmetric and asymmetric (public key) algorithms. Public key algorithms are time-consuming. Therefore, most applications in IoT networks use symmetric algorithms to secure the transmitted information [7,8]. Lightweight block ciphers, as one of the most important classes of symmetric algorithms, are used to secure IoT communication. These ciphers are easy to implement and use few computational resources with low overhead.
Many lightweight block ciphers have been proposed to reduce hardware cost [9]. Block ciphers are used for data protection in cryptosystems and are good candidates for resource-constrained cryptographic applications. These cryptographic primitives have been an important area of cryptographic research [9]. Camellia [10] is a block cipher suitable for hardware implementation in embedded cryptographic systems. The Camellia cipher [10,11] has a 128-bit data block size and key lengths of 128, 192 and 256 bits. This block cipher was jointly developed by Nippon Telegraph and Telephone Corporation (NTT) and Mitsubishi Electric Corporation (Mitsubishi) to provide flexibility and security in cryptographic applications [10]. The Camellia block cipher is standardized by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) (ISO/IEC 18033) [12]. It has been recognized by the CRYPTREC (Cryptography Research and Evaluation Committee) [13] and NESSIE (New European Schemes for Signatures, Integrity, and Encryption) [14] projects as a cryptographic algorithm for use on the Internet. The security and robustness of Camellia are investigated in [10,15,16] by testing its resistance against different attacks such as the linear attack, differential attack, boomerang attack, higher-order differential attack, interpolation attack, impossible differential attack, improved zero-correlation attack, square/collision attack, and cache trace attack.
Different ASIC implementations of the Camellia algorithm have been reported in [10,13-24]. In [18], compact and high-speed hardware architectures for the 128-bit AES and Camellia block ciphers are presented. Composite field arithmetic is used to reduce the size of the S-boxes in this work. In [19], error detection approaches for the Camellia block cipher are proposed that take into account its linear and non-linear sub-blocks. In [20], a loop architecture based on one round function block is presented. The S-boxes are implemented based on both a lookup table (LUT) and composite field arithmetic.
Flexibility, throughput, execution time and area consumption are the main design dimensions (challenges) of secure digital systems. A good design should consider the trade-offs between these dimensions. Flexibility is the ability to reconfigure system parameters such as the key size. It is especially important for applications with a diverse set of requirements [28]. IoT applications are an outstanding example of this scenario. The level of security required by IoT devices and their supporting solutions varies depending on the specific functions they perform. A common use of flexibility is to implement adaptive security for IoT applications such as a smart control system based on a collection of sensors, for example, a temperature sensor, humidity sensor, light sensor, pressure sensor, heart sensor, flow meter sensor, etc. In the flexible structure, the key size varies in an acceptable range depending on the trust level of the sensors. An encryption process with a lower level of security (shorter key) allows high-speed computations in an application with less important information. On the other hand, sensitive data within the network, such as in healthcare IoT applications, are encrypted with a higher security level (longer key). IoT systems are constrained in terms of execution time and computational resources; the problem is compounded when dealing with multimedia content. Therefore, a flexible cryptographic structure can be reconfigured according to the requirements to obtain the best performance. Previous works have focussed less on improving the design parameters, especially flexibility. Herein, we propose high-throughput and flexible hardware structures of the Camellia block cipher for secure hardware IoT applications. Due to their flexibility, they can be used for different types of IoT applications with multiple levels of security strength.
A unified circuit with resource sharing is implemented for the Camellia cipher to compute the encryption process and some intermediate key values. Herein, Section 2 recalls the Camellia block cipher and describes the proposed hardware structures. Section 4 shows a comparison between our work and related works. Finally, Section 5 gives conclusions.

| DESCRIPTION OF THE CAMELLIA BLOCK CIPHER AND PROPOSED HARDWARE STRUCTURE
The Camellia cryptographic algorithm is a block cipher that processes 128-bit data blocks using 128-, 192- and 256-bit secret keys, so the key size is flexible. It has better efficiency in both software and hardware implementations when compared to other ciphers [10]. The Camellia cipher uses an 18-round Feistel architecture for 128-bit keys and a 24-round Feistel architecture for 192- and 256-bit keys. The two main parts of Camellia are the data processing part and the key scheduling part. Algorithms 1 and 2 show the encryption procedure of Camellia for 128-bit and 192/256-bit keys, respectively. In the Camellia cipher, the FL and $FL^{-1}$ functions are inserted every six rounds.

Algorithm 1 Camellia Encryption Algorithm for 128-bit keys
Input: 128-bit plaintext $P = P_L \| P_R$, whitening keys $kw_1$ to $kw_4$, round keys $k_1$ to $k_{18}$ and subkeys $kl_1$ to $kl_4$.

Algorithm 2 Camellia Encryption Algorithm for 192/256-bit keys
Input: 128-bit plaintext $P = P_L \| P_R$, whitening keys $kw_1$ to $kw_4$, round keys $k_1$ to $k_{24}$ and subkeys $kl_1$ to $kl_6$.
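The data flow that Algorithms 1 and 2 take as input can be sketched as a generic Feistel skeleton. This is only an illustration under stated assumptions: the round function `f` and the `fl`/`fl_inv` layers are passed in as callables, and the toy versions used here (`toy_f`, `toy_fl`) are hypothetical placeholders, not the real Camellia functions. The names `camellia_like_encrypt` and `camellia_like_decrypt` are likewise introduced only for this sketch. It shows pre-whitening, two Feistel rounds per loop iteration, an FL/$FL^{-1}$ layer after every six rounds except the last, and post-whitening with the final swap.

```python
from typing import Callable, Sequence

MASK64 = (1 << 64) - 1

RoundFn = Callable[[int, int], int]


def camellia_like_encrypt(block: int, kw: Sequence[int], k: Sequence[int],
                          kl: Sequence[int], f: RoundFn,
                          fl: RoundFn, fl_inv: RoundFn) -> int:
    """Feistel skeleton: 18 rounds (4 kl subkeys) or 24 rounds (6 kl subkeys)."""
    rounds = len(k)
    assert rounds in (18, 24) and len(kw) == 4 and len(kl) == rounds // 3 - 2
    d1 = (block >> 64) ^ kw[0]          # pre-whitening of the left half
    d2 = (block & MASK64) ^ kw[1]       # pre-whitening of the right half
    for i in range(0, rounds, 2):
        d2 ^= f(d1, k[i])               # odd-numbered round
        d1 ^= f(d2, k[i + 1])           # even-numbered round
        done = i + 2
        if done % 6 == 0 and done != rounds:
            j = done // 6 - 1           # index of the FL layer
            d1 = fl(d1, kl[2 * j])
            d2 = fl_inv(d2, kl[2 * j + 1])
    # post-whitening; the halves are swapped on output
    return ((d2 ^ kw[2]) << 64) | (d1 ^ kw[3])


def camellia_like_decrypt(block: int, kw: Sequence[int], k: Sequence[int],
                          kl: Sequence[int], f: RoundFn,
                          fl: RoundFn, fl_inv: RoundFn) -> int:
    """Same data path with the subkeys applied in reverse order."""
    kw_rev = [kw[2], kw[3], kw[0], kw[1]]
    return camellia_like_encrypt(block, kw_rev, list(k)[::-1], list(kl)[::-1],
                                 f, fl, fl_inv)


def toy_f(x: int, key: int) -> int:
    """Placeholder round function (NOT the Camellia F-function)."""
    return ((x ^ key) * 0x9E3779B97F4A7C15 + 1) & MASK64


def toy_fl(x: int, key: int) -> int:
    """Placeholder self-inverse layer (NOT the Camellia FL function)."""
    return x ^ key
```

Because a Feistel network is invertible for any round function, the round trip holds even with the placeholder functions, which is what makes one shared data path usable for both encryption and decryption.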

| F function
The F-function is the most important block in the Camellia cipher computations. The structure and computations of the F-function are shown in Figure 1. The F-function uses a Substitution-Permutation Network (SPN) structure. As seen from this figure, the function is constructed based on the 8-bit S-boxes $S_1$, $S_2$, $S_3$ and $S_4$ (S-function layer) and sixteen 8-bit bit-wise XOR operations (P-function layer). The S-function part is a non-linear layer and the P-function part is a linear layer.

| S-boxes $S_1$, $S_2$, $S_3$ and $S_4$
The S-function layer consists of eight S-boxes, built from four different S-boxes $S_1$, $S_2$, $S_3$ and $S_4$. All of them are based on the field inversion operation over the finite field $F_{2^8}$. The outputs of $S_2$, $S_3$ and $S_4$ can be generated from the S-box $S_1$. The 8-bit S-box $S_1$, for input $x$ and output $y_1$, is defined as follows: $y_1 = S_1(x) = h(g(f(\mathtt{0xc5} \oplus x))) \oplus \mathtt{0x6e}$. Also, for the S-boxes $S_2$, $S_3$ and $S_4$ we have: $y_2 = S_2(x) = S_1(x) \lll 1$, $y_3 = S_3(x) = S_1(x) \lll 7$ and $y_4 = S_4(x) = S_1(x \lll 1)$, where $x \lll j$ denotes the left cyclic shift of $x$ by $j$ bits and $\oplus$ is the bit-wise XOR. The $f$ and $h$ functions are computed with bit-wise XOR operations only, and the hardware resources for implementing both functions are equal to 18 2-input XOR gates. The main block in the S-box $S_1$ is a field inversion over $F_{2^8}$ (the $g$ function) [10]. Figure 2 shows the proposed structure of S-box $S_1$. The inverse function is performed in $F_{2^8}$ defined by the primitive polynomial $f_1(z) = z^8 + z^6 + z^5 + z^3 + 1$. Implementing the inversion over a composite field instead of directly over $F_{2^8}$ is an important factor in reducing area consumption [29]. Here, for the proposed implementation we use the composite field $F_{(2^4)^2}$, defined by a pair of irreducible polynomials, where $\omega$ is a root of $f_2(z)$. In the following, we present the proposed hardware structure of the field inversion over the composite field $F_{(2^4)^2}$. Let $n_1\beta + n_0$ be an arbitrary element in $F_{(2^4)^2}$, where $n_1, n_0 \in F_{2^4}$; its inversion uses a constant $\lambda$ whose vector representation is equal to (1, 0, 0, 1) [10]. Figure 3 (a) shows the structure of the field inversion operation over the composite field $F_{(2^4)^2}$. This operation is constructed based on four sub-blocks consisting of multiplication by constant $\lambda$, four field multiplications, one field inversion, one field squaring, and two field additions. The multiplication, inversion, and squaring sub-blocks are defined over $F_{2^4}$ with primitive polynomial $f_2(z) = z^4 + z + 1$. The multiplication, inversion and squaring operations over $F_{2^4}$ are implemented based on [30,31].
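The way $S_2$, $S_3$ and $S_4$ are obtained from $S_1$ can be sketched in a few lines. This is a minimal sketch, assuming the relations of the Camellia specification ($S_2(x) = S_1(x) \lll 1$, $S_3(x) = S_1(x) \lll 7$, $S_4(x) = S_1(x \lll 1)$) and assuming that the caller supplies the 256-entry $S_1$ table. The helper names `rotl8` and `derive_sboxes` are introduced only for this illustration.

```python
from typing import List, Sequence, Tuple


def rotl8(x: int, j: int) -> int:
    """Left cyclic shift of an 8-bit value x by j bits."""
    j %= 8
    return ((x << j) | (x >> (8 - j))) & 0xFF


def derive_sboxes(s1: Sequence[int]) -> Tuple[List[int], List[int], List[int]]:
    """Build the S2, S3 and S4 tables from a given 256-entry S1 table."""
    assert len(s1) == 256
    s2 = [rotl8(s1[x], 1) for x in range(256)]   # rotate the output left by 1
    s3 = [rotl8(s1[x], 7) for x in range(256)]   # rotate the output left by 7
    s4 = [s1[rotl8(x, 1)] for x in range(256)]   # rotate the input left by 1
    return s2, s3, s4
```

In hardware these rotations are pure wiring, so only the $S_1$ logic has to be implemented; the other three S-boxes reuse it with permuted input or output bits.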
In the proposed implementation, however, we applied further optimization to the structures of multiplication and inversion over $F_{2^4}$ to reduce critical path delay (CPD) and area consumption. The squaring of an element $K = (k_3, k_2, k_1, k_0)$ over $F_{2^4}$ gives $Y = K^2 = (y_3, y_2, y_1, y_0)$ with $y_3 = k_3$, $y_2 = k_3 \oplus k_1$, $y_1 = k_2$, $y_0 = k_2 \oplus k_0$, and the multiplication of $Y$ by the constant $\lambda$ gives $L = (l_3, l_2, l_1, l_0)$ as $l_3 = y_0$, $l_2 = y_3$, $l_1 = y_2$, $l_0 = y_1 \oplus y_0$. For the proposed merged formula of squaring and multiplication with constant $\lambda$, $H = (A^2) \times \lambda$, we have: $h_3 = a_2 \oplus a_0$, $h_2 = a_3$, $h_1 = a_3 \oplus a_1$, $h_0 = a_0$. The low-cost structure of the merged squaring and multiplication with constant $\lambda$ in the proposed S-box is shown in Figure 3 (b). In this case, the computations of squaring and multiplication by constant $\lambda$ are implemented by only two XOR logic gates with a CPD equal to $T_X$, where $T_X$ is the gate delay of the 2-input XOR gate. Figure 3 (c) shows the proposed efficient hardware structure of field inversion over $F_{2^4}$. After further simplification of the $i_0$, $i_1$, $i_2$, and $i_3$ terms, the hardware consumption and critical path delay are equal to 8 2-input XOR gates, 3 2-input XNOR gates, 11 2-input NAND gates, 1 2-input NOR gate and three NOT gates, and $2T_X + T_{XN} + T_{NA}$, respectively. A large number of gates in the structure have been implemented by 2-input NAND and 2-input NOR gates to reduce delay and area. Table 1 shows the hardware results of the proposed structure of the 8-bit S-box $S_1$ and other related works.
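The two-XOR claim for the merged squaring and multiplication by $\lambda$ can be checked exhaustively over the 16 elements of $F_{2^4}$. The sketch below assumes a polynomial basis in which bit 3 is the coefficient of $\omega^3$, the polynomial $f_2(z) = z^4 + z + 1$ and $\lambda = (1, 0, 0, 1)$ as stated above. The merged bit equations in `square_times_lambda` are derived here from those definitions and are not copied from the paper's figure. All function names are hypothetical.

```python
POLY = 0b10011     # f_2(z) = z^4 + z + 1
LAMBDA = 0b1001    # vector representation (1, 0, 0, 1)


def gf16_mul(a: int, b: int) -> int:
    """Reference shift-and-add multiplication in F_{2^4} modulo f_2(z)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:       # reduce as soon as the degree reaches 4
            a ^= POLY
    return r


def bits4(x: int):
    """Return (x3, x2, x1, x0) for a 4-bit value."""
    return (x >> 3) & 1, (x >> 2) & 1, (x >> 1) & 1, x & 1


def mul_lambda(y: int) -> int:
    """Multiplication by lambda: l3 = y0, l2 = y3, l1 = y2, l0 = y1 xor y0."""
    y3, y2, y1, y0 = bits4(y)
    return (y0 << 3) | (y3 << 2) | (y2 << 1) | (y1 ^ y0)


def square_times_lambda(a: int) -> int:
    """Merged (a^2) * lambda using two XOR operations (derived, see lead-in)."""
    a3, a2, a1, a0 = bits4(a)
    return ((a2 ^ a0) << 3) | (a3 << 2) | ((a3 ^ a1) << 1) | a0
```

Merging the two operations cancels the XOR terms that squaring and the constant multiplication would otherwise each contribute, which is why the merged block needs only two XOR gates on a single-gate critical path.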

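| FL and $FL^{-1}$ functions
In the Camellia cipher, two functions FL and $FL^{-1}$ are used in the data processing part. These logical functions are inserted every six rounds. Let $kl_i = kl_{iL} \| kl_{iR}$ be an intermediate key and let $X = X_L \| X_R$ and $Y = Y_L \| Y_R$ be the 64-bit input and output, respectively. The two functions FL and $FL^{-1}$ are then defined as follows. For FL: $Y_R = ((X_L \wedge kl_{iL}) \lll 1) \oplus X_R$ and $Y_L = (Y_R \vee kl_{iR}) \oplus X_L$. For $FL^{-1}$: $Y_L = (X_R \vee kl_{iR}) \oplus X_L$ and $Y_R = ((Y_L \wedge kl_{iL}) \lll 1) \oplus X_R$, where $\wedge$ and $\vee$ are the bit-wise AND and OR operations, respectively. Figure 2 (b) and (c) show the structures of the two functions FL and $FL^{-1}$, respectively.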
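The FL and $FL^{-1}$ layers can be sketched as follows. This is a minimal sketch assuming the definition in the Camellia specification, in which both functions act on a 64-bit value split into two 32-bit halves and use a 64-bit subkey $kl_i$ split the same way. The function names `fl` and `fl_inv` are chosen only for this illustration.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Left cyclic shift of a 32-bit value by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def fl(x: int, kl: int) -> int:
    """FL: YR = ((XL & klL) <<< 1) ^ XR, then YL = (YR | klR) ^ XL."""
    xl, xr = x >> 32, x & MASK32
    kl_l, kl_r = kl >> 32, kl & MASK32
    yr = rotl32(xl & kl_l, 1) ^ xr
    yl = (yr | kl_r) ^ xl
    return (yl << 32) | yr


def fl_inv(x: int, kl: int) -> int:
    """FL^-1: YL = (XR | klR) ^ XL, then YR = ((YL & klL) <<< 1) ^ XR."""
    xl, xr = x >> 32, x & MASK32
    kl_l, kl_r = kl >> 32, kl & MASK32
    yl = (xr | kl_r) ^ xl
    yr = rotl32(yl & kl_l, 1) ^ xr
    return (yl << 32) | yr
```

Each function needs only one 32-bit AND, one 32-bit OR, two 32-bit XORs and a wired rotation, so the layer is cheap compared with the F-function.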

| The proposed hardware structures for 128-bit, 192-bit and 256-bit keys
In this section, we present the hardware structures for implementation of the Camellia block cipher. Figure 4 shows the proposed block diagram of the Camellia block cipher. The structure is based on merged hardware that computes both the data processing part and the $K_A$ and $K_B$ variables (these variables are discussed in the next subsection). The proposed structure of the Camellia block cipher for 192/256-bit keys is shown in Figure 5. Three 2-to-1 multiplexers with control signal $Sel_3$ are used to switch between the two modes: data processing and generation of the $K_A$ and $K_B$ variables. If the control signal $Sel_3$ is equal to '1', the structure is configured for computing the $K_A$ and $K_B$ variables, and for $Sel_3$ = '0' it is configured for the data processing computations. Two control signals $Sel_1$ and $Sel_2$ are used to control the encryption process. In the structure, the control signal $Sel_2$ is set to '1' in the first clock cycle, and the values $kw_1 \oplus P_L$, $kw_2 \oplus P_R$, the subkeys $k_1$ to $k_n$ and $kl_1$, $kl_2$ are applied to the structure. In the other clock cycles, the control signal $Sel_2$ is set to '0' and the other subkeys are used for computation. This structure computes $n$ rounds at each clock cycle, where $1 \le n \le 6$. Therefore, generating the ciphertext $C$, after computing $K_A$ and $K_B$, requires $r/n + 1$ clock cycles for both 128-bit and 192/256-bit keys, where $r$ is the number of rounds.
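The latency expression $r/n + 1$ can be evaluated with a small helper. This is a sketch under one assumption not stated in the text: when $n$ does not divide the number of rounds, the quotient is rounded up. The function names are hypothetical.

```python
def camellia_rounds(key_bits: int) -> int:
    """Number of Feistel rounds r for a given key size."""
    if key_bits == 128:
        return 18
    if key_bits in (192, 256):
        return 24
    raise ValueError("key size must be 128, 192 or 256 bits")


def encryption_cycles(key_bits: int, n: int) -> int:
    """Clock cycles to produce the ciphertext once K_A and K_B are available.

    Implements r/n + 1 with n rounds per clock cycle. Rounding r/n up for
    values of n that do not divide r is an assumption of this sketch.
    """
    if not 1 <= n <= 6:
        raise ValueError("n must satisfy 1 <= n <= 6")
    r = camellia_rounds(key_bits)
    return -(-r // n) + 1   # integer ceiling of r/n, plus one cycle
```

For example, `encryption_cycles(256, 2)` gives 13 clock cycles for the two-round-per-cycle structure with a 192/256-bit key.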

| Key schedules of Camellia block cipher and the proposed hardware structures
The key schedule part generates the 64-bit whitening keys $kw_t$ ($t$ = 1, 2, 3, 4), the round keys $k_u$ ($u$ = 1, 2, …, $r$) for the round functions and $kl_i$ ($i$ = 1, 2, …, $r/3 - 2$) for the FL and $FL^{-1}$ functions from the main key $K$, where $r$ is the number of rounds. In the key schedule part of the Camellia block cipher, two 128-bit variables $K_L = K_{LL} \| K_{LR}$ and $K_R = K_{RL} \| K_{RR}$ are introduced, which are defined as follows: For a 128-bit main key $K$ → $K_L = K$, $K_R = 0$. For a 192-bit main key $K$ → $K_L \| K_{RL} = K$, $K_{RR} = \overline{K_{RL}}$, where the bar denotes the bit-wise complement. For a 256-bit main key $K$ → $K_L \| K_R = K$. Two 128-bit variables $K_A$ and $K_B$ are generated based on $K_L$ and $K_R$; for the 128-bit key only $K_A$ is used, and for 192/256-bit keys both $K_A$ and $K_B$ are used in the key schedule part. The structure for generating $K_A$ and $K_B$ is presented in [11] (Figure 8). In this figure, the 64-bit constants $\Sigma_i$ ($i$ = 1, 2, …, 6) are used as keys in the structure (Feistel network). These constants are presented in [10]. The variable $K_A$ (128-bit key) and the variables $K_A$ and $K_B$ (192/256-bit keys) are computed in five and seven clock cycles, respectively. The control signals $L_1$, $L_2$ and $L_3$ are used to control the 64-bit constants $\Sigma_i$, $1 \le i \le 6$, and the intermediate results. In the first round block, multiplexers with control signals $L_1$, $L_2$, $L_3$ and a 128-bit XOR operation are used to implement the $K_A$ and ($K_A$ and $K_B$) computations. In the proposed structures, we first compute the variables $K_A$ and $K_B$; then the key scheduling and the data processing are started. The control signals of the proposed structure for computing the $K_A$ and $K_B$ variables for 192/256-bit keys and $n = 2$ (where $n$ is the number of rounds computed at each clock cycle and $1 \le n \le 6$) are presented in Table 2.
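The mapping from the main key $K$ to the two 128-bit variables $K_L$ and $K_R$ can be sketched as below. The sketch follows the Camellia specification for the 192-bit case, where the right half of $K_R$ is the bit-wise complement of its left half; the function name `split_main_key` and the integer representation of keys are assumptions made for illustration.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def split_main_key(key: int, key_bits: int) -> Tuple[int, int]:
    """Return (K_L, K_R) as 128-bit integers for a 128/192/256-bit main key."""
    if not 0 <= key < (1 << key_bits):
        raise ValueError("key does not fit in key_bits")
    if key_bits == 128:
        return key, 0                               # K_L = K, K_R = 0
    if key_bits == 192:
        k_l = key >> 64                             # upper 128 bits
        k_rl = key & MASK64                         # lower 64 bits
        k_rr = ~k_rl & MASK64                       # complement of K_RL
        return k_l, (k_rl << 64) | k_rr
    if key_bits == 256:
        return key >> 128, key & MASK128            # K_L || K_R = K
    raise ValueError("key size must be 128, 192 or 256 bits")
```

Since all three cases end in the same pair of 128-bit registers, the rest of the key schedule can be shared between key sizes, which is what a flexible structure relies on.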
TABLE 1 Hardware results of the proposed structure of 8-bit S-box $S_1$ and other related works
Works | # AND (or OR) | # NAND (or NOR) | # XOR (or XNOR) | # NOT | CPD

The 64-bit subkeys $kw_t$ ($1 \le t \le 4$), $k_u$ ($1 \le u \le r$), and $kl_i$ ($1 \le i \le r/3 - 2$) are generated by rotating $K_L$, $K_R$, $K_A$, and $K_B$ and taking the left half or right half of the result. The computations of the 64-bit subkeys for 128-bit and 192/256-bit keys are shown in Table 3. The proposed structures (for the case $n = 2$) for generating the round keys for a 192/256-bit main key are shown in Figure 6. The number of clock cycles for this case ($n = 2$) is equal to 9 and 12 for 128-bit and 192/256-bit main keys, respectively. The control signals $Se_1$, $Se_2$, d[1:0] and c[1:0] are used to control the generated subkeys and apply them to the data processing part. The subkeys are stored in the registers $Reg_1$ and $Reg_2$ based on the control signals $Se_1$ and $Se_2$. The register $Reg_1$ stores the subkeys $k_1$, $k_2$, …, $k_r$ generated by the rotate blocks. Also, the subkeys $kl_1$, $kl_2$, …, $kw_4$ are stored in the register $Reg_2$. The proposed structures generate the subkeys ($k_1$, $k_2$), ($k_3$, $k_4$), ($k_5$, $k_6$, $kl_1$, $kl_2$), …, ($k_{23}$, $k_{24}$, $kw_3$, $kw_4$) at the first, second, …, and last clock cycles, respectively. The rotate blocks in the structures are implemented by wired cyclic shifts without extra hardware. Table 4 shows the control signals of the proposed structure, $n = 2$, for a 192/256-bit key at the beginning of the encryption process (Figure 6 as the key scheduling part and Figure 5, for $n = 2$, as the data processing part). For example, the first clock cycle computes the first two rounds of data processing. For these operations, the control signals $Se_1Se_2$, c[1:0], d[1:0], $Sel_1$ and $Sel_2$ are equal to "01", 0, 0, '0' and '1', respectively.
Also, the computations in the next clock cycles are implemented by the proposed structure based on Table 4. Therefore, the ciphertext is generated with a latency of 13 clock cycles.
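The subkey extraction step described above (rotate a 128-bit variable, then take its left or right 64-bit half) can be sketched as below. The helper names are hypothetical, and the rotation amounts used in any call are illustrative; the actual amounts per subkey are those listed in the paper's Table 3.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def rotl128(x: int, n: int) -> int:
    """Left cyclic shift of a 128-bit value by n bits."""
    n %= 128
    return ((x << n) | (x >> (128 - n))) & MASK128


def subkey_pair(value: int, rotation: int) -> Tuple[int, int]:
    """Rotate K_L, K_R, K_A or K_B and return its (left, right) 64-bit halves."""
    rotated = rotl128(value, rotation)
    return rotated >> 64, rotated & MASK64
```

In software the rotation costs shifts and masks, whereas in the hardware structure it is a fixed rewiring of the register outputs, which is why the rotate blocks add no gates.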

| The proposed flexible structures of Camellia block cipher
In this section, we present a flexible structure for implementation of the Camellia block cipher. As mentioned before, the Camellia block cipher has three key sizes: 128, 192, and 256 bits. The flexible architecture is used to implement the Camellia cipher with different security levels, and the proposed structure supports all three key sizes. Figure 7 shows the proposed flexible structure for implementation of the Camellia cipher for the case $n = 2$. The widths of the registers and multiplexers in the encryption part and the round key generator part are equal to 64 bits and 128 bits, respectively. The structure is configured based on the control signals $Se_1$, $Se_2$, d[1:0], c[1:0], $L_1$ to $L_3$, $Sel_1$ to $Sel_3$ and $Sel_{128\_192/256}$. In the first step of the computations, the control signal $Sel_2$ is equal to '0' and the $K_A$ and $K_B$ parameters are computed by the data processing part ($K_A$ and ($K_A$ and $K_B$) are generated in five and seven clock cycles for 128-bit and 192/256-bit keys, respectively).

| SECURITY ANALYSIS OF THE PROPOSED STRUCTURES
The main focus of this work is the design and implementation of flexible and high-throughput hardware structures of the Camellia block cipher. However, we also analyse the security of the structures from a hardware point of view. Side-channel attacks are the most significant threats to the security of cryptosystems. Sensitive data such as the main key can be recovered by these attacks. Side-channel attacks are non-invasive passive attacks that use physical information leaked during the encryption operation. In this case, physical information such as power consumption [37], time delay [38], or electromagnetic radiation [39] is used to find the secret main key. The power analysis attack is a side-channel attack that interprets power consumption measurements taken during cryptographic operations. It can reveal information about a device's operation as well as the secret key based on a power trace.

FIGURE 6 The proposed structures for generating the round keys ($n = 2$) for the case of 192/256-bit keys
In the proposed architectures, the operations computed at each clock cycle have the same hardware complexity. Therefore, the power consumption at each clock cycle is almost constant. This feature leads to a uniform power trace over all clock cycles, and the power consumption traces are independent of the secret key patterns. The timing attack is another important side-channel attack. In this attack, the algorithm execution time is measured precisely [38]. If the computation time differs for different keys and plaintexts, an attacker can obtain information about the bit pattern of the key. Therefore, the hardware implementation of the algorithm should reduce its dependence on timing information. Each encryption operation in the proposed structures has a fixed computation time. The execution time is independent of the bit pattern of the key, and the structures leak no information about this parameter. For example, the timing results for the proposed 128-bit and flexible structures of the Camellia cipher are shown in Table 5. In this case, the encryption times for 50 separate plaintexts ($P_1$, $P_2$, …, $P_{50}$) and main keys ($K_1$, $K_2$, …, $K_{50}$) are measured. The encryption of the plaintexts ($P_1$, $P_2$, …, $P_{50}$) takes the same times $T_1$ and $T_2$ for the 128-bit and flexible structures, respectively. Table 5 shows some samples. The execution time for each of the 50 measurements for the proposed 128-bit and flexible (with a 128-bit key) structures is equal to 95.10 and 100.65 ns, respectively, in 180 nm CMOS technology. As seen from this table, the execution time of the structures does not change when the plaintext or key is changed. Therefore, the internal computations of the block cipher algorithm are hidden.

| RESULTS AND COMPARISON
Area is reported in gate equivalents (GE). One GE is equivalent to the area of a 2-input NAND gate with the lowest driving strength of the corresponding technology. In other words, this metric represents the consumed area normalized to the area of one 2-input NAND gate. The performance and results of the designs are evaluated in terms of critical path delay, number of clock cycles, area, computation time, throughput, throughput/area and power consumption. The power consumption is measured at a frequency of 100 kHz. The results of the proposed works are obtained for different key sizes (128, 192 and 256 bits) and for the two cases $n = 1$ and $n = 2$. Results of the proposed implementations and related works for the Camellia cipher are shown in Table 6. The optimization in works [18,20] targets area and speed. In [18], compact and high-speed hardware architectures for the 128-bit block cipher Camellia are presented. Composite field arithmetic is used to reduce the area consumption of the S-boxes in this work. In [20], a loop architecture based on one round function block is presented. The S-boxes in [20] are implemented by two methods, LUT and composite field arithmetic. As given in the table, the proposed structures consume acceptable hardware resources with reasonable timing characteristics compared to the other architectures. We also obtain improvements in terms of area, throughput, and throughput/area. When $n \ge 3$, the critical path delay, hardware resources and execution time of the circuit increase compared to the cases $n = 1, 2$. Also, the throughput and throughput/area are reduced for the cases $n \ge 3$.
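The throughput and throughput/area metrics can be computed as sketched below. The definitions used here (throughput = block size divided by execution time, and throughput/area = throughput divided by area in GE) are the usual ones and are an assumption about how the tables were produced. The function names are hypothetical, and any area value passed to them is a placeholder rather than a result from the paper.

```python
def throughput_gbps(block_bits: int, exec_time_ns: float) -> float:
    """Throughput in Gbit/s: bits per nanosecond equals Gbit per second."""
    if exec_time_ns <= 0:
        raise ValueError("execution time must be positive")
    return block_bits / exec_time_ns


def throughput_per_area_kbps_per_ge(tp_gbps: float, area_ge: float) -> float:
    """Throughput/area in kbit/s per gate equivalent (GE)."""
    if area_ge <= 0:
        raise ValueError("area must be positive")
    return tp_gbps * 1e6 / area_ge   # 1 Gbit/s = 1e6 kbit/s
```

With the 95.10 ns execution time reported for the 128-bit structure and the 128-bit block size, `throughput_gbps(128, 95.10)` evaluates to about 1.35 Gbit/s under this definition.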
Results of the proposed implementation of the flexible Camellia cipher are shown in Table 7. This structure supports the three main key sizes of 128, 192 and 256 bits, and it is based on the 2-round Feistel network. The architecture provides a versatile implementation that enables an adaptive security level using a variable key size. The flexible structure is a suitable candidate for a broad range of applications with multiple levels of security. As given in Table 7, the proposed structure consumes an acceptable area with low critical path delays. Also, Table 8 shows the results of the proposed 128-bit key structure of the Camellia cipher and other lightweight ciphers with a 128-bit key. It should be noted that the HIGHT, PRESENT, and LED block ciphers have a 64-bit block size (plaintext) with a 128-bit key size.
The main differences between the proposed Camellia structures and other related works are as follows: