Low Area PRESENT Cryptography in FPGA Using TRNG-PRNG Key Generation

Lightweight Cryptography (LWC) is widely used to provide integrity, secrecy, and authentication for sensitive applications. However, LWC implementations are constrained by power consumption, processing time, and hardware utilization, and they remain susceptible to malicious attackers. To address this, an architecture based on the lightweight block cipher PRESENT is proposed to provide security against malicious attacks. A True Random Number Generator-Pseudo Random Number Generator (TRNG-PRNG) based key generation scheme is proposed to generate unpredictable keys that are difficult for attackers to guess. Moreover, the hardware utilization of the PRESENT architecture is optimized using Dual-port Read Only Memory (DROM). The proposed PRESENT-TRNG-PRNG architecture supports a 64-bit input with an 80-bit key. The performance of the PRESENT-TRNG-PRNG architecture is evaluated in terms of the number of slice registers, flip-flops, slice Look Up Tables (LUTs), logical elements, slices, bonded input/output blocks (IOBs), frequency, power, and delay. The input retrieval performance is analyzed using Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mean-Square Error (MSE). The PRESENT-TRNG-PRNG architecture is compared with three existing PRESENT architectures: PRESENT On-The-Fly (PRESENT-OTF), PRESENT Self-Test Structure (PRESENT-STS), and PRESENT-Round Keys (PRESENT-RK). The operating frequency of the PRESENT-TRNG-PRNG architecture is 612.208 MHz on Virtex 5, which is higher than that of PRESENT-RK.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. CMC, 2021, vol. 68, no. 2.


Introduction
Lightweight Cryptography (LWC) plays a vital role in achieving strong security with low energy and low area in sensitive applications such as implantable and wearable medical devices, radio-frequency identification tags, wireless nanosensors, smart cards, and secure embedded systems [1][2][3]. Symmetric cryptography is divided into two types: block ciphers and stream ciphers. A block cipher processes one input block at a time and produces one output block for each input block, whereas a stream cipher processes the input elements continuously and generates one output element at a time [4,5]. The FPGA is a growing design platform for implementing cryptographic algorithms because of its in-house security and reconfigurability [6].
Generally, the Advanced Encryption Standard (AES) is a widespread block cipher that is fundamental to many security systems [7]. However, AES as used in high-performance processors is not suitable for resource-constrained platforms because of its area and energy/power requirements [8]. A better tradeoff between security, power, area, and speed is therefore obtained by designing a lightweight block cipher [9]. Examples of lightweight ciphers include STES [10], SEED [11], ANU [12], KLEIN [13], PRESENT [14], and KASUMI [15]. Among these, the PRESENT block cipher is selected because of its hardware efficiency, and it is also standardized in ISO/IEC 29192-2. However, hardware failures are natural faults in Very-Large-Scale Integration (VLSI) implementations. Such natural faults increase sensitivity and open the way to malicious attacks on cryptographic hardware and embedded systems [16]. The conventional LWC method uses the same type of generator and identical keys for both the encryption and decryption processes, which leaves it susceptible to attacks [17].
The major contributions of this research paper are as follows:
• The PRESENT architecture improves robustness against malicious attackers through random key generation using the TRNG-PRNG module, and two-stage security is applied during the encryption process.
• In the PRESENT architecture, a key value is generated by the TRNG-PRNG module for each clock cycle and each plaintext. Therefore, it is difficult for unauthenticated users (i.e., malicious attackers) to identify the key value during the encryption/decryption process.
• A DROM is used to minimize the number of logical elements in the PRESENT architecture. The DROM performs the operation of the Substitution box (S-box).
The paper is organized as follows. The literature survey on existing PRESENT architectures is given in Section 2. The problem statement derived from the literature survey, along with the solution, is described in Section 3. Section 4 describes the PRESENT architecture with the key scheduling approach and the TRNG-PRNG module. The results and discussion of the PRESENT-TRNG-PRNG architecture are presented in Section 5. Finally, the conclusion is given in Section 6.

Literature Survey
This section describes recent work on the PRESENT block cipher, along with its advantages and limitations.
Pandey et al. [18] implemented the PRESENT lightweight block cipher algorithm for the encryption and decryption processes. The developed PRESENT architecture processed a 64-bit input value with a key length of 80/128 bits. Additionally, dynamic keys were provided by the OTF architecture, which computed the intermediate keys. The generated intermediate keys were then used for the encryption/decryption operation. The iterative method used in the decryption achieved a better tradeoff between time and area. However, the total power consumption of the PRESENT architecture was high at low frequency when processing a large key (128 bits).
De Cnudde et al. [19] evaluated the PRESENT block cipher under two physical attacks: Side-Channel Analysis (SCA) and fault attacks (FAs). A first-order implementation was used to provide security against side-channel attacks, and Private Circuits II was used to provide security against FAs. A leakage detection test was used for the side-channel evaluation. However, the Private Circuits II countermeasure used for FA resistance in the PRESENT block cipher was expensive.
Azari et al. [20] implemented a PRESENT cipher model that incorporated both the encryption and decryption processes. Encryption and decryption were performed using an 80/128-bit key to secure a 64-bit input value. Processing a 64-bit plaintext requires 16 cycles to load the data during the encryption process. Here, the PRESENT cipher obtains a higher throughput because of its efficient encryption and decryption processes. However, the PRESENT cipher used a large number of S-boxes, which increased the hardware utilization.
Rashidi [21] presented two low-cost, high-throughput block ciphers, HIGHT and PRESENT, to improve security. Because the modulo 2^8 adder is one of the more complex blocks in the HIGHT algorithm, parallel prefix adders such as Sklansky, Han-Carlson, Kogge-Stone, and Ladner-Fischer were used to design the modular adder. Moreover, the PRESENT cipher supported two key lengths, 80 bits and 128 bits. Karnaugh mapping was used to reduce the number of logic gates in the S-box and the critical path delay. However, the computation time was high and the throughput was low when the unroll factor of the block ciphers was high.
Lara-Nino et al. [22] developed the standardized lightweight cipher PRESENT to overcome the security issues that arise in extremely constrained environments. The data in the registers were shifted to the right, which reduced the MUX size. The PRESENT architecture used two alternatives to generate the RK of 80 bits and 128 bits. The 80-bit and 128-bit input keys were generated using 20-bit and 32-bit registers, respectively. The key given to the PRESENT architecture was generated manually by the key generator module and can be easily predicted by attackers.

Problem Statement
The problems identified in the literature survey, along with the solution offered by the PRESENT-TRNG-PRNG architecture, are as follows.
The STS-based PRESENT architecture requires an additional comparator to generate the output [19]. The conventional PRESENT architecture uses a large number of S-box operations to accomplish the encryption process [20]. The unroll factor used in the loop unrolling method also affects the hardware utilization [21]. These constraints increase the number of logical elements used in the PRESENT architecture, and the increased hardware utilization in turn degrades the operating frequency and power of the overall architecture. The manual key generation in the PRESENT architecture of [22] produces the same key value for each clock cycle. A key that is the same in every round can be easily predicted by malicious attackers.

Solution:
The logical components of the PRESENT architecture are minimized by using the DROM. In this PRESENT architecture, eight DROMs are used instead of the 16 S-boxes of the conventional PRESENT architecture. The DROM used in the PRESENT-TRNG-PRNG architecture performs the same operation as the S-box. Moreover, the security of the PRESENT-TRNG-PRNG architecture is improved by two approaches: (1) a two-stage security approach and (2) unpredictable key generation using the TRNG-PRNG module. The robustness of the PRESENT architecture is improved by generating a random key for each clock cycle and each plaintext.

PRESENT-TRNG-PRNG Architecture
In the PRESENT-TRNG-PRNG architecture, the logical elements are optimized by using the DROM in the encryption/decryption process. The PRESENT architecture is designed to support a 64-bit input value with an 80-bit key length. Here, random key generation is carried out by the TRNG-PRNG module. The randomness of the key from the TRNG-PRNG module is improved by the two-stage security enabled during the encryption process. The block diagram of the PRESENT-TRNG-PRNG architecture is shown in Fig. 1. The overall working process of the PRESENT-TRNG-PRNG architecture is as follows:
1. First, the input image (P) is read in the MATLAB R2018a software and the image pixels are converted into binary format.
2. Next, the binary values of the image pixels are written to a text file using MATLAB.
3. The TRNG-PRNG module generates the random key value used for the encryption operation. The decryption process is the inverse of the encryption operation.
4. Verilog (ModelSim) is used to run both the encryption and decryption processes. The outputs of encryption and decryption are written to text files by the Verilog (ModelSim) simulation.
5. The text files are then used in MATLAB to convert the encrypted and decrypted binary values back into images.
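The pixel-to-block conversion in steps 1, 2, and 5 can be illustrated with a minimal Python sketch. It is not the authors' MATLAB script: the function names are hypothetical, and the packing order (first pixel in the most significant byte) is an assumption, since the paper does not state it.

```python
def pixels_to_blocks(pixels):
    """Pack 8-bit pixel values into 64-bit integers, eight pixels per block."""
    if len(pixels) % 8 != 0:
        raise ValueError("pixel count must be a multiple of 8")
    blocks = []
    for i in range(0, len(pixels), 8):
        block = 0
        for p in pixels[i:i + 8]:
            if not 0 <= p <= 255:
                raise ValueError("pixel value outside the 8-bit range")
            # Assumption: the first pixel lands in the most significant byte.
            block = (block << 8) | p
        blocks.append(block)
    return blocks


def blocks_to_pixels(blocks):
    """Inverse of pixels_to_blocks: unpack each 64-bit block into 8 pixels."""
    pixels = []
    for block in blocks:
        for shift in range(56, -8, -8):  # 56, 48, ..., 0
            pixels.append((block >> shift) & 0xFF)
    return pixels


def block_to_text(block):
    """Render a block as the 64-character binary string stored in the text file."""
    return format(block, "064b")
```

Under these assumptions, a 16384-pixel grayscale image yields 2048 plaintext blocks, and `blocks_to_pixels(pixels_to_blocks(p))` returns the original pixel list.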

64-Bit Path Encryption
The overall architecture of the 64-bit path encryption is shown in Fig. 2. First, each pixel of the image is converted into 8 bits, and a plaintext (PT) of 64 bits in total is kept in a register denoted as Dreg. Meanwhile, the TRNG-PRNG module generates a key for encrypting the 64-bit plaintext. The conventional PRESENT architecture generates the key values manually, which makes them predictable to attackers. The main objective of using the TRNG-PRNG module in the PRESENT architecture is to obtain a high security level by generating a random key value for each pixel at every clock cycle. The 80-bit key value (kdat1) generated by the TRNG-PRNG module is stored in the register kreg. From the 80 bits of kdat1, the 64 most significant bits (kdat2) are selected and XORed with the 64-bit PT value, as shown in Eq. (1).
$$\mathit{dat1} = \mathit{PT} \oplus \mathit{kdat2} \quad (1)$$
where dat1 represents the XOR of the plaintext and the 64 most significant bits of the key value generated by the TRNG-PRNG module.
Next, the XORed data is split into 16 four-bit values, as shown in Eq. (2).
$$\mathit{dat1} = \{\mathit{dat1}[0:3],\ \mathit{dat1}[4:7],\ \ldots,\ \mathit{dat1}[60:63]\} \quad (2)$$
From the 16 sets of 4-bit values, each pair of 4-bit values is given to a DROM that performs the Substitution box (S-box) operation. For example, dat1[0:3] and dat1[4:7] are given to DROM 1 for the S-box operation. In total, eight DROMs are used to produce the 64-bit value based on the S-box operation shown in Tab. 1. The conventional PRESENT architecture uses 16 separate S-boxes, which increases the hardware utilization and the delay while processing the input plaintext. Hence, the PRESENT-TRNG-PRNG architecture uses only eight DROMs for the S-box operation, which minimizes the number of logical elements used in the encryption process. The reduction in logical elements lowers the hardware utilization and increases the speed of the encryption process.
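As a rough software model of this stage, the sketch below XORs the plaintext with the upper 64 key bits and then substitutes the result using eight dual-port lookups. It assumes that Tab. 1 holds the standard PRESENT S-box and that each DROM is a 16-entry table read through two ports at once; `drom_read`, `add_key`, and `sbox_layer` are illustrative names, not identifiers from the authors' design.

```python
# Standard PRESENT 4-bit S-box (assumed to be the content of Tab. 1).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

MASK64 = (1 << 64) - 1


def add_key(pt, kdat1):
    """XOR the 64-bit plaintext with the 64 most significant bits of the 80-bit key."""
    kdat2 = kdat1 >> 16
    return (pt ^ kdat2) & MASK64


def drom_read(addr_a, addr_b):
    """Model one dual-port ROM: two independent 4-bit reads of the same table."""
    return SBOX[addr_a], SBOX[addr_b]


def sbox_layer(dat1):
    """Substitute all 16 nibbles using eight dual-port lookups."""
    dat2 = 0
    for rom in range(8):  # one iteration per DROM
        lo = (dat1 >> (8 * rom)) & 0xF
        hi = (dat1 >> (8 * rom + 4)) & 0xF
        out_lo, out_hi = drom_read(lo, hi)
        dat2 |= out_lo << (8 * rom)
        dat2 |= out_hi << (8 * rom + 4)
    return dat2
```

The point of the model is that sharing one table between two read ports halves the number of table instances from 16 to 8 without changing the substitution result.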
Then the concatenated 64-bit value is processed by the permutation layer (P-layer). The P-layer moves each bit to a new bit position as shown in Tab. 2. The output of the P-layer is denoted as dat3, and this updated dat3 is used in place of the plaintext for the next 31 rounds.
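The bit permutation can be sketched in a few lines of Python. This assumes that Tab. 2 is the standard PRESENT permutation, in which bit i moves to position 16i mod 63 and bit 63 stays in place; the function names are illustrative.

```python
def p_layer(state):
    """Standard PRESENT bit permutation on a 64-bit integer."""
    out = 0
    for i in range(64):
        dest = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << dest
    return out


def p_layer_inverse(state):
    """Inverse permutation, as needed by the decryption path."""
    out = 0
    for i in range(64):
        dest = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> dest) & 1) << i
    return out
```

In hardware the P-layer is pure wiring, so it costs no logic elements; the loop above only exists because software has to move the bits explicitly.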
This updated kdat4 is given to the key scheduling process to accomplish the second stage security. Both the first and second stage security are used to improve the randomness of the key values.

Key Scheduling Process
The architecture of the key scheduling used in the 64-bit path encryption is shown in Fig. 3. This key scheduling is processed for the next 31 rounds to improve the security of the plaintext against malicious attackers. After completing the 32 rounds, the PRESENT architecture provides the encrypted ciphertext, denoted as CT. The decrypted value is obtained by the inverse of the PRESENT encryption process; that is, the reverse architecture of the PRESENT module is used during decryption. The process of key generation using the TRNG-PRNG module is explained in the following section.

Key Generation Using TRNG-PRNG Module
In this TRNG-PRNG module, a key value is generated for each clock cycle as well as for each plaintext to improve the security. The overall architecture of 80-bit key generation using the TRNG-PRNG module is illustrated in Fig. 4. Generally, a TRNG is built from digital circuits that produce true randomness using unpredictable effects. Here, the TRNG output is generated using the $random function. The sequence generated by the TRNG relies on two essential features: uniformity, and statistical independence between the current symbol and the numbers generated in previous rounds. Moreover, the overall circuit used to generate the 80-bit key is referred to as the PRNG. The steps in key generation using the TRNG-PRNG module are as follows:
a. Initially, the TRNG-PRNG module generates an 80-bit true random number, denoted as RN0. Next, this 80-bit RN0 value is split into four 20-bit values T1, T2, T3, and T4, as shown in Eq. (7).
where M1 and M2 represent the MUX outputs for the pairs T1-T2 and T3-T4, respectively.
c. The values M1 and M2 from the MUX operations are XORed with the 20-bit values T1 and T4, respectively. The XOR results for the pairs M1-T1 and M2-T4 are denoted as X1 and X2, respectively, as shown in Eq. (9). For example, the XOR operation for the M1-T1 pair is shown in Tab. 4.
$$X_1 = M_1 \oplus T_1, \quad X_2 = M_2 \oplus T_4 \quad (9)$$

where A1, A2, A3, and A4 are the values obtained through the addition process.
e. Once the addition is completed, the pairs A1-A2 and A3-A4 are processed by an XNOR operation. Eq. (11) shows the XNOR operation, and a sample XNOR operation for the pair A1-A2 is shown in Tab. 5.
$$X_3 = \overline{A_1 \oplus A_2}, \quad X_4 = \overline{A_3 \oplus A_4} \quad (11)$$
where X3 and X4 are the XNOR values of the pairs A1-A2 and A3-A4, respectively.
f. One more MUX operation is carried out on four inputs: A1, X3, A4, and X4. The output (M3) of the MUX is selected by the counter value. For instance, the MUX outputs A1 when the counter value is 0. Similarly, the MUX outputs X3, A4, and X4 when the counter value is 1, 2, and 3, respectively.
g. Finally, X3, M3, X4, and A4 are concatenated to generate the 80-bit key value, i.e., kdat1 = X3||M3||X4||A4. This 80-bit kdat1 from the TRNG-PRNG module is given as input to the encryption process.
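The steps that are fully specified in the text (the split in step a, the XOR in step c, and steps e to g) can be sketched in Python as follows. The MUX select logic for M1 and M2 and the operands of the addition are not given here, so the sketch takes M1, M2 and A1 to A4 as inputs rather than guessing them. The slice order in `split_rn0` (T1 as the most significant 20 bits) is an assumption, and all function names are hypothetical.

```python
MASK20 = (1 << 20) - 1


def split_rn0(rn0):
    """Step a: split the 80-bit RN0 into T1..T4 (assumption: T1 is the top slice)."""
    return tuple((rn0 >> shift) & MASK20 for shift in (60, 40, 20, 0))


def xor20(a, b):
    """Step c: 20-bit XOR, e.g. X1 = M1 xor T1."""
    return (a ^ b) & MASK20


def xnor20(a, b):
    """Step e: 20-bit XNOR, e.g. X3 = A1 xnor A2."""
    return ~(a ^ b) & MASK20


def mux4(counter, a1, x3, a4, x4):
    """Step f: counter values 0, 1, 2, 3 select A1, X3, A4, X4 respectively."""
    return (a1, x3, a4, x4)[counter & 0b11]


def assemble_key(a1, a2, a3, a4, counter):
    """Steps e to g: build kdat1 = X3 || M3 || X4 || A4 from the adder outputs."""
    x3 = xnor20(a1, a2)
    x4 = xnor20(a3, a4)
    m3 = mux4(counter, a1, x3, a4, x4)
    return (x3 << 60) | (m3 << 40) | (x4 << 20) | a4
```

Note that X3 and X4 each appear once directly in kdat1 and may appear a second time through M3, so the effective entropy of the 80-bit key depends on the counter state.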
Optimizing the hardware components with the DROM yields a high operating frequency and low area utilization in the PRESENT architecture. Moreover, generating a key for each round and each plaintext improves the robustness of the ciphertext against attackers. It is therefore difficult to recover the original plaintext without knowing the key value generated by the TRNG-PRNG module.

Results and Discussion
The results of the TRNG-PRNG based PRESENT architecture are described in this section. The PRESENT architecture, along with the key generation module (i.e., the TRNG-PRNG module), is implemented using the Xilinx ISE 14.2 software. The architecture is designed using the Very High Speed Integrated Circuit Hardware Description Language (VHDL), and the ModelSim simulator is used to perform the functional simulations. Moreover, the MATLAB R2018a software is used to convert the image file into a text file. In the PRESENT architecture, the TRNG-PRNG module generates the key for the encryption/decryption process. The developed PRESENT architecture supports an 80-bit key value for a 64-bit input.

The grayscale input image shown in Fig. 6 contains 16384 pixels in total. The grayscale image is converted into binary format using the dec2bin function. The binary format of the image is shown in Fig. 7, and this binary value is stored in the memory of the FPGA. The histogram of the input image, obtained using the imhist function, is shown in Fig. 8. Next, the binary values of the sample image are divided into 64-bit blocks and given as input to the encryption process. Meanwhile, the TRNG-PRNG module generates the 80-bit key value used to encrypt the input plaintext.

Fig. 9 shows the input data (PT) and the ciphertext (CT). Here, the input data (PT) is given to the encryption process and the ciphertext (CT) is obtained from the encryption operation. The key represents the 80-bit key value generated by the TRNG-PRNG module. dreg and kreg are the registers used to store the input plaintext and the key value from the TRNG-PRNG module, respectively. Next, dat1, dat2, dat3 and kdat1, kdat2 are the intermediate variables of the plaintext and key value in the 64-bit path encryption. Further, round represents the number of rounds processed during the encryption process. Fig. 9 shows that the encryption in the PRESENT-TRNG-PRNG architecture satisfies the test vector. For example, the output ciphertext (i.e., 5579C1387B228445) marked by the red box in Fig. 9 is equal to the ciphertext given in the test vector. The test vector analysis confirms that the PRESENT-TRNG-PRNG architecture encrypts correctly. This test vector is verified for the PRESENT architecture without the second-stage key scheduling security.
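The test vector check can be reproduced in software with a compact reference model of standard PRESENT-80. The sketch below follows the published cipher specification (31 rounds of key addition, S-box layer, and P-layer, plus a final key addition, with the 61-bit rotating key schedule) and uses a fixed key; it is not the authors' HDL and does not include the TRNG-PRNG key generation or the second-stage security. The choice of an all-zero plaintext and key is an assumption based on the published PRESENT-80 test vectors, since the inputs are not listed here.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox_layer(state):
    """Apply the 4-bit S-box to each of the 16 nibbles."""
    return sum(SBOX[(state >> (4 * i)) & 0xF] << (4 * i) for i in range(16))


def p_layer(state):
    """Move bit i to position 16*i mod 63 (bit 63 stays in place)."""
    out = 0
    for i in range(64):
        dest = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << dest
    return out


def round_keys_80(key, rounds=32):
    """Standard PRESENT-80 key schedule: returns the 32 round keys."""
    keys = []
    for counter in range(1, rounds + 1):
        keys.append(key >> 16)                                   # upper 64 bits
        key = ((key & ((1 << 19) - 1)) << 61) | (key >> 19)      # rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))  # S-box on top nibble
        key ^= counter << 15                                     # round counter into bits 19..15
    return keys


def present80_encrypt(plaintext, key):
    """Encrypt one 64-bit block with an 80-bit key."""
    keys = round_keys_80(key)
    state = plaintext
    for rk in keys[:-1]:                 # 31 full rounds
        state = p_layer(sbox_layer(state ^ rk))
    return state ^ keys[-1]              # final key addition


# Assumed inputs: all-zero plaintext and all-zero 80-bit key.
ct = present80_encrypt(0x0000000000000000, 0x00000000000000000000)
assert ct == 0x5579C1387B228445
print(format(ct, "016X"))
```

A model like this is useful as a golden reference when comparing simulator output files against expected ciphertext, block by block.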
The hardware utilization, power, delay, and frequency for the different FPGA devices are presented as follows. The hardware utilization of the PRESENT-TRNG-PRNG architecture on Spartan 6 is shown in Tab. 6. The results in Tab. 6 are for the 64-bit path encryption using an 80-bit key value. The numbers of LUTs, slices, and flip-flops for the Spartan 6 device are 45, 35, and 48, respectively. The hardware analysis shows that the number of LUTs used on Spartan 6 is lower than on the remaining five FPGA devices. If the PRESENT-TRNG-PRNG architecture is implemented on Spartan 6 hardware, the encryption output is easily verified using the 16 output light-emitting diodes present on the Spartan 6 FPGA device. Using eight DROMs instead of 16 S-boxes in the PRESENT architecture helps to minimize the hardware utilization. Moreover, the analysis of frequency, delay, and power is shown in Tab. 7. These metrics are evaluated for different FPGA devices. Tab. 7 shows that the PRESENT-TRNG-PRNG architecture on the Virtex 5 FPGA device provides a higher frequency, i.e., 612.208 MHz, than on the remaining FPGA devices. The frequency of the PRESENT architecture on the Virtex 5 device increases because of the lower hardware utilization.

The encrypted binary values of the input image pixels are transferred to the MATLAB R2018a software. The image encrypted using the PRESENT-TRNG-PRNG architecture and its histogram count are shown in Figs. 10 and 12, respectively. Similarly, the decrypted image and its histogram count are shown in Figs. 11 and 13, respectively. The amount of error between the input sample image and the decrypted sample image is calculated using the histogram count. Moreover, the image retrieval performance of the PRESENT-TRNG-PRNG architecture is analyzed using the MSE, PSNR, and SSIM. The PRESENT-TRNG-PRNG architecture obtains a PSNR of 49.8762 dB and an SSIM of 0.8211. Hence, the PRESENT-TRNG-PRNG architecture preserves the details in the image during the encryption/decryption process.
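For reference, the MSE and PSNR metrics can be computed as in the following minimal Python sketch. It assumes 8-bit grayscale images flattened into equal-length lists and the conventional definition PSNR = 10 log10(peak^2 / MSE) with peak = 255; the paper does not list its exact formulas, and SSIM is omitted because it requires windowed local statistics.

```python
import math


def mse(original, restored):
    """Mean-square error between two equal-length pixel sequences."""
    if len(original) != len(restored) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)


def psnr(original, restored, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(original, restored)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)
```

Under this definition, a decryption that restores every pixel exactly gives an MSE of 0 and an unbounded PSNR, so any finite PSNR indicates at least some pixel differences between the input and the decrypted image.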

Comparative Analysis
The effectiveness of the PRESENT-TRNG-PRNG architecture is evaluated by comparing it with three existing PRESENT architecture designs: PRESENT-OTF [18], PRESENT-STS [19], and PRESENT-RK [22]. The comparative analysis is carried out on five FPGA devices: Spartan 3, Spartan 6, Virtex 4, Virtex 5, and Kintex 7.
Tabs. 8 and 9 show the comparison of the PRESENT-TRNG-PRNG architecture with PRESENT-OTF [18] and PRESENT-RK [22], respectively. The comparison shows that the PRESENT-TRNG-PRNG architecture uses fewer hardware components than PRESENT-OTF [18] and PRESENT-RK [22]. PRESENT-STS [19] uses a large number of S-box operations (e.g., 16 S-boxes) during encryption/decryption and also requires an additional comparator to generate the output, which increases the hardware utilization. In contrast, PRESENT-TRNG-PRNG uses only eight DROMs to perform the S-box operation. The DROM is used in both the encryption and key scheduling processes, which minimizes the overall hardware utilization. Moreover, the manual key generation of PRESENT-RK [22] is vulnerable to malicious attackers because the manually generated keys can be easily detected. The two-stage security in the 64-bit path encryption and the random key generation using the TRNG-PRNG module increase the security against malicious attackers.

Conclusion
In this paper, TRNG-PRNG based key generation is incorporated into the PRESENT architecture to generate an 80-bit key value for a 64-bit input value. Additionally, the randomness of the key obtained from the TRNG-PRNG module is increased by the two-stage security applied during the 64-bit path encryption. The key value used in the PRESENT-TRNG-PRNG architecture is therefore unpredictable to malicious attackers, which improves the security of the input value. Moreover, the hardware utilization of the PRESENT architecture is minimized by using the DROM to perform the S-box operation. Hence, the PRESENT-TRNG-PRNG architecture minimizes the logical elements while maintaining higher security. The PRESENT-TRNG-PRNG architecture provides better performance than PRESENT-OTF, PRESENT-STS, and PRESENT-RK. The operating frequency of the PRESENT-TRNG-PRNG architecture is 612.208 MHz on Virtex 5, which is higher than that of PRESENT-RK. In future work, architecture-level optimization can be applied to further reduce the hardware utilization and power consumption of the entire PRESENT architecture.