Lightweight Authenticated Encryption Mode of Operation for Tweakable Block Ciphers

Using a small block length is a common strategy in designing lightweight block ciphers. So far, many 64-bit primitives have been proposed. However, if we use such a 64-bit primitive for an authenticated encryption with birthday-bound security, it has only 32-bit plaintext complexity, which is subject to a practical attack. To take advantage of a short block length without losing security, we propose a lightweight AEAD mode FBAE that achieves beyond-birthday-bound security. For this purpose, we extend the idea of iCOFB, originally defined with a tweakable random function, to a tweakable block cipher. More specifically, we fix the tweak length, which was variable in iCOFB, and further generalize the feedback function. Moreover, we improve its security bound. We evaluate the concrete hardware performance of FBAE. FBAE benefits from the small block length and shows particularly good performance in threshold implementation.


Introduction
Driven by a demand for secure connectivity in resource-constrained embedded devices, lightweight cryptography has been actively studied in the last decade. Consequently, a number of lightweight block ciphers have been proposed [2,5,6,9,12,34], including PRESENT [8,36] and CLEFIA [35], which are standardized in ISO/IEC 29192-2.
A common strategy for designing a lightweight block cipher is to use a small block length. For example, PRESENT [8] and PRINCE [9] support only a 64-bit block length. Many more algorithms, such as GIFT [4] and SKINNY [6], provide 64-bit options. The small block length contributes to a smaller memory footprint and a smaller number of rounds, which are crucial for a lightweight implementation.
Resource-constrained devices are frequently used in hostile environments in which side-channel attacks (SCA) [18] should be considered. Designers face the even more challenging task of realizing an SCA-resistant implementation with limited resources. Researchers have tackled this problem and proposed many lightweight and SCA-resistant implementations [28,32,24,6,13], including ones protected by threshold implementation (TI) [29]. The advantage of a block cipher with a small block length (i.e., a small state size) becomes even larger with TI, in which a shared representation of the state multiplies the memory requirement.
In order to leverage the benefit of lightweight block ciphers for realizing both confidentiality and integrity, lightweight modes of operation for authenticated encryption with associated data (AEAD) have been actively studied in the last few years, promoted by the CAESAR competition and NIST's move toward standardizing lightweight cryptography [30]. So far, lightweight AEAD modes such as COFB [10] and SAEB [25] have been proposed.

However, the short block length of lightweight cryptography can be a problem for security. The lightweight AEAD modes have security only up to the so-called birthday bound. More specifically, the security is ensured up to $O(2^{b/2})$ block-cipher calls when instantiated with a b-bit block cipher. With a 64-bit block cipher, the security is ensured up to only $2^{32}$ block-cipher calls, which is subject to a practical attack, as demonstrated by the Sweet32 attack [7]. Using an AEAD mode with beyond-birthday-bound (BBB) security avoids this birthday problem. There are block-cipher-based AEAD modes with BBB security, including CHM [14], CIP [15], and AEAD modes with CLRW2 [21] or r-CLRW [20]. However, they are costly compared with the lightweight AEAD modes, since two or more independent universal hash functions are required. Another solution is to construct a (dedicated) TBC-based AEAD mode. The TBC-based AEAD modes, including ΘCB3 [19], OTR [22], SCT [31], and ZAE [16], realize better efficiency and security. In particular, ΘCB3 has the smallest state among the BBB-secure AEAD modes.
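For a concrete sense of scale, the short Python sketch below computes how much data a b-bit block cipher processes before reaching $2^{b/2}$ calls. It assumes one b-bit block per call and ignores the constant factors hidden in the O-notation; the function name is ours and purely illustrative.

```python
def birthday_bound_bytes(block_bits: int) -> int:
    """Bytes processed after 2**(block_bits // 2) calls, one block per call.

    Constant factors of the O(2^(b/2)) bound are ignored.
    """
    calls = 2 ** (block_bits // 2)
    bytes_per_block = block_bits // 8
    return calls * bytes_per_block


for b in (64, 128):
    nbytes = birthday_bound_bytes(b)
    # nbytes is an exact power of two, so bit_length() - 1 is its log2
    print(f"b = {b}: 2^{b // 2} calls ~ 2^{nbytes.bit_length() - 1} bytes")
```

For b = 64 this is $2^{35}$ bytes (32 GiB), whereas for b = 128 it is $2^{68}$ bytes.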

Motivation, Approach, and Problems
Our motivation is to design a lightweight BBB-secure AEAD mode, thereby taking advantage of a short block length without losing security. To be lightweight, the mode should meet the four criteria for lightweight AEAD [25], which were used in designing the block-cipher-based lightweight AEAD mode SAEB, as shown in Table 1:
- No extra state: The AEAD mode uses no memory in addition to that used within the (tweakable) block cipher.
- Inverse free: The AEAD mode uses no decryption call of the (tweakable) block cipher.
- XOR only: The AEAD mode needs only XOR in addition to the (tweakable) block cipher.
- Online: The AEAD mode scans the incoming message only once.
Using a (dedicated) TBC is a promising approach for designing a lightweight and BBB-secure AEAD mode; however, none of the previous TBC-based AEAD modes, including ΘCB3, satisfies all the lightweight criteria (see Table 1).
Our approach is to design a (dedicated) TBC-based AEAD mode by extending the idea of iCOFB [10]. iCOFB, shown in Figure 1, is a generalization of COFB using a tweakable random function (TRF) with b-bit outputs, denoted by R. In iCOFB, the TRF is called once for each message/ciphertext block. Feedback functions $\rho/\rho'$ are used to map a pair of a TRF output $Y_i$ and a plaintext/ciphertext block $M_i/C_i$ to the next TRF input $X_{i+1}$ and a ciphertext/plaintext block $C_i/M_i$. More specifically, linear feedback functions are considered, each of which is expressed by a $2b \times 2b$ binary matrix.
After consuming all the message blocks, the TRF is called once again to generate a tag T. It was proven that iCOFB has $O(2^b)$ security with $(\rho, \rho')$ satisfying a certain criterion (see Subsection 3.1). As shown in Figure 1, iCOFB needs no extra state in addition to that within the underlying TRF. Besides, iCOFB takes message/ciphertext blocks online and does not need the inverse of the TRF. Moreover, the linear functions ρ and ρ′ can be realized with XOR only. Therefore, iCOFB satisfies all the lightweight criteria for a TRF-based AEAD.
There are two problems in designing a TBC-based AEAD mode from iCOFB's idea. First, since the associated data (AD) is a part of the tweak, the underlying TRF should accept an arbitrary-length tweak. By contrast, lightweight TBCs such as SKINNY accept only a fixed-length tweak. Using the XT tweak extension [23] is a possible solution, but it requires a universal hash function accepting an arbitrary-length input, which can be costly to implement. Second, the security bound of iCOFB is $O(\ell_{\max} q / 2^b)$, which depends on the maximum message block length $\ell_{\max}$ and the number of queries q (the sum of the numbers of encryption queries and forgery attempts). It is degraded compared with that of ΘCB3, $O(q_D / 2^b)$, wherein $q_D$ is the number of forgery attempts. Large $\ell_{\max}$ and/or q cause a short key lifetime, which implies an additional cost for rekeying or a shorter product lifetime.
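To illustrate why the dependence on $\ell_{\max}$ and q matters, the Python sketch below compares the two bounds on a log2 scale with all hidden constants dropped. The parameter values (b = 64, $\ell_{\max} = 2^{16}$, $q = 2^{40}$, $q_D = 2^{20}$) are hypothetical and chosen only for illustration.

```python
import math


def log2_icofb_bound(l_max: int, q: int, b: int) -> float:
    """log2(l_max * q / 2^b): iCOFB-style bound with constants dropped."""
    return math.log2(l_max) + math.log2(q) - b


def log2_qd_bound(q_d: int, b: int) -> float:
    """log2(q_D / 2^b): ThetaCB3-style bound with constants dropped."""
    return math.log2(q_d) - b


b = 64
# Depends on the message length and on all queries (encryption + forgery).
print("iCOFB-style:", log2_icofb_bound(2**16, 2**40, b))
# Depends only on the number of forgery attempts.
print("q_D-style:  ", log2_qd_bound(2**20, b))
```

With these numbers the first bound is about $2^{-8}$ while the second is about $2^{-44}$; the gap widens as messages get longer or encryption queries accumulate, since neither affects the second bound.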

Contribution
We design a (fixed-tweak-length) TBC-based AEAD mode called FBAE that solves the above two problems and satisfies all the lightweight criteria, as shown in Table 1. Moreover, we generalize the feedback functions to cover a broader class of feedback functions, including non-linear ones.
We address the first problem by designing a new AD processing part. We introduce a (possibly non-linear) feedback function $\delta^{(a)}$ that maps an AD block $A_i$ and a TBC output block $W_i$ to the next TBC input $V_{i+1}$. A given AD is processed block by block by using a fixed-tweak-length TBC and the feedback function $\delta^{(a)}$.
To address the second problem, we generalize the feedback functions ρ and ρ′ to the pairs of functions $(\gamma^{(e)}, \delta^{(e)})$ and $(\gamma^{(d)}, \delta^{(d)})$, where $\gamma^{(e)}/\gamma^{(d)}$ outputs a ciphertext/plaintext block and $\delta^{(e)}/\delta^{(d)}$ outputs the next TBC input. We show conditions on the generalized feedback functions (given in Subsections 3.2 and 3.3) under which FBAE satisfies the security bound of $O(q_D/2^b)$, the same level of security as ΘCB3. The set of generalized feedback functions satisfying the conditions is a superset of $(\rho, \rho')$ in iCOFB and thus covers a broader class of functions.
The benefit of the proposed TBC-based AEAD mode is evaluated through concrete hardware implementations. In the implementation, we use a particularly efficient set of feedback functions. We refer to this specialization as the plaintext feedback mode (PFB) because the TBC input is always $M_i$. We remark that the encryption of PFB is parallelizable, unlike that of the existing lightweight AEAD modes COFB and SAEB. This feature is desirable for communication between entities with asymmetric resources, e.g., when a central server sends encrypted commands to many resource-constrained nodes.
In the implementations, PFB is instantiated with the lightweight TBC SKINNY-64-192. Its performance is compared with that of the state-of-the-art block-cipher-based alternative with the same level of security: SAEB instantiated with GIFT-128-128. For each of the AEADs, we evaluate the performance with and without TI. We show that PFB benefits from the small block length and performs particularly well in implementations with the SCA countermeasure: it has the smallest circuit area compared with the SAEB implementation and the conventional implementations of Ascon [11] and Ketje [1].

Organization
This paper is organized as follows. In Section 2, we briefly review TBC and AEAD. Then, we describe the design principle and definition of FBAE in Section 3, followed by its security result in Section 4. We show hardware implementations and their performance comparison in Section 5.

Preliminaries
Notation. Let λ be an empty string and $\{0,1\}^*$ the set of all bit strings. For an integer $i \ge 0$, let $\{0,1\}^i$ be the set of all i-bit strings, $\{0,1\}^0 := \{\lambda\}$, $(\{0,1\}^i)^*$ the set of all bit strings whose lengths are multiples of i, and $\{0,1\}^{\le i} := \{0,1\}^1 \cup \{0,1\}^2 \cup \cdots \cup \{0,1\}^i$ the set of all bit strings of length at most i. Let $0^i$ resp. $1^i$ be the bit string of i zeros resp. ones. For an integer $i \ge 1$, let $[i] := \{1, 2, \ldots, i\}$ be the set of positive integers less than or equal to i. For a non-empty set $\mathcal{T}$, $T \xleftarrow{\$} \mathcal{T}$ means that an element is chosen uniformly at random from $\mathcal{T}$ and is assigned to T. The concatenation of two bit strings X and Y is written as $X \| Y$, or XY when no confusion is possible. For integers $0 \le i \le j$ and $X \in \{0,1\}^j$, let $\mathrm{msb}_i(X)$ resp. $\mathrm{lsb}_i(X)$ be the most resp. least significant i bits of X. For integers i and j with $0 \le i < 2^j$, let $\mathrm{str}_j(i)$ be the j-bit binary representation of i. For an integer $b \ge 0$ and a bit string X, we denote the parsing into fixed-length b-bit strings as (…).

Tweakable Block Cipher. A tweakable block cipher (TBC) is a set of permutations indexed by a key and a public input called a tweak. Let $\mathcal{K}$ be the key space, $\mathcal{TW}$ the tweak space, and b the input/output block size. A TBC (encryption) is denoted by $\widetilde{E}: \mathcal{K} \times \mathcal{TW} \times \{0,1\}^b \to \{0,1\}^b$. A TBC having a key $K \in \mathcal{K}$ is denoted by $\widetilde{E}_K$, and $\widetilde{E}_K$ having a tweak $TW \in \mathcal{TW}$ is denoted by $\widetilde{E}^{TW}_K$. In this paper, a keyed TBC is assumed to be a secure tweakable pseudo-random permutation, or TPRP for short, which is indistinguishable from a tweakable random permutation (TRP). A tweakable permutation (TP) $\widetilde{P}: \mathcal{TW} \times \{0,1\}^b \to \{0,1\}^b$ is a set of b-bit permutations indexed by a tweak in $\mathcal{TW}$. A TP having a tweak $TW \in \mathcal{TW}$ is denoted by $\widetilde{P}^{TW}$. Let $\mathrm{Perm}(\mathcal{TW}, \{0,1\}^b)$ be the set of all TPs of block size b and tweak space $\mathcal{TW}$.
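The bit-string notation can be mirrored by small Python helpers, shown here as a sketch that represents bit strings as Python strings of '0'/'1' characters. The parsing rule is an assumption, since the definition of the parsing operator is truncated in this text: the sketch splits into b-bit blocks with a possibly shorter last block and maps the empty string to a single empty block, in line with the later remark that an empty AD yields an empty AD block. All function names are hypothetical.

```python
def msb(i: int, x: str) -> str:
    """msb_i(X): the most significant (leftmost) i bits of X."""
    if not 0 <= i <= len(x):
        raise ValueError("i must satisfy 0 <= i <= |X|")
    return x[:i]


def lsb(i: int, x: str) -> str:
    """lsb_i(X): the least significant (rightmost) i bits of X."""
    if not 0 <= i <= len(x):
        raise ValueError("i must satisfy 0 <= i <= |X|")
    return x[len(x) - i:]


def str_bits(j: int, i: int) -> str:
    """str_j(i): the j-bit binary representation of i, for 0 <= i < 2^j."""
    if not 0 <= i < 2 ** j:
        raise ValueError("need 0 <= i < 2^j")
    return format(i, f"0{j}b") if j > 0 else ""


def parse_blocks(b: int, x: str) -> list[str]:
    """Split X into b-bit blocks; only the last block may be shorter.

    Assumption: the empty string parses to a single empty block.
    """
    if b <= 0:
        raise ValueError("block length must be positive")
    if x == "":
        return [""]
    return [x[k:k + b] for k in range(0, len(x), b)]


print(msb(3, "10110"), lsb(2, "10110"), str_bits(4, 5), parse_blocks(4, "1011001"))
```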
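A TRP is defined as $\widetilde{P} \xleftarrow{\$} \mathrm{Perm}(\mathcal{TW}, \{0,1\}^b)$. In the TPRP-security game, an adversary A has access to either the target keyed TBC $\widetilde{E}_K$ for $K \xleftarrow{\$} \mathcal{K}$ or a TRP $\widetilde{P} \xleftarrow{\$} \mathrm{Perm}(\mathcal{TW}, \{0,1\}^b)$. After the interaction, A returns a decision bit $y \in \{0,1\}$. The output of A with access to O is denoted by $A^O$. For a TBC $\widetilde{E}$, the TPRP-security advantage function of an adversary A is defined as
$$\mathrm{Adv}^{\mathrm{tprp}}_{\widetilde{E}}(A) := \Pr\left[A^{\widetilde{E}_K} = 1\right] - \Pr\left[A^{\widetilde{P}} = 1\right],$$
where the probabilities are taken over K, $\widetilde{P}$, and A.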
The maximum over all adversaries running in time at most t and making at most σ queries is denoted by $\mathrm{Adv}^{\mathrm{tprp}}_{\widetilde{E}}(t, \sigma) := \max_A \mathrm{Adv}^{\mathrm{tprp}}_{\widetilde{E}}(A)$.

Nonce-Based Authenticated Encryption with Associated Data. A nonce-based authenticated encryption with associated data (nAEAD) scheme based on a keyed TBC $\widetilde{E}_K$ is denoted by $\Pi[\widetilde{E}_K]$ and is a pair of encryption and decryption algorithms $(\Pi.\mathrm{Enc}[\widetilde{E}_K], \Pi.\mathrm{Dec}[\widetilde{E}_K])$. $\mathcal{K}, \mathcal{N}, \mathcal{M}, \mathcal{C}, \mathcal{A}$, and $\mathcal{T}$ are the sets of keys, nonces, plaintexts, ciphertexts, associated data (AD), and tags of the nAEAD scheme, respectively. In this paper, the key space of an nAEAD scheme is equal to that of the underlying TBC. We follow the security definition of an nAEAD scheme in [26,33], that is, the indistinguishability between $(\Pi.\mathrm{Enc}[\widetilde{E}_K], \Pi.\mathrm{Dec}[\widetilde{E}_K])$ and $(\$, \bot)$, where $\$$ is a random-bits oracle that has the same interface as $\Pi.\mathrm{Enc}[\widetilde{E}_K]$ and, for a query (N, A, M), returns a random bit string of length $|\Pi.\mathrm{Enc}[\widetilde{E}_K](N, A, M)|$, and ⊥ is an oracle that returns the reject symbol ⊥ for any query. In the nAEAD-security game, an adversary A first interacts with either $\Pi[\widetilde{E}_K]$ or $(\$, \bot)$ and then returns a decision bit $y \in \{0,1\}$. For an nAEAD scheme $\Pi[\widetilde{E}_K]$, the nAEAD-security advantage function of an adversary A is defined as
$$\mathrm{Adv}^{\mathrm{naead}}_{\Pi[\widetilde{E}_K]}(A) := \Pr\left[A^{\Pi.\mathrm{Enc}[\widetilde{E}_K], \Pi.\mathrm{Dec}[\widetilde{E}_K]} = 1\right] - \Pr\left[A^{\$, \bot} = 1\right],$$
where the probabilities are taken over K, $\$$, and A. We demand that A is nonce-respecting (all nonces in encryption queries are distinct), that A never asks a trivial decryption query (N, A, C, T), i.e., one for which there is a prior encryption query (N, A, M) that returned (C, T), and that A never repeats a query. Throughout this paper, the world with $\Pi[\widetilde{E}_K]$ is called the "real world," and the world with $(\$, \bot)$ is called the "ideal world." Queries to $\Pi.\mathrm{Enc}[\widetilde{E}_K]/\$$ are called "encryption queries," and queries to $\Pi.\mathrm{Dec}[\widetilde{E}_K]/\bot$ are called "decryption queries." The maximum over all adversaries running in time at most t and making at most $q_E$ encryption queries and $q_D$ decryption queries, with σ the total number of TBC calls invoked by all queries, is denoted by $\mathrm{Adv}^{\mathrm{naead}}_{\Pi[\widetilde{E}_K]}(t, q_E, q_D, \sigma) := \max_A \mathrm{Adv}^{\mathrm{naead}}_{\Pi[\widetilde{E}_K]}(A)$. When an adversary is a computationally unbounded algorithm, the time t is disregarded.

FBAE: TBC-based Feedback Mode
We design a TBC-based nAEAD scheme based on the iCOFB design approach.

Brief Overview of iCOFB Design and Security
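iCOFB, given in [10], is a tweakable random function (TRF)-based nAEAD scheme and is designed so that no extra state beyond the TRF size is required. Let $\ell_{\max}$ be the maximum number of ciphertext blocks, and let R be the underlying TRF. For iCOFB to be lightweight, the feedback function ρ should be lightweight. [10] considers a linear function; thus ρ is expressed by a $2b \times 2b$ binary matrix:
$$\rho(Y_i, M_i) = \begin{pmatrix} E_{1,1} & E_{1,2} \\ E_{2,1} & E_{2,2} \end{pmatrix} \begin{pmatrix} Y_i \\ M_i \end{pmatrix} = \begin{pmatrix} X_{i+1} \\ C_i \end{pmatrix},$$
where the $E_{i,j}$'s are $b \times b$ binary matrices. For the decryption of iCOFB, the feedback function ρ′ is also expressed by a $2b \times 2b$ binary matrix:
$$\rho'(Y_i, C_i) = \begin{pmatrix} D_{1,1} & D_{1,2} \\ D_{2,1} & D_{2,2} \end{pmatrix} \begin{pmatrix} Y_i \\ C_i \end{pmatrix} = \begin{pmatrix} X_{i+1} \\ M_i \end{pmatrix},$$
where the $D_{i,j}$'s are $b \times b$ binary matrices. For the correctness of iCOFB, [10] chooses the feedback function ρ′ with the following conditions: … Regarding the security of iCOFB, they show the following theorem.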
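To see how a linear ρ′ can be derived from ρ, the Python sketch below works over a toy block size b = 8. It writes ρ in block form as $X_{i+1} = E_{1,1} Y_i \oplus E_{1,2} M_i$ and $C_i = E_{2,1} Y_i \oplus E_{2,2} M_i$ and uses a COFB-like choice $E_{1,1} = G$ and $E_{1,2} = E_{2,1} = E_{2,2} = I$. The matrix G (multiplication by x in $\mathrm{GF}(2^8)$, for which both G and $G \oplus I$ are invertible) is a hypothetical placeholder, not the matrix used in [10]. Solving for $M_i = Y_i \oplus C_i$ gives $D_{1,1} = G \oplus I$ and $D_{1,2} = D_{2,1} = D_{2,2} = I$. The check exercises correctness only; it does not verify any security condition on ρ.

```python
import random

B = 8  # toy block size; real instances use b = 64 or 128


def xtime(a: int) -> int:
    """Multiply a by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (GF(2)-linear)."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def matrix_from_linear_map(f) -> list[int]:
    """b x b GF(2) matrix of a linear map f; row r is stored as an int bitmask."""
    cols = [f(1 << c) for c in range(B)]
    return [sum(((cols[c] >> r) & 1) << c for c in range(B)) for r in range(B)]


def matvec(rows: list[int], x: int) -> int:
    """Matrix-vector product over GF(2): output bit r = parity(rows[r] AND x)."""
    return sum((bin(row & x).count("1") & 1) << r for r, row in enumerate(rows))


def matadd(a: list[int], b: list[int]) -> list[int]:
    """Matrix addition over GF(2) is row-wise XOR."""
    return [ra ^ rb for ra, rb in zip(a, b)]


def apply_2x2(blocks, u: int, v: int) -> tuple[int, int]:
    """Apply a 2b x 2b matrix given as b x b blocks [[m11, m12], [m21, m22]] to (u, v)."""
    (m11, m12), (m21, m22) = blocks
    return matvec(m11, u) ^ matvec(m12, v), matvec(m21, u) ^ matvec(m22, v)


IDENT = matrix_from_linear_map(lambda x: x)
G = matrix_from_linear_map(xtime)          # hypothetical placeholder for G
E = [[G, IDENT], [IDENT, IDENT]]           # rho:  (Y, M) -> (X_next, C)
D = [[matadd(G, IDENT), IDENT], [IDENT, IDENT]]  # rho': (Y, C) -> (X_next, M)

rng = random.Random(1)
for _ in range(1000):
    y, m = rng.randrange(2**B), rng.randrange(2**B)
    x_next, c = apply_2x2(E, y, m)
    assert apply_2x2(D, y, c) == (x_next, m)  # rho' recovers X_{i+1} and M_i
print("rho' inverts rho on all sampled inputs")
```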

Theorem 1. If the feedback function ρ satisfies the conditions (A1 … invertible, then for any adversary A making at most $q_D$ decryption queries of plaintext length at most $\ell_{\max}$ blocks, …

FBAE: Design Principle and Specification
We design FBAE, a TBC-based lightweight AEAD mode, by extending the idea of iCOFB to the TBC setting.
Encryption/Decryption Procedures. In FBAE, a plaintext/ciphertext is partitioned into b-bit blocks, and, as in iCOFB, each block is processed by a TBC and a feedback function. However, more general functions than linear feedback functions are considered.
- The feedback function in the encryption is composed of the two functions $\gamma^{(e)}_l$ and $\delta^{(e)}$ for an integer $0 < l \le b$. The core procedure of the encryption of FBAE that uses these functions is given in Figure 2 (Center). Note that all plaintext blocks except the last one have length l = b, and the last block has length $l \le b$.
- The feedback function in the decryption is composed of the two functions $\gamma^{(d)}_l$ and $\delta^{(d)}$ for an integer $0 < l \le b$. The core procedure of the decryption of FBAE that uses these functions is given in Figure 2 (Right). Note that all ciphertext blocks except the last one have length l = b, and the last block has length $l \le b$.
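The core encryption and decryption loops over full blocks can be sketched in Python with the feedback functions passed in as parameters. This is a minimal sketch under several assumptions: the TBC is replaced by a hypothetical keyed BLAKE2b stand-in (a PRF rather than a permutation, which suffices here because the mode never calls the inverse), tweaks are abstract tuples rather than the actual tweak function, the initial input $X_1$ is taken as given, and the last-block and tag handling are omitted. The example functions only exercise the correctness requirements that, in the sketch's argument order, $\gamma^{(d)}(Y, \gamma^{(e)}(Y, M)) = M$ and $\delta^{(d)}(Y, \gamma^{(e)}(Y, M)) = \delta^{(e)}(Y, M)$; they are not claimed to satisfy the security conditions.

```python
import hashlib
from typing import Callable, List, Tuple

Feedback = Callable[[int, int], int]  # (TBC output Y, data block) -> block
MASK64 = (1 << 64) - 1


def toy_tbc(key: bytes, tweak: tuple, x: int) -> int:
    """Hypothetical 64-bit TBC stand-in: keyed BLAKE2b over (tweak, x), truncated."""
    data = repr(tweak).encode() + x.to_bytes(8, "big")
    return int.from_bytes(hashlib.blake2b(data, key=key, digest_size=8).digest(), "big")


def core_encrypt(key: bytes, x1: int, msg: List[int],
                 gamma_e: Feedback, delta_e: Feedback) -> Tuple[List[int], int]:
    """Y_i = E^{tw_i}(X_i), C_i = gamma_e(Y_i, M_i), X_{i+1} = delta_e(Y_i, M_i)."""
    x, ct = x1, []
    for i, m in enumerate(msg, start=1):
        y = toy_tbc(key, ("msg", i), x)
        ct.append(gamma_e(y, m))
        x = delta_e(y, m)
    return ct, x  # the final x would feed the (omitted) tag computation


def core_decrypt(key: bytes, x1: int, ct: List[int],
                 gamma_d: Feedback, delta_d: Feedback) -> Tuple[List[int], int]:
    """Recompute the same Y_i, then M_i = gamma_d(Y_i, C_i), X_{i+1} = delta_d(Y_i, C_i)."""
    x, msg = x1, []
    for i, c in enumerate(ct, start=1):
        y = toy_tbc(key, ("msg", i), x)
        msg.append(gamma_d(y, c))
        x = delta_d(y, c)
    return msg, x


def rotl64(v: int, r: int) -> int:
    return ((v << r) | (v >> (64 - r))) & MASK64


# Illustrative feedback functions (correctness only, not checked against B2-B8).
def gamma_e(y: int, m: int) -> int:
    return y ^ m


def delta_e(y: int, m: int) -> int:
    return rotl64(y, 1) ^ m


def gamma_d(y: int, c: int) -> int:
    return y ^ c


def delta_d(y: int, c: int) -> int:
    return rotl64(y, 1) ^ y ^ c  # equals rotl64(y, 1) ^ m when c = y ^ m


key = bytes(range(16))
msg = [0x0123456789ABCDEF, 0xFEDCBA9876543210, 0]
ct, x_enc = core_encrypt(key, 0xDEADBEEF, msg, gamma_e, delta_e)
pt, x_dec = core_decrypt(key, 0xDEADBEEF, ct, gamma_d, delta_d)
assert pt == msg and x_enc == x_dec  # decryption stays in sync with encryption
```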
Hash Procedure (AD Processing). To design a lightweight AEAD scheme, FBAE uses a fixed-tweak-length TBC, whereas iCOFB uses a variable-tweak-length TRF to take variable-length AD. Hence, we define an additional procedure for processing variable-length AD. Similar to the encryption/decryption procedures, AD is partitioned into b-bit blocks, and then the AD blocks are processed by iterating a combination of a TBC and the feedback function $\delta^{(a)}$, which maps a TBC output and an AD block to the next TBC input. Note that an empty AD block appears when the AD is an empty string. The core procedure of processing an AD block is given in Figure 2 (Left).
Tweak Function. Let $\ell_{\max}$ be the maximum number of blocks of AD, plaintext, and ciphertext. For tweaks of the underlying TBC, we use a tweak function f(i, N, j) with the following condition:
- B1: for any (i, N, j), …
The first element is used for distinguishing AD from plaintext/ciphertext and whether the last block is a full block or not; this offers a distinct permutation between the hash procedure and the encryption/decryption, and avoids an additional TBC call when the last block is a full block. The second element is a nonce, which offers a distinct permutation for each nonce, thereby removing the birthday term regarding the number of queries. The third element is the current block number, which offers a distinct permutation for each block, thereby removing the birthday term regarding the query length.

Specification of FBAE
The specification of FBAE is given in Algorithm 1 and is shown in Figure 3. FBAE.Hash is the hash procedure, FBAE.Enc is the encryption, and FBAE.Dec is the decryption.
For the correctness of FBAE, the following conditions are required. Let l be an integer such that $0 < l \le b$. …

- B8: for any …

The condition B4 ensures that for a plaintext block $M_i$ and a TBC output … (between the encryption and decryption) depends on the randomness of the TBC output $Y'_i$. Thus, if the output is distributed over a set $\mathcal{X}$, then the collision probability is at most $1/|\mathcal{X}|$. Similar to the condition B5, the condition B7 ensures that in the procedure of processing AD blocks, the internal state collision of $\delta^{(a)}$ depends on the randomness of the TBC output $W'_i$. The conditions B5 and B7 are used to upper bound the probability of forging a tag. The condition B6 ensures that in the encryption and decryption procedures, no trivial collision occurs on the internal state values. Note that the condition B6 tolerates an internal state collision from …, but the first element of f removes the influence of the trivial collision. The condition B6 is defined similarly.
It is easy to see that the classes of the functions $\gamma^{(e)}_l, \gamma^{(d)}_l, \delta^{(e)}, \delta^{(d)}$ with the conditions B4, B5, and B6 include the linear feedback functions ρ, ρ′ with the conditions A1, A2, and A3.

Lightweight Instantiations of $\gamma^{(e)}_l, \gamma^{(d)}_l, \delta^{(a)}, \delta^{(e)}, \delta^{(d)}, f$
In Section 5, we show that FBAE offers a lightweight AEAD scheme when combined with a lightweight TBC. In the implementation, the following lightweight functions are used: … These functions are shown in Figure 4. FBAE with the above functions is called PFB (Plaintext FeedBack). PFB is shown in Figure 7 in the Appendix. It is easy to see that the above functions satisfy the conditions B2-B8.
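The parallelizability of PFB encryption follows from the TBC input always being a plaintext block: all TBC inputs $X_1, M_1, \ldots, M_{\ell-1}$ are known before any TBC call. The Python sketch below illustrates this under stated assumptions. The concrete PFB functions are not reproduced in this text, so the sketch assumes $C_i = Y_i \oplus M_i$ for full blocks (with the next TBC input being $M_i$, as stated). It also uses a hypothetical keyed BLAKE2b stand-in for SKINNY-64-192, abstract tuple tweaks, and a given initial input $X_1$, and it omits the last partial block and the tag.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

KEY = bytes(16)  # hypothetical toy key


def toy_tbc(tweak: tuple, x: int) -> int:
    """Hypothetical stand-in for SKINNY-64-192: keyed BLAKE2b truncated to 64 bits."""
    data = repr(tweak).encode() + x.to_bytes(8, "big")
    return int.from_bytes(hashlib.blake2b(data, key=KEY, digest_size=8).digest(), "big")


def pfb_encrypt_sequential(x1: int, msg: List[int]) -> List[int]:
    x, ct = x1, []
    for j, m in enumerate(msg, start=1):
        y = toy_tbc(("msg", j), x)
        ct.append(y ^ m)   # assumed: C_j = Y_j XOR M_j
        x = m              # the next TBC input is the plaintext block M_j
    return ct


def pfb_encrypt_parallel(x1: int, msg: List[int]) -> List[int]:
    inputs = [x1] + msg[:-1]                       # every TBC input is known up front
    tweaks = [("msg", j) for j in range(1, len(msg) + 1)]
    with ThreadPoolExecutor() as pool:             # independent TBC calls
        ys = list(pool.map(toy_tbc, tweaks, inputs))
    return [y ^ m for y, m in zip(ys, msg)]


def pfb_decrypt(x1: int, ct: List[int]) -> List[int]:
    x, msg = x1, []
    for j, c in enumerate(ct, start=1):
        y = toy_tbc(("msg", j), x)   # needs M_{j-1}, so decryption stays sequential
        m = y ^ c
        msg.append(m)
        x = m
    return msg


msg = [0x1111, 0x2222, 0x3333, 0x4444]
x1 = 0xCAFE  # hypothetical output of the AD processing
ct = pfb_encrypt_sequential(x1, msg)
assert pfb_encrypt_parallel(x1, msg) == ct
assert pfb_decrypt(x1, ct) == msg
```

The thread pool only mirrors the data dependency; in hardware the same independence would allow several TBC calls to proceed side by side during encryption, while decryption must wait for each recovered plaintext block.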

nAEAD-Security of FBAE
The nAEAD-security bound of FBAE is given in the following theorem.
Theorem 2. For FBAE with the conditions B1-B8, we have …
The proof is given in the following subsections.

Replacing the Keyed TBC $\widetilde{E}_K$ with a TRP $\widetilde{P}$
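The keyed TBC $\widetilde{E}_K$ is replaced with a TRP $\widetilde{P} \xleftarrow{\$} \mathrm{Perm}(\mathcal{TW}, \{0,1\}^b)$. By the replacement, we have
$$\mathrm{Adv}^{\mathrm{naead}}_{\mathrm{FBAE}[\widetilde{E}_K]}(t, q_E, q_D, \sigma) \le \mathrm{Adv}^{\mathrm{tprp}}_{\widetilde{E}}(t + O(\sigma), \sigma) + \mathrm{Adv}^{\mathrm{naead}}_{\mathrm{FBAE}[\widetilde{P}]}(q_E, q_D, \sigma). \quad (1)$$
Hereafter, $\mathrm{Adv}^{\mathrm{naead}}_{\mathrm{FBAE}[\widetilde{P}]}(q_E, q_D, \sigma)$, the nAEAD advantage of $\mathrm{FBAE}[\widetilde{P}]$, is upper bounded, where an adversary is a computationally unbounded algorithm and its complexity is measured solely by the number of queries. Without loss of generality, the adversary is deterministic.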

Upper Bounding $\mathrm{Adv}^{\mathrm{naead}}_{\mathrm{FBAE}[\widetilde{P}]}(q_E, q_D, \sigma)$
First, a forgery event in the real world is defined.
$\mathsf{forge} \Leftrightarrow \exists i \in [q_D]$ s.t. at the i-th decryption query, ⊥ is not returned.
Then, for any adversary A, …
In Subsection 4.3, Pr[…] is analyzed. In Subsection 4.4, $\Pr[\mathsf{forge}]$ is analyzed. Putting the upper bounds (4) and (5) into (2) gives …, and putting the above upper bound into (1) gives the bound in Theorem 2.

Analysis of Pr
In the real world, the condition B1 of the tweak function f ensures that all tweaks of $\widetilde{P}$ defined by encryption queries are distinct. Hence, the output blocks of $\widetilde{P}$ are chosen independently and uniformly at random from $\{0,1\}^b$. By the condition B4, all ciphertext blocks $C_i$ defined by encryption queries are independently and uniformly distributed over $\{0,1\}^{|C_i|}$, and thus are indistinguishable from those defined by $\$$. By $\neg\mathsf{forge}$, all outputs of $\mathrm{FBAE.Dec}[\widetilde{P}]$ are ⊥. Hence, we have …

Analysis of Pr[forge]
In the following analysis, without loss of generality, an adversary A aborts after forge occurs. Let $\mathsf{forge}_i$ be the event that forge occurs at the i-th decryption query (thus $\mathsf{forge}_i$ can occur only if $\mathsf{forge}_1 \vee \mathsf{forge}_2 \vee \cdots \vee \mathsf{forge}_{i-1}$ does not occur). Thus, … A value/variable V defined at the i-th decryption query, except for the lengths a and ℓ, is denoted by $V^{(d)}$. The lengths a and ℓ are denoted by $a_d$ and $\ell_d$, respectively. Similarly, for an encryption query $(N^{(e)}, A^{(e)}, M^{(e)})$, a value/variable V corresponding to the encryption query, except for the lengths a and ℓ, is denoted by $V^{(e)}$. The lengths a and ℓ are denoted by $a_e$ and $\ell_e$, respectively. In this analysis, we consider the following types of decryption query: …

Analysis of $\Pr[\mathsf{forge}_i \mid \text{Type-1}]$
Under the Type-1 decryption query and by the condition B1, the tweak $f(y^{(d)}, N^{(d)}, \ell_d)$, with which the TRP defines the tag $T^{(d)}$, is distinct from all tweaks defined by the previous encryption queries and from the other tweaks defined by the decryption query. Hence, $T^{(d)}$ is uniformly distributed over $\{0,1\}^\tau$ and independent of the TRP outputs defined by the previous encryption queries and of the other TRP outputs defined by the decryption query. Thus, we have $\Pr[\mathsf{forge}_i \mid \text{Type-1}] \le 1/2^\tau$.
The upper bounds (8) and (11) give …

Upper Bounding $\Pr[\mathsf{forge}_i \mid \text{Type-2}]$
For the Type-2 decryption query, by $S^{(d)} \neq S^{(e)}$ and $f(y^{(e)}, N^{(e)}, \ell_e) = f(y^{(d)}, N^{(d)}, \ell_d)$ (the tweaks are the same), the output of the last TRP call by the decryption query is chosen uniformly at random from $\{0,1\}^b \setminus \{\widetilde{P}^{f(y^{(e)}, N^{(e)}, \ell_e)}(S^{(e)})\}$. We thus have … The condition of the Type-2 decryption query, $y^{(e)} = y^{(d)}$, is satisfied if and only if … Under the Type-2 decryption query, $\ell_e = \ell_d$ is satisfied. Let … be the sets of distinct blocks obtained from $(A^{(d)}, A^{(e)})$ and $(M^{(d)}, M^{(e)})$, respectively, where for $a_d < i$, … Regarding $p_1$, by the condition …, and thus by $|I(C^{(d)}, C^{(e)})| = 0$, … Hence, we have … Using these upper bounds, we have … Then, by the condition B8, … On the other hand, … by the condition B8 with the conditions in (9), … $= H^{(d)}$. Hence, we consider the case where $i \ge 2$. Then, by … and the condition B8 with the conditions in (9), in order to satisfy the above equation, … is bijective from the condition B7, we have … is upper bounded. Thus, the following equation is considered.
By a e ̸ = a d , the tweaks corresponding with the TRP outputs W (d) ae−1 are distinct.Thus, W (d) ae−1 are independently chosen, and at least one of them is chosen uniformly at random from {0, 1} b .(Note that for x ∈ {a, e} if a x = 1 then ) which is a constant.)By the condition B7, at least one of δ (a)   ( W (d) - ).Note that under the Type-2 decryption query, ℓ e = ℓ d is satisfied.Then by the condition B6, ) , where . By the condition B6 with (10) and ) }, and we thus have The above upper bounds give

Implementation
The performance of PFB is evaluated through concrete hardware implementations. For the lightweight TBC, we use a variant of SKINNY with a 64-bit block length and a 192-bit tweakey, i.e., SKINNY-64-192 [6]. Its performance is compared with that of the state-of-the-art alternative having the same level of security: SAEB [25] instantiated with the lightweight block cipher GIFT-128-128 [4]. In the following, SKINNY-64-192 and GIFT-128-128 are simply referred to as SKINNY and GIFT. In addition, a mode of operation M instantiated with a primitive P is described as M[P].

Design Policy. For a fair comparison, PFB[SKINNY] and SAEB[GIFT] are implemented under the same design policy. They are designed as co-processors aiming at accelerating the main time-consuming parts of AD processing, encryption, and decryption. Meanwhile, the co-processors expect an external controller to handle special cases such as padding and the final-block processing. To avoid a hidden cost, the designs hold a key, nonce, and tweak during their lifetimes. In other words, there is no need to store them in external registers and feed them multiple times. This policy affects the implementation of on-the-fly key scheduling, as we will see in the next subsection. The circuit area has the highest priority in optimization. The designs are described in a hardware description language (HDL) at the register-transfer level (RTL). We do not apply netlist-level optimization except for scan flip-flops, which are commonly used for compact implementations [24]; the standard cells for scan flip-flops are explicitly instantiated in HDL.
For SCA-protected implementations, we consider TI secure against first-order attacks.

PFB[SKINNY]
SKINNY uses three 64-bit tweakey states, namely TK1, TK2, and TK3, for the tweakey schedule. In this particular design, TK3 stores a 64-bit tweak, and TK1 and TK2 store a 128-bit secret key. Fig. 5 shows the hardware architecture of PFB[SKINNY]. As shown in Fig. 5, PFB[SKINNY] is realized as a thin wrapper of the SKINNY implementation; the only additional components are a 4-bit XOR, a selector, and an AND gate.
The SKINNY implementation follows the conventional nibble-serial architecture [6], but the tweakey-schedule implementation is designed from scratch. The implementations, called the TK1, TK2, and TK3 arrays, are based on a common architecture comprising an array of scan flip-flops and integrated on-the-fly key scheduling [24], as shown in Fig. 5. However, the changes made by the on-the-fly key scheduling must be reverted to begin the next TBC call without feeding the same key again. Since SKINNY schedules TK1, TK2, and TK3 by a nibble permutation and a nibble-wise linear transformation in each round, we can obtain efficient inverse maps that revert the final tweakey state to the initial one. Such inverse maps are integrated into the TK1, TK2, and TK3 arrays along with the forward on-the-fly scheduling.
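The revert idea can be sketched in Python on the 16 four-bit cells of each SKINNY-64 tweakey word. The constants below follow the SKINNY specification [6] (the cell permutation $P_T$ and the 4-bit LFSRs applied to the first two rows of TK2 and TK3) and should be double-checked against it. The sketch reverts the state by applying 40 inverse round updates one at a time, whereas a hardware design may fold them into a single combinational map; the function names are hypothetical.

```python
import random
from typing import Callable, List, Optional

# SKINNY tweakey cell permutation P_T (new[i] = old[P_T[i]]); check against [6].
P_T = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7]
P_T_INV = [P_T.index(j) for j in range(16)]
ROUNDS = 40  # SKINNY-64-192

Lfsr = Optional[Callable[[int], int]]


def lfsr_tk2(v: int) -> int:
    """TK2 cell LFSR: (x3, x2, x1, x0) -> (x2, x1, x0, x3 ^ x2)."""
    return ((v << 1) & 0xF) | (((v >> 3) ^ (v >> 2)) & 1)


def lfsr_tk3(v: int) -> int:
    """TK3 cell LFSR: (x3, x2, x1, x0) -> (x0 ^ x3, x3, x2, x1)."""
    return (((v ^ (v >> 3)) & 1) << 3) | (v >> 1)


def round_update(cells: List[int], lfsr: Lfsr) -> List[int]:
    """Forward update: permute the 16 cells, then apply the LFSR to the first two rows."""
    new = [cells[P_T[i]] for i in range(16)]
    if lfsr is not None:
        new[:8] = [lfsr(c) for c in new[:8]]
    return new


def round_revert(cells: List[int], lfsr_inv: Lfsr) -> List[int]:
    """Inverse update: undo the LFSR on the first two rows, then undo P_T."""
    tmp = list(cells)
    if lfsr_inv is not None:
        tmp[:8] = [lfsr_inv(c) for c in tmp[:8]]
    return [tmp[P_T_INV[j]] for j in range(16)]


def run(cells: List[int], lfsr: Lfsr, rounds: int = ROUNDS) -> List[int]:
    for _ in range(rounds):
        cells = round_update(cells, lfsr)
    return cells


def revert(cells: List[int], lfsr_inv: Lfsr, rounds: int = ROUNDS) -> List[int]:
    for _ in range(rounds):
        cells = round_revert(cells, lfsr_inv)
    return cells


# The two cell LFSRs are inverses of each other, so each serves as the other's revert map.
SCHEDULES = {"TK1": (None, None), "TK2": (lfsr_tk2, lfsr_tk3), "TK3": (lfsr_tk3, lfsr_tk2)}

rng = random.Random(0)
for name, (fwd, inv) in SCHEDULES.items():
    start = [rng.randrange(16) for _ in range(16)]
    assert revert(run(start, fwd), inv) == start, name
```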
Based on (3.4), the 64-bit tweak is given by id∥N∥ctr: a 3-bit number distinguishing the operations, id = $\mathrm{str}_3(i)$, a 45-bit nonce N, and a current block number realized by a 16-bit counter ctr = $\mathrm{str}_{16}(j)$. id and ctr are updated for each TBC call. For efficient computation, the TK3 array integrates the circuit for (i) changing id and (ii) incrementing and clearing the counter ctr. Using this functionality, a user needs to feed id∥N∥ctr only once for a given nonce N.
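The 64-bit tweak layout id∥N∥ctr can be sketched in Python as bit packing. The field widths (3, 45, and 16 bits) are from the text; the helper names and the counter-overflow check are hypothetical additions for illustration. Because the three fields occupy disjoint bit ranges, distinct (id, N, ctr) triples always give distinct tweaks.

```python
ID_BITS, NONCE_BITS, CTR_BITS = 3, 45, 16
assert ID_BITS + NONCE_BITS + CTR_BITS == 64


def pack_tweak(op_id: int, nonce: int, ctr: int) -> int:
    """Return the 64-bit tweak str_3(op_id) || N || str_16(ctr) as an integer."""
    if not (0 <= op_id < 1 << ID_BITS and 0 <= nonce < 1 << NONCE_BITS
            and 0 <= ctr < 1 << CTR_BITS):
        raise ValueError("field out of range")
    return (op_id << (NONCE_BITS + CTR_BITS)) | (nonce << CTR_BITS) | ctr


def unpack_tweak(tweak: int) -> tuple[int, int, int]:
    """Split a 64-bit tweak back into (op_id, nonce, ctr)."""
    return (tweak >> (NONCE_BITS + CTR_BITS),
            (tweak >> CTR_BITS) & ((1 << NONCE_BITS) - 1),
            tweak & ((1 << CTR_BITS) - 1))


def next_block(tweak: int) -> int:
    """Increment ctr, keeping id and N (hypothetical helper)."""
    op_id, nonce, ctr = unpack_tweak(tweak)
    if ctr + 1 >= 1 << CTR_BITS:
        raise OverflowError("16-bit block counter exhausted")
    return pack_tweak(op_id, nonce, ctr + 1)


def set_id(tweak: int, op_id: int) -> int:
    """Change the operation id, keeping N and ctr (hypothetical helper)."""
    _, nonce, ctr = unpack_tweak(tweak)
    return pack_tweak(op_id, nonce, ctr)


def clear_ctr(tweak: int) -> int:
    """Reset ctr to zero, keeping id and N (hypothetical helper)."""
    op_id, nonce, _ = unpack_tweak(tweak)
    return pack_tweak(op_id, nonce, 0)


tw = pack_tweak(1, 0x123456789ABC, 0)
print(hex(tw), unpack_tweak(next_block(tw)))
```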
A single SKINNY round uses 16 cycles, and thus SKINNY, comprising 40 rounds, finishes in 16 × 40 = 640 cycles. We need one additional cycle for updating the tweak stored in the TK3 array for the next TBC call. As a result, a 64-bit message or ciphertext block is consumed in 641 cycles.

SAEB[GIFT]
Fig. 6 shows the hardware architecture of SAEB[GIFT]. The overall architecture is based on the conventional design [25], but the shift registers for synchronization are removed in accordance with the design policy. It is also realized as a thin wrapper of the underlying GIFT implementation.
The GIFT implementation is based on the nibble-serial architecture [4], but the key array is redesigned to efficiently revert the changes made by on-the-fly key scheduling. Similar to SKINNY, GIFT has a linear key scheduling algorithm, and thus we can obtain an efficient inverse map that reverts the final key state to the initial one. The key array is designed with a 32-bit datapath to efficiently integrate the inverse key-schedule map (the function block labeled "revert"), as shown in Fig. 6.
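Because the GIFT-128 key update only moves and rotates 16-bit words, its inverse is cheap. The Python sketch below assumes the key update of the GIFT specification [4], $k_7\|\cdots\|k_0 \leftarrow (k_1 \ggg 2)\|(k_0 \ggg 12)\|k_7\|\cdots\|k_2$ with rotations within 16-bit words, stores the words as a list indexed $k_0, \ldots, k_7$, and reverts 40 rounds step by step. The actual "revert" block in Fig. 6 works on a 32-bit datapath and is not modeled here; the function names are hypothetical.

```python
import random
from typing import List

ROUNDS = 40
MASK16 = 0xFFFF


def ror16(v: int, r: int) -> int:
    """Rotate a 16-bit word right by r bits."""
    return ((v >> r) | (v << (16 - r))) & MASK16


def rol16(v: int, r: int) -> int:
    """Rotate a 16-bit word left by r bits."""
    return ((v << r) | (v >> (16 - r))) & MASK16


def key_update(k: List[int]) -> List[int]:
    """k = [k0, ..., k7]; new k7 = k1 >>> 2, new k6 = k0 >>> 12, new k5..k0 = k7..k2."""
    return k[2:8] + [ror16(k[0], 12), ror16(k[1], 2)]


def key_revert(k: List[int]) -> List[int]:
    """Inverse of key_update: undo the rotations and shift the words back."""
    return [rol16(k[6], 12), rol16(k[7], 2)] + k[0:6]


rng = random.Random(0)
key = [rng.randrange(1 << 16) for _ in range(8)]
state = key
for _ in range(ROUNDS):
    state = key_update(state)   # on-the-fly schedule during one GIFT call
for _ in range(ROUNDS):
    state = key_revert(state)   # restore the initial key for the next call
assert state == key
```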
The S-box is split into two stages, namely g and f, for TI following the conventional work [13]. Consequently, a single GIFT round uses 33 cycles: 32 cycles for S-box look-ups and one cycle of pipeline latency. As a result, the 40-round operation of GIFT requires 33 × 40 = 1,320 cycles.

Threshold Implementation
There is a choice between a protected and an unprotected key/tweakey schedule. Conventional attacks such as differential power analysis (DPA) [18] cannot be used to attack a key schedule that is independent of attacker-controllable inputs, e.g., plaintexts or ciphertexts. That is not generally true for TBCs, but SKINNY has the same property as long as the attacker-controllable tweak is placed in TK3, which is scheduled independently of TK1 and TK2. Consequently, some previous works prioritize circuit area and use unprotected key-schedule implementations [6,32,37]. Meanwhile, if we consider a profiling attack on the key/tweakey schedule, it is also reasonable to choose a protected key-schedule implementation. Considering the cost-security trade-off, we implement PFB[SKINNY] and SAEB[GIFT] with three different profiles: (P1) the unprotected implementation, (P2) TI with the unprotected key schedule, and (P3) TI with the protected key schedule.
Table 2 summarizes the number of registers needed for the SKINNY and GIFT implementations in the different profiles. In (P1), both SKINNY and GIFT use 256 bits in total. In (P2), on the other hand, SKINNY uses fewer registers, 384 bits compared with 512 bits, because of its smaller block length. SKINNY is still advantageous in (P3) because its key/tweakey schedule can be shared more efficiently. Since both GIFT and SKINNY have linear key/tweakey schedules, these can be realized with only two shares. Moreover, there is no need to protect TK3 of SKINNY, which stores a public tweak. As a result, SKINNY and GIFT use 512 and 640 bits in (P3), respectively.
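The register counts follow from simple bookkeeping, sketched in Python below under the sharing assumptions stated in the text: three shares for the data state under TI, two shares for a protected linear key/tweakey schedule, and no sharing for SKINNY's public tweak in TK3. The function and its parameter names are ours.

```python
def register_bits(state: int, secret_key: int, public_tweak: int, profile: str) -> int:
    """Register bits: P1 unprotected, P2 TI on the state only, P3 TI plus 2-share key."""
    state_shares = {"P1": 1, "P2": 3, "P3": 3}[profile]
    key_shares = {"P1": 1, "P2": 1, "P3": 2}[profile]
    return state * state_shares + secret_key * key_shares + public_tweak  # tweak never shared


for profile in ("P1", "P2", "P3"):
    skinny = register_bits(state=64, secret_key=128, public_tweak=64, profile=profile)
    gift = register_bits(state=128, secret_key=128, public_tweak=0, profile=profile)
    print(profile, "SKINNY:", skinny, "GIFT:", gift)
```

Under these assumptions the sketch yields 256/384/512 bits for SKINNY and 256/512/640 bits for GIFT in (P1)/(P2)/(P3).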
We use the formulae for the 3-share uniform S-boxes for SKINNY and GIFT from the conventional works [6] and [13], respectively. TI is implemented by replicating the state/key/tweakey arrays and replacing the decomposed S-boxes (f and g) with their shared maps. Figs. 5 and 6 show the boundaries of sharing for each profile.

Performance Evaluation and Comparison
The designs are synthesized with the NanGate 45-nm standard cell library [27] using Synopsys Design Compiler while preserving the module hierarchy. Table 3 shows the breakdown of the post-synthesis performance. We first discuss the unprotected implementations (P1). The circuit areas of PFB[SKINNY] and SAEB[GIFT] are 3,111 and 2,761 [GE], respectively. SKINNY and GIFT dominate the circuit areas of PFB[SKINNY] and SAEB[GIFT], and the additional costs for the modes of operation are limited. The sizes of the state and key arrays are almost proportional to their register sizes; e.g., the 64-bit SKINNY state array (532 [GE]) is about half the size of the 128-bit GIFT state array (975 [GE]).
Although the PFB[SKINNY] implementation is larger than that of SAEB[GIFT] by 350 [GE], this is still a positive result because (i) GIFT is known to perform better than SKINNY [6] and (ii) lightweight TBC is an emerging technology compared with lightweight block ciphers. Note also that PFB[SKINNY] is about twice as fast as SAEB[GIFT]: PFB[SKINNY] and SAEB[GIFT] consume a 64-bit message/ciphertext block in 641 and 1,320 cycles, respectively. Moreover, PFB has parallelizable encryption, as discussed in Sect. 3.
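The latency figures follow from the round structure of the two cores. The Python sketch below recomputes them, with the per-round cycle counts taken from the text (16 cycles per SKINNY round plus one tweak-update cycle per call; 33 cycles per GIFT round) and an 8-byte message/ciphertext block per call for both modes.

```python
def cycles_per_block(cycles_per_round: int, rounds: int, extra: int = 0) -> int:
    """Cycles to consume one block: rounds * cycles_per_round + extra overhead."""
    return cycles_per_round * rounds + extra


pfb_skinny = cycles_per_block(16, 40, extra=1)  # 640 + 1 tweak-update cycle
saeb_gift = cycles_per_block(33, 40)            # 32 S-box look-ups + 1 latency per round

BLOCK_BYTES = 8  # both modes consume a 64-bit message/ciphertext block per call
for name, cycles in (("PFB[SKINNY]", pfb_skinny), ("SAEB[GIFT]", saeb_gift)):
    print(f"{name}: {cycles} cycles/block, {cycles / BLOCK_BYTES:.1f} cycles/byte")
print(f"speed ratio: {saeb_gift / pfb_skinny:.2f}")
```

The ratio of about 2.06 is what the "about twice as fast" comparison refers to.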
Table 4 shows a performance comparison with previous implementations. The unprotected implementations of SAEB[GIFT] and PFB[SKINNY] are smaller than previous implementations of AES-based AEs (SAEB[AES128] [25], CLOC[AES128], SILC[AES128], and OTR[AES128] [3]). The bit-serial Ascon implementation without an interface has a smaller circuit area of 2,570 [GE] [11]; however, the implementation needs an additional 128-bit key register to run another encryption/decryption with the same key. If we add the size of the key register (640 [GE] at 5 [GE/bit]) to 2,570 [GE], the Ascon implementation has a circuit size similar to that of PFB[SKINNY]. We also note that the Ascon implementation with an interface including a 128-bit key register has 3,750 [GE].

We then discuss the protected implementations. With (P2), the PFB[SKINNY] implementation uses 4,492 [GE], which is smaller than that of SAEB[GIFT] (5,037 [GE]). This is explained by the smaller number of registers summarized in Table 2. PFB[SKINNY] is still advantageous with (P3): the circuit areas of PFB[SKINNY] and SAEB[GIFT] are 5,858 and 6,229 [GE], respectively. The protected PFB implementations are smaller than those of Ascon [11] and Ketje [1] in conventional works, as shown in Table 4. This is also explained by the number of registers: the sponge-based AEs have a relatively large state (320 bits for Ascon and 200 bits for Ketje-JR) that must be protected with three shares.
In summary, the unprotected PFB[SKINNY] implementation is competitive against the unprotected SAEB[GIFT] implementation and other conventional implementations. The benefit of a small block length, enabled by PFB, becomes even larger with TI, in which the number of registers is multiplied, as shown in Table 2. As a result, the protected PFB[SKINNY] implementation outperforms those of SAEB[GIFT], Ascon [11], and Ketje [1].
The encryption algorithm takes a nonce $N \in \mathcal{N}$, AD $A \in \mathcal{A}$, and a plaintext $M \in \mathcal{M}$, and deterministically returns a pair of a ciphertext $C \in \mathcal{C}$ and a tag $T \in \mathcal{T}$. The decryption algorithm takes a tuple $(N, A, C, T) \in \mathcal{N} \times \mathcal{A} \times \mathcal{C} \times \mathcal{T}$ and deterministically returns either the distinguished invalid (reject) symbol $\bot \notin \mathcal{M}$ or a plaintext $M \in \mathcal{M}$. We require $|\Pi.\mathrm{Enc}[\widetilde{E}_K](N, A, M)| = |\Pi.\mathrm{Enc}[\widetilde{E}_K](N, A, M')|$ when these outputs are strings and $|M| = |M'|$.
where the tweak elements are a nonce, AD, a counter, and a domain separation. Let $\rho: \{0,1\}^b \times \{0,1\}^b \to \{0,1\}^b \times \{0,1\}^b$ be a feedback function that takes a b-bit TRF output and a b-bit plaintext block and outputs a b-bit TRF input and a b-bit ciphertext block. Figure 1 shows the encryption and decryption procedures of iCOFB with three plaintext/ciphertext blocks.

Fig. 2. Core Procedures of AD Processing (Hash), Encryption, and Decryption. $A_i$ is the i-th AD block, $M_i$ is the i-th plaintext block, and $C_i$ is the i-th ciphertext block. Tweaks are omitted.

Table 1. Lightweight criteria [25] and AEAD modes. The "No extra state" column shows the number of extra bits if the criterion is not satisfied.

Table 2. The number of registers for implementing SKINNY and GIFT in different profiles.

Table 3. Breakdown of the post-synthesis circuit area of PFB[SKINNY] and SAEB[GIFT].

Table 4. Performance comparison; latency is that of a single call of a primitive (block cipher, tweakable block cipher, or permutation).