Efficient Side-Channel Secure Message Authentication with Better Bounds

We investigate constructing message authentication schemes from symmetric cryptographic primitives, with the goal of achieving security when most intermediate values during tag computation and verification are leaked (i.e., mode-level leakage-resilience). Existing efficient proposals typically follow the plain Hash-then-MAC paradigm T = TGen_K(H(M)). When the domain of the MAC function TGen_K is {0,1}^{128}, e.g., when instantiated with the AES, forgery is possible within time 2^{64} and data complexity 1. To dismiss such cheap attacks, we propose two modes: LRW1-based Hash-then-MAC (LRWHM), which is built upon the LRW1 tweakable blockcipher of Liskov, Rivest, and Wagner, and Rekeying Hash-then-MAC (RHM), which employs internal rekeying. Built upon secure AES implementations, LRWHM is provably secure up to (beyond-birthday) 2^{78.3} time complexity, while RHM is provably secure up to 2^{121} time. Thus in practice, their main security threat is expected to be side-channel key recovery attacks against the AES implementations. Finally, we benchmark the performance of instances of our modes based on the AES and SHA3 and confirm their efficiency.


Introduction
Message authentication (MA) schemes are fundamental symmetric primitives. An MA scheme allows two parties sharing a secret key K to authenticate data they send to each other. The sender applies a tag generation algorithm TGen to K and the message M to get a tag T, and then sends M, T to the receiver. The latter applies a verification algorithm Vrfy to K, a received message M, and its accompanying tag T, to get an output of 1 (accept) or 0 (reject), indicating whether or not the message should be considered authentic. There have been extensive studies on designing secure MA schemes with high performance. Most of them are based on conceptually simple primitives such as (tweakable) blockciphers and hash functions and enjoy provable security guarantees, e.g., CBC-MAC [BKR00], HMAC [BCK96], PMAC [BR02], and Wegman-Carter type MACs [WC81, CS16, DDNY18], to name a few. The simplest option for Vrfy is "Tag-then-compare", i.e., "If TGen(M) = T then return 1 else return 0". Such MA schemes

In detail, we first consider instantiating the TBC with the LRW1 tweakable blockcipher \tilde{E}(tw, x) = E_{K_2}(tw ⊕ E_{K_1}(x)) of Liskov et al. [LRW11], and obtain our first mode, LRW1-based Hash-then-MAC (LRWHM^{H,E}), depicted in Fig. 1 (left). To analyze its leakage security, we assume that the blockcipher E is "completely" secure against side-channel key recovery, by modeling E as a "normal" strong PRP that leaks nothing (i.e., we model it as a "leak-free" component). This model abstracts the details within the implementation of E and allows us to focus on arguing the harmlessness of mode-level leakages. For the latter, we make a very conservative assumption that all intermediate values appearing during tag generation and verification are leaked in full (jumping ahead, see L_{TGen} and L_{Vrfy} in Fig. 2). This setting, also used in [BKP+18, BPPS17, BGP+19], is a special case of the continuous leakage model, since the total amount of leakage continuously increases during the lifetime of the system. While appearing theoretical, this assumption enables deriving simple security lower bounds against mode-level leakages. On the other hand, we model the hash function H as a 2n-bit random oracle RO. With the above assumptions, we prove that LRWHM^{H,E} is unforgeable against leakage up to BBB 2^{2n/3}/n queries.
Given the literature, it appears quite difficult to beat the 2^{2n/3} bound with two blockcipher-calls. In order to achieve higher security, we instantiate the TBC with a rekeying-per-input function that bears some resemblance to a TBC of Minematsu [Min09], and this results in our second mode, Rekeying Hash-then-MAC RHM^{H,E}, as shown in Fig. 1 (right). This mode works with an n-bit key that is then used to generate a fresh key for the second cipher. It is known that BBB proofs for such rekeying-based modes have to rely on the Ideal Cipher Model (ICM) [BKR98, ST16, Men17]. Therefore, we follow this and prove RHM^{H,E} unforgeable against leakage up to the asymptotically optimal 2^n/n queries.
In summary, LRWHM is closer to the "standard" solution to BBB secure MACs (i.e., without the slightly inefficient rekeying and the ICM), and ensures a standard BBB 2^{2n/3}/n security. RHM is appealing in practice as it achieves optimal bounds (and is thus more secure against "brute force" attacks) within two blockcipher-calls. Yet, the security assurance of RHM, proved in the ICM, turns a bit heuristic once instantiated [CGH98].
Practical Issues and Comparison. Our modes can be instantiated with AES128 and SHA3, resulting in the concrete instances AES-SHA3-LRWHM and AES-SHA3-RHM: both produce 128-bit tags, while the key size of the former is 256 bits, larger than the latter's 128 bits. It's natural to ask what sort of security is guaranteed for these deployed instances. For this, note that our provable bounds indicate that as long as the AES keys remain "safe", AES-SHA3-LRWHM is secure up to 2^{78.3} time complexity, while AES-SHA3-RHM is secure up to 2^{121} time. Such computations can be seen as impractical. Therefore, in practice the main threat to these instances is expected to be the side-channel key recovery attack against the AES implementations. The concrete side-channel security depends on the implementations and cannot be predicted in advance, but the data complexity could be > 2^{20} in some cases [GMK17] and would further increase with the level of protection. This is clearly more expensive than the aforementioned low-data attack against the plain HtM scheme, i.e., we have achieved our goal.
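The two figures quoted above can be checked by plugging n = 128 into the asymptotic bounds 2^{2n/3}/n and 2^n/n. The sketch below does only that arithmetic; it assumes the constants hidden in the bounds are ignored, and the function names are illustrative.

```python
import math


def log2_lrwhm_bound(n: int) -> float:
    """log2 of 2^(2n/3) / n, the LRWHM query bound (constants ignored)."""
    return 2 * n / 3 - math.log2(n)


def log2_rhm_bound(n: int) -> float:
    """log2 of 2^n / n, the RHM query bound (constants ignored)."""
    return n - math.log2(n)


n = 128  # AES block size in bits
print(f"LRWHM: about 2^{log2_lrwhm_bound(n):.1f}")
print(f"RHM:   about 2^{log2_rhm_bound(n):.1f}")
```

Dividing by n = 2^7 costs exactly 7 bits in both cases, which is where 78.3 = 85.3 − 7 and 121 = 128 − 7 come from.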
In theory, deploying classical MACs in leaking scenarios necessarily requires protecting all the operations. Upon a message with ℓ blocks (in the corresponding sense), this typically consumes ≥ ℓ executions of some heavy protected circuits: for example, ≥ ℓ blockcipher-calls in CBC and PMAC [Rog04], ≥ ℓ large field multiplications in universal hash-based MACs (like Wegman-Carter [WC81]), ≥ ℓ compression function calls in HMAC, and ≥ ℓ permutation-calls in (the SHA3-based) KMAC [KCP16]. In contrast, in our modes (and all the other HtM-based modes), such a message triggers O(ℓ) calls to a light unprotected hash circuit and 1 or 2 calls to some heavy side-channel secure primitive. The performance gain is obvious. To be more convincing, we implement the modes and estimate their performance. Concretely, we reuse the C code of Dobraunig et al. [DEM+19] to ease a comparison with the IsapMac variants underlying Isap-K-128 and Isap-K-128A (which we denote IsapMacK and IsapMacKA, resp.). This code contains a Keccak-f[400] permutation, upon which we build a SHA3 variant for our modes (this means our hash is very close to IsapMacKA rather than the standard SHA3). For masked AES, we implement the proposals of [GR17]. Using these components, we implement AES-SHA3-RHM, AES-SHA3-LRWHM, IsapMacK, IsapMacKA (as they are the only known leakage-resilient MACs with both efficiency and concrete security comparable to ours), and a variant of AES-CBC (as a representative of the "fully protected" classical MACs), and compare their performance. The conclusions are: (a) ours outperform IsapMacK when the level of side-channel protection is not too strong (i.e., less than 10th order masking), (b) ours are comparable to the more aggressive variant IsapMacKA, and (c) ours outperform the protected AES-CBC as long as the messages are not too short (e.g., more than 50 bytes). See Section 4 for details.
For clarity, we provide a comprehensive comparison between our new proposals and the IsapMacs. Indeed, we have quite different design goals. Regarding practicality, we put more emphasis on easy deployment from any crypto library (e.g., the HACL library [ZBPB17]) that has implemented hashing and masked blockciphers (with these, even those unfamiliar with implementations could deploy our schemes using a few lines of code), and we enable a modular approach, allowing the primitives to be updated for better security or switched to more masking-friendly ciphers [GLSV15] for better efficiency. On the other hand, IsapMacs offer more dedicated designs (somewhat supported by the recent leakage-resilience proofs [DM19, GPPS19] for duplex) aiming at (potentially) better efficiency. Regarding security, we aim at high side-channel security guarantees (at the cost of some necessary expertise, since masking is non-trivial to implement), and our modes "inherit" the security of the (well-understood) primitives in use, whereas IsapMacs aim to trade a bit of these high security guarantees for a scheme enabling the default implementation to provide built-in side-channel resistance. This tradeoff is in particular visible in the case of decryption leakages, where repeated measurements may allow obtaining noise-free leakage traces for IsapMacs (raising the risk of advanced algebraic/analytical attacks [RSV09, VGS14]), while masking should mitigate this risk in our schemes.

Potential Applications.
In side-channel sensitive settings, our MA modes could be deployed "alone" for sole integrity, or used for improving authenticated encryption (AE) schemes [BN08]. For example, the aforementioned Philips smart lamps used AES-CCM [WHF], which is a sophisticated combination of AES-CTR and AES-CBC MAC. To foil the (forgery-based) attack of [RSWO17], one could in principle protect the AES-CBC implementation. But an alternative approach is to replace AES-CBC by a protected implementation of AES-SHA3-RHM and use a standard Encrypt-then-MAC composition AES-SHA3-CTR-then-RHM, which we abbreviate as CRHM. A bit more concretely, CRHM uses two keys K_1 and K_2 for the first-pass CTR and the second-pass RHM, and the second-pass RHM produces a tag based on the nonce N, the associated data A, and the ciphertext C of CTR. Taking N and A as inputs of RHM prevents trivial forgeries. The black-box AE security (with no leakage) of this composition follows from the security of CTR, the strong unforgeability of RHM (implied by our strong unforgeability result with leakages), and the standard result on Encrypt-then-MAC composition [BN08, Theorem 4.4]. With leakages, this AE still offers high security against forgery attacks thanks to RHM. As for performance, it achieves much lower latency and energy consumption according to our evaluation, as long as N, A, and M/C accumulate to more than 50 bytes.
We remark that while applications such as IoT were believed to mostly transfer short messages, some technologies do allow larger packets (e.g., up to 243 bytes for LoRa [RKS17]). Moreover, in the CCA setting the scheme has to handle long inputs: this could happen in the DDoS attack scenario, in which the adversary could send many invalid long ciphertexts to trigger verification and cause a huge resource consumption. Therefore, the ability to efficiently handle moderately long messages remains important for IoT and similar settings.
On the other hand, if AES-SHA3-HtM is used, leading to CTR-then-HtM, then the fatal term t^2/2^{128} remains in the (standard model) provable bound due to the hash collision (t denotes the time complexity; see, e.g., [BPPS17, Theorem 4]), rendering it less reliable.
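The t^2/2^{128} term reflects a generic collision attack on plain Hash-then-MAC: any offline collision H(M) = H(M') turns one tag on M into a valid tag on M'. The sketch below illustrates this on a toy scale. It assumes a hash truncated to 24 bits so that the birthday search finishes quickly, and it uses HMAC-SHA256 purely as a stand-in for TGen_K; neither choice comes from the schemes discussed here.

```python
import hashlib
import hmac

HASH_BYTES = 3  # toy 24-bit hash output; a real instance has n = 128 bits


def toy_hash(msg: bytes) -> bytes:
    """SHA3-256 truncated to HASH_BYTES, so collisions are cheap to find."""
    return hashlib.sha3_256(msg).digest()[:HASH_BYTES]


def plain_htm_tag(key: bytes, msg: bytes) -> bytes:
    """Plain Hash-then-MAC: T = TGen_K(H(M)); HMAC stands in for TGen_K."""
    return hmac.new(key, toy_hash(msg), hashlib.sha256).digest()


def find_collision():
    """Offline birthday search: needs neither the key nor any oracle."""
    seen = {}
    counter = 0
    while True:
        msg = b"msg-%d" % counter
        digest = toy_hash(msg)
        if digest in seen:
            return seen[digest], msg
        seen[digest] = msg
        counter += 1


key = bytes(range(16))
m1, m2 = find_collision()
tag = plain_htm_tag(key, m1)  # the single tagging query (data complexity 1)
forged_ok = hmac.compare_digest(plain_htm_tag(key, m2), tag)
print(m1 != m2, forged_ok)
```

With an n-bit hash the same search costs about 2^{n/2} hash evaluations, which is the 2^{64} time and data complexity 1 quoted for n = 128.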
It's worth noting that the applications of BBB secure HtM variants may go far beyond side-channel security. For example, in [RSS17], CTR-then-HtM was identified as the most efficient AE suitable for evaluation in multi-party computation engines. The performance advantage of the HtM paradigm stems from the fact that keys are held in secret shared form in this setting, and thus paradigms with a minimal number of calls to keyed primitives outperform the others. Clearly, the composition CTR-then-RHM could also be used here to dismiss the fatal term t^2/2^{128} caused by the plain HtM. In this setting, one may emphasize inverse-freeness instead of leakage-resilience. In light of this, we provide one more MAC mode that is simpler, inverse-free, parallelizable, and reducible to falsifiable hashing assumptions: see Appendix A.
Further Related Work. Various MACs with BBB black-box security have been proposed: DWCDM [DDNY18], 3kf9 [ZWSW12], PMAC variants [Yas11, DDN+17], Double-block Hash-then-Sum [DDNP18], and those from TBCs [IMPS17, CLS17] and compression functions [Yas09]. We also mention a MAC [DS11] that was proved BBB secure under the "unbounded leakage" assumption. This design is rather complicated and consumes ℓ · poly(n) side-channel protected blockcipher-calls upon ℓ-block messages. This construction is more attractive from a theoretical point of view, as it is a BBB secure MAC domain extender. We rather provide modes that are simple and easy to deploy.
Finally, we remark that our proofs seem more complicated than the black-box MAC proofs, and bear some resemblance to indifferentiability [MRH04] proofs, e.g., to [ABD+13].
Organization. We give notations and definitions in Section 2. Then, in Section 3 we present the two new modes and their security proofs. In Section 4 we benchmark the performance of their concrete instances and compare them.

Preliminaries
General Notations and Definitions. For a finite set \mathcal{X}, X ←$ \mathcal{X} denotes selecting an element from \mathcal{X} uniformly at random and |\mathcal{X}| denotes its cardinality. In all the following, we fix an integer n ≥ 1. Further denote by H(2n) the set of all functions of domain {0,1}^* and range {0,1}^{2n}, by P(n) the set of all permutations on {0,1}^n, and by BC(κ, n) the set of all blockciphers with n-bit block-size and κ-bit keys (though we will mainly use κ = n in this paper). Finally, for U, X ∈ {0,1}^n, U‖X or simply UX denotes their concatenation.

Adversary. We denote by a (q, t)-adversary a probabilistic algorithm that has access to several oracles (the number of which depends on the concrete context), can make at most q queries to its (multiple) oracles, and can perform computation bounded by running time t.

Strong Pseudorandom Permutation. For the security analysis of LRWHM, we model the underlying blockcipher as a strong pseudorandom permutation, abbreviated as SPRP. Formally, for an n-bit blockcipher E : {0,1}^κ × {0,1}^n → {0,1}^n, the SPRP advantage of a (q, t)-adversary A is

A Message Authentication (MA) scheme is a tuple of two polynomial time algorithms Scheme = (TGen, Vrfy) defined as follows:
• the tag generation algorithm TGen_K(M) takes as input the secret key K and the message M, and then outputs a tag T;
• the deterministic, stateless verification algorithm Vrfy_K(M, T) takes as input the secret key K, a message M, and a tag T. The algorithm outputs 1 (accept) if the tag is valid for the message, else it outputs 0 (reject).
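For reference, the SPRP advantage mentioned above is conventionally written as the distinguishing gap between the keyed cipher (with its inverse) and a uniformly random permutation (with its inverse). The display below is the textbook form and is given here as an assumption about the intended definition, not as a quotation of it.

```latex
\mathbf{Adv}^{\mathrm{SPRP}}_{E}(\mathcal{A}) :=
\Bigl|\,
\Pr\bigl[K \xleftarrow{\$} \{0,1\}^{\kappa} : \mathcal{A}^{E_K,\,E_K^{-1}} = 1\bigr]
-
\Pr\bigl[P \xleftarrow{\$} \mathcal{P}(n) : \mathcal{A}^{P,\,P^{-1}} = 1\bigr]
\,\Bigr|
```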
Informally, the MA scheme Scheme is said to be strongly unforgeable [BN08] if the adversary is unsuccessful in the following security game. First, a key K is selected as part of the experiment. Next, the adversary A can arbitrarily choose messages and ask for tags under the key K, or ask to verify the correctness of a message-tag pair (also under the key K). Following [CS16], the adversary is non-trivial, in the sense that it never asks a verification query Vrfy(M, T) if a previous tagging query TGen(M) returned T. Under this restriction, we say that A forges if any of its queries to Vrfy returns 1 (accept).
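The game just described can be sketched as a small harness that samples the key, records tagging answers, rejects trivial verification queries, and sets a flag once any verification query accepts. The class and function names are illustrative, and HMAC-SHA256 is used only as a convenient example scheme; it is not one of the modes studied here.

```python
import hashlib
import hmac
import os


class ForgeryGame:
    """Strong-unforgeability game for an MA scheme (tgen, vrfy)."""

    def __init__(self, tgen, vrfy, key_len=16):
        self.key = os.urandom(key_len)  # key chosen inside the experiment
        self.tgen, self.vrfy = tgen, vrfy
        self.tagged = set()             # (M, T) pairs returned by tag queries
        self.forged = False

    def tag_query(self, msg: bytes) -> bytes:
        tag = self.tgen(self.key, msg)
        self.tagged.add((msg, tag))
        return tag

    def verify_query(self, msg: bytes, tag: bytes) -> bool:
        # Non-triviality: (M, T) must not come from an earlier tag query.
        if (msg, tag) in self.tagged:
            raise ValueError("trivial verification query")
        accepted = self.vrfy(self.key, msg, tag)
        if accepted:
            self.forged = True          # A forges if any Vrfy query accepts
        return accepted


def hmac_tgen(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def hmac_vrfy(key: bytes, msg: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(hmac_tgen(key, msg), tag)


game = ForgeryGame(hmac_tgen, hmac_vrfy)
t = game.tag_query(b"hello")
flipped = bytes([t[0] ^ 1]) + t[1:]
print(game.verify_query(b"hello", flipped), game.forged)
```

Note that a new tag on an already tagged message counts as a forgery in this harness, which is what "strong" unforgeability asks for.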
We denote by LTGen and LVrfy the leaking implementations of the TGen and Vrfy algorithms, resp. LTGen runs both TGen and a leakage function L_{TGen}, which captures the additional information given by an implementation of TGen during its execution, and returns the outputs of both TGen and L_{TGen}, which take the same input; similarly for LVrfy and L_{Vrfy}. Later in Section 3, we will explicitly define L_{TGen} and L_{Vrfy} for each MA mode. The rules are:

6. A hash-call H(M) → U‖X leaks M, U, and X.

Figure 2: The description of the LRWHM^{H,E} MA mode. Tag generation ends with step 2, "Return T as the tag". According to the rules specified in Section 2, the leakage function is L_{TGen}(K, M) := (U, V, X, Y, T).
During the MA security interaction, if either the tag generations or the verifications give the corresponding leakage along with the answer, then we are in the setting of MA security in the presence of leakage. In contrast, we never consider leakages of the key generation, as the actual way of loading the key into a device can vary quite a lot from one situation to another, and will usually happen at manufacturing time, out of reach of the adversary. In this paper, we always consider the setting where both the TGen oracle leakages and the Vrfy oracle leakages are present. Formally, we define

The suffix 2 indicates that the number of involved leaking oracles is two; this follows the convention in [GPPS18]. The presence of RO is in accordance with our use of the random oracle model.

MA Modes and Provable Results
In this section we present the formal definitions of our modes and their provable results. Interpretations of the theorems are deferred to the next section.

Mode LRWHM and Its Security
Formally, the mode LRWHM^{H,E} along with the leakages is defined in Fig. 2. With the "leak-free" blockcipher plus "unbounded" mode-level leakage assumptions, the MAL2 security of LRWHM^{H,E} is proved up to 2^{2n/3}/n queries.
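As a reading aid, the data flow of LRWHM can be sketched as U‖X = H(M), V = E_{K_1}(U), Y = V ⊕ X, T = E_{K_2}(Y), with leakage (U, V, X, Y, T). Everything below is a toy sketch under stated assumptions: `toy_encrypt`/`toy_decrypt` form a hypothetical 4-round Feistel stand-in for a protected AES (not secure, not the authors' implementation), SHA3-256 plays the 2n-bit hash for n = 128, and the verification compares at the intermediate value Y after inverting the second cipher call, which is one plausible reading of the inverse-based verification.

```python
import hashlib

N_BYTES = 16          # n = 128 bits
HALF = N_BYTES // 2


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    return hashlib.sha3_256(key + bytes([i]) + half).digest()[:HALF]


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation; a placeholder for protected AES."""
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def lrwhm_tgen(k1: bytes, k2: bytes, msg: bytes):
    """Return (T, leakage) with leakage = (U, V, X, Y, T)."""
    digest = hashlib.sha3_256(msg).digest()      # U || X, 2n bits
    u, x = digest[:N_BYTES], digest[N_BYTES:]
    v = toy_encrypt(k1, u)
    y = _xor(v, x)
    tag = toy_encrypt(k2, y)
    return tag, (u, v, x, y, tag)


def lrwhm_vrfy(k1: bytes, k2: bytes, msg: bytes, tag: bytes) -> bool:
    """Inverse-based check: never recomputes the correct tag for msg."""
    digest = hashlib.sha3_256(msg).digest()
    u, x = digest[:N_BYTES], digest[N_BYTES:]
    v = toy_encrypt(k1, u)                       # forward computation
    y_from_tag = toy_decrypt(k2, tag)            # inverse computation
    return y_from_tag == _xor(v, x)


k1, k2 = bytes(16), bytes(range(16))
tag, leak = lrwhm_tgen(k1, k2, b"example message")
print(lrwhm_vrfy(k1, k2, b"example message", tag))
```

The point of verifying through the inverse is that an invalid query never makes the device compute, and hence leak, the valid tag of the queried message.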
1}^n be a blockcipher and RO : {0,1}^* → {0,1}^{2n} be a random oracle. Then with the leakages specified by L_{TGen} and L_{Vrfy} (Fig. 2), for any (q, t)-adversary A against the MAL2 security of LRWHM^{RO,E}, there exists a (q, t + q · t_{LRWHM})-adversary A' against the SPRP security of E, such that t_{LRWHM} stands for the time to evaluate LRWHM^{RO,E} once (with an adversary-picked key), and that

Proof. While the mode is simple, its analysis has to be dedicated and is quite non-trivial, since Liskov et al. only proved 2^{n/2} security for the LRW1 TBC [LRW11], which does not support a modular approach. To ease understanding, below we first overview the proof ideas and steps in subsubsection 3.1.1, and then present the main steps in subsubsections 3.1.2 and 3.1.3.

Proof Overview
Based on the adversarial power, we make some initial observations: (i) during the interaction, RO is queried at most q times, since each such query is either directly made by A, or made by TGen or Vrfy, which is ultimately made by A; (ii) similarly, the number of calls to E_{K_1}, resp. E_{K_2}, is at most q.
Recall that our goal is to bound the MAL2 advantage of A against LRWHM^{RO,E}. The first step, idealizing the scheme, is standard for MAC security proofs. In detail, we replace the calls to E_{K_1} and E_{K_2} underlying LTGen^{RO,E}_{K_1 K_2} and LVrfy^{RO,E}_{K_1 K_2} by two independent random permutations P = (P_1, P_2), and denote by LTGen^{RO,P} and LVrfy^{RO,P} the obtained idealized oracles. By a straightforward hybrid argument, there is an adversary A' that makes at most q oracle queries and evaluates (a certain part of) LRWHM^{RO,E} at most q times (thus the running time t + q · t_{LRWHM}), such that the gap between the advantages of A against LRWHM^{RO,E} and against LRWHM^{RO,P} is bounded in terms of the SPRP advantage of A'.

We then derive an upper bound for Adv^{MAL2}_{LRWHM^{RO,P}}(A). The idea is that, if a non-trivial verification query returns 1, then right after this query returns, there exists a "chain" of historical query-records RO(M) = U‖X, P_1(U) = V, and P_2(V ⊕ X) = T such that the tag generating action LTGen^{RO,P}(M) → T never happened before. It then suffices to prove that such chains are unlikely to occur. As will be clear in the proof, the presence of such a chain is typically due to unexpected collisions between the three query-records within the chain (for example, an RO query RO(M) = U‖X collides with P_1(U') = V' and P_2(Y') = T', in the sense that U = U' and X = V' ⊕ Y'), and thus the security could go beyond the birthday bound. In addition, a crucial property is that verification queries are checked using the inverse, so that the involved permutation query-records have random "endpoints" at the input side and a low probability of colliding with existing values. It can be seen that, if verification is defined in the classical inverse-free manner, then a single verification query could create such a "chain" and leak it to A for forging, and no proof is possible.
To formalize the above ideas, we analyze the adversarial interaction with the idealized oracles LTGen^{RO,P} and LVrfy^{RO,P} using the game-playing technique [BR06]. We describe the security game in Fig. 3. The game offers three interfaces to A to mimic the oracles LTGen^{RO,P}, LVrfy^{RO,P}, and RO (captured by the statements following "When A asks..."). It also has 4 secret procedures P_1, P_1^{-1}, P_2, and P_2^{-1} for internal random permutation calls. To mimic the ideal oracles, the game maintains three sets ROSet, PSet_1, and PSet_2 for already defined RO, P_1, and P_2 query-records, and uses lazy sampling to gradually create new records. The game also maintains a global query counter qnum to indicate the timestamp of the records. Therefore, a record in the set ROSet is of the form (M, U‖X, num) with M ∈ {0,1}^*, U, X ∈ {0,1}^n, indicating the relation RO(M) = U‖X and that the record was created when qnum = num. A record in the set PSet_1 is of the form (U, V, dir, num), with a similar meaning. The additional field dir indicates the direction of the internal P_1 query that produced this record: dir = → means it was a forward query P_1(U) → V, while dir = ← means it was a backward query P_1^{-1}(V) → U. The set PSet_2 is similar to PSet_1. In addition, four quantities α, β, γ_1, and γ_2 are used in Fig. 3, which are defined in terms of these sets and TGened (the last definition being Eq. (4)). Finally, the game also maintains a set TGened for the messages involved in earlier LTGen^{RO,P} queries, i.e., M ∈ TGened if and only if LTGen^{RO,P}(M) has been queried.

As in typical game-based proofs, we specify several "bad events" that may lead to chains of records in the future, and force the game to abort (as shown in Fig. 3) when any of the events occurs. Once abortion occurs, we write "A^{RO, LTGen^{RO,P}, LVrfy^{RO,P}} aborts". In the remainder, we proceed by first upper bounding Pr[A^{RO, LTGen^{RO,P}, LVrfy^{RO,P}} aborts] in subsubsection 3.1.2, and then arguing in subsubsection 3.1.3 that as long as abortion does not occur, chains of records are unlikely to occur, and thus A is unlikely to forge.
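The bookkeeping described above (lazy sampling, direction-tagged records, and a global timestamp) can be sketched as follows. This is only an illustration of the record format (input, output, dir, num); the class name, the use of a one-element list as the shared counter, and the string encoding of the directions are assumptions made for the sketch, and the abort conditions of the actual game are not modelled.

```python
import secrets


class LazyPermutation:
    """Random permutation on n-bit integers, sampled lazily with records."""

    def __init__(self, n_bits: int, qnum: list):
        self.n_bits = n_bits
        self.qnum = qnum      # shared global query counter, e.g. [0]
        self.fwd = {}         # input  -> output
        self.bwd = {}         # output -> input
        self.records = []     # tuples (input, output, dir, num)

    def _fresh(self, used: dict) -> int:
        # Rejection sampling: uniform among the values not yet used.
        while True:
            value = secrets.randbits(self.n_bits)
            if value not in used:
                return value

    def forward(self, u: int) -> int:
        if u not in self.fwd:
            v = self._fresh(self.bwd)
            self.fwd[u], self.bwd[v] = v, u
            self.records.append((u, v, "->", self.qnum[0]))
        return self.fwd[u]

    def backward(self, v: int) -> int:
        if v not in self.bwd:
            u = self._fresh(self.fwd)
            self.fwd[u], self.bwd[v] = v, u
            self.records.append((u, v, "<-", self.qnum[0]))
        return self.bwd[v]


qnum = [0]
p1 = LazyPermutation(8, qnum)
v = p1.forward(3)
qnum[0] += 1                  # the caller advances the timestamp per query
print(p1.backward(v) == 3, p1.records)
```

The direction field matters in the analysis because it records which endpoint of a record was freshly random when the record was created.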

Probability of Abortion
Here we prove Pr[A^{RO, LTGen^{RO,P}, LVrfy^{RO,P}} aborts] ≤ 6q^{3/2}/2^n. For this, we consider the conditions in Fig. 3 in turn. First, (B-1) essentially captures the event of a collision within the RO queries, thus Pr[(B-1)] ≤ q(q−1)/2^{2n+1}. We then consider (B-2). Its first half states that there exist 3 distinct queries (M, U‖X, ∗), (M', U'‖X', ∗), (M'', U''‖X'', ∗) such that U = U' = U'', the probability of which is ≤ q^3 · 1/2^{2n}. The analysis for the second half is similar. By the above, Pr[(B-1) ∨ (B-2)] is bounded by a term of order q^{3/2}/2^n (see footnote 8).

For the condition (B-3), the quantity γ_1 is viewed as a random variable over the random choice of RO. Note that

Using Markov's inequality we obtain

Footnote 8. In Theorem 1, it may be tempting to separate different types of queries, i.e., to derive the bounds based on the assumption that A makes q_h, q_m, and q_v queries to the oracles RO, LTGen, and LVrfy, resp., with the hope of getting rid of the data complexity-independent term q_h^{3/2}/2^n. But this is not successful: with the new notations, RO is queried at most Q = q_h + q_m + q_v times, and thus Pr[(B-1) ∨ (B-2)] "just" changes to Q^{3/2}/2^n, and the term q_h^{3/2}/2^n remains. We thereby eschew this approach for simplicity. Whether the term q_h^{3/2}/2^n can be avoided via improved analyses is an open question.
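The two counting bounds stated above for (B-1) and the first half of (B-2) follow from a union bound over pairs, resp. triples, of RO queries with uniform 2n-bit outputs. A short worked form, using only the quantities already named in the text:

```latex
\Pr[(\text{B-1})] \;\le\; \binom{q}{2}\cdot\frac{1}{2^{2n}} \;=\; \frac{q(q-1)}{2^{2n+1}},
\qquad
\Pr\bigl[\exists\ \text{distinct queries with } U = U' = U''\bigr]
\;\le\; q^{3}\cdot\frac{1}{2^{2n}}
\;=\; \Bigl(\frac{q^{3/2}}{2^{n}}\Bigr)^{2}.
```

The second expression explains why q^{3/2}/2^n is the natural scale here: the triple-collision term stays below 1 exactly while q < 2^{2n/3}.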
We then bound |S_2|. For any such pair ((M, U‖X, n_1), (U', V', dir, n_2)), we distinguish two cases: and thus Pr[U = U'] = 1/2^n;
• Case 2: n_2 > n_1 and dir = ←. Then right before (U', V', ←, n_2) is created, U' is uniform in at least 2^n − q possibilities, and thus Pr[U = U'] ≤ 1/(2^n − q).
Since the number of such pairs ((M, U‖X, n_1), (U', V', dir, n_2)) is at most q^2, we can use Markov's inequality to obtain

Finally, conditioned on ¬(B-5), we analyze (B-6). To derive a bound for β, we view the execution as a random process of "creating" V values. Each time a new record (M, U‖X, ∗) or (Y, T, ∗, ∗) is added to the sets, a set of V values is "created". Now consider a certain value V. Its frequency µ_V could be increased due to the following three actions:

We denote by S the set of all these actions, and divide it into two subsets S_3 and S_4. Intuitively, any record (Y, T, →, n_2) ∈ PSet_2 is in S_3 if and only if: (i) it is created during processing a tag generation query LTGen^{RO,P}(M), and (ii) for the query LTGen^{RO,P}(M), for U‖X = RO(M), it holds (U, V, ∗, n_1) ∈ PSet_1 before LTGen^{RO,P}(M) is made.
Denote by n_M the value of qnum when LTGen^{RO,P}(M) is made; then the two conditions imply that n_2 = n_M or n_2 = n_M + 1 (see footnote 9). Therefore, we have the formal definitions: S_4 := S \ S_3.
First, we denote by µ_V(S_3) the number of increments on µ_V due to S_3 and analyze it. It's not hard to see |S_3| ≤ α. Assume that right before such a record (Y, T, →, n_2) is added to PSet_2, there have been s different values X in the set ROSet, i.e.,

For convenience, we write them as X_1, ..., X_s. Then it can be seen that after (Y, T, →, ∗) is added to PSet_2, s distinct V values X_1 ⊕ Y, ..., X_s ⊕ Y are introduced. By this, µ_V(S_3) gets at most 1 chance of increasing. Wlog assume V = X_i ⊕ Y. Conditioned on ¬(B-2), the number of RO query-records (M, U‖X, ∗) such that X = X_i is at most 2; conditioned on ¬(B-5), we have

We then analyze the increments due to S_4, denoted µ_V(S_4). Assume that |S_4| = w. In this respect, we consider a sequence of variables L_i(V), 1 ≤ i ≤ w, where L_i(V) = 1 if µ_V is increased during the i-th record creating action (as we have seen, the increment may be greater than 1).
We next prove that for any sequence (a_1, ..., a_{i−1}) ∈ {0,1}^{i−1}, when q ≤ 2^n/2, regardless of the concrete record being created, we have

To this end, we assume that conditioned on (L_1(V), ..., L_{i−1}(V)) = (a_1, ..., a_{i−1}), there have been s different values X and t different values Y in the sets, i.e., |{X | ∃(M, U) : (M, U‖X, ∗) ∈ ROSet}| = s and |{Y | ∃T : (Y, T, ∗, ∗) ∈ PSet_2}| = t. For convenience, we write these values as X_1, ..., X_s; Y_1, ..., Y_t. Now consider a relevant record-creating action. There are three cases:

Case 1: (M, U‖X, ∗) is created. In this case, X is uniform in {0,1}^n and is independent from the values in the history. And it would create t distinct random V values, i.e.,

Case 2: (Y, T, ←, ∗) is created. In this case, Y is uniform in {0,1}^n \ {Y_1, ..., Y_t}, and s distinct random V values are created, i.e.,

For each i ∈ {1, ..., s}, when

Case 3: (Y, T, →, ∗) is created. This record has to be created during processing a query LTGen^{RO,P}(M'). And right before (Y, T, →, ∗) ∈ PSet_2 holds, there exists ((M', U'‖X', ∗), (U', V', ∗, ∗)) ∈ ROSet × PSet_1 such that Y = V' ⊕ X'. By the definition of S_4, right before (Y, T, →, ∗) ∈ PSet_2 holds, it holds (U', V', →, ∗) ∈ PSet_1. Then V' is uniform in a set which we denote \mathcal{V} for convenience, which satisfies |\mathcal{V}| ≥ 2^n − q. This means Y = V' ⊕ X' is uniform in the set X' ⊕ \mathcal{V}. In a similar vein to Case 2, we have Pr[L_i(V) = 1 | (a_1, ..., a_{i−1})] ≤ 2q/2^n.

Conditioned on ¬(B-2), for any certain value X_i, the number of RO queries (M, U‖X, ∗) such that X = X_i is at most 2. By this and the above, we have

Via a Chernoff bound-based argument that follows [CLL+18, PS15], it can be proved

The proof is deferred to Appendix B for cleanness. Eq. (12) plus (13) and (14) indicate that the constant CON = 20√q + 16nq^2/2^n constitutes an upper bound on µ_V for any V, i.e.

Footnote 9. After the query LTGen^{RO,P}(M) is made, (a) if (M, U‖X, ∗) ∈ ROSet, then since we assumed (U, V, ∗, n_1) existed before, the game just creates the record (Y, T, →, n_2) with n_2 = n_M; and (b) if (M, U‖X, ∗) ∉ ROSet, then (M, U‖X, n_M) is created for some U‖X. Then, since we assumed that (U, V, ∗, n_1) existed before, the game creates the record (Y, T, →, n_2) with n_2 = n_M + 1.

Unforgeability Unless Abortion
If the action LTGen^{RO,P}(M) → T happens, then all the subsequent non-trivial verification queries LVrfy^{RO,P}(M, T') have to satisfy T' ≠ T. Since P_1 and P_2 are two permutations, it always holds LVrfy^{RO,P}(M, T') ≠ 1. Therefore, we can concentrate on the verification queries LVrfy^{RO,P}(M, T) for which LTGen^{RO,P}(M) was never made. To this end, we define an event Chain capturing the aforementioned "chains of records": at any time during the execution, there exist three query-records (M, U‖X, n_1) ∈ ROSet, (U, V, d_1, n_2) ∈ PSet_1, and (Y, T, d_2, n_3) ∈ PSet_2 such that Y = V ⊕ X and M ∉ TGened. To complete the analysis, we derive an upper bound on Pr[Chain | ¬(A^{RO, LTGen^{RO,P}, LVrfy^{RO,P}} aborts)]. For this, note that the presence of such a chain consists of 5 cases:

In addition, after the involved action is completed, it remains that M ∉ TGened. Below we analyze them in turn.
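The event Chain defined above is a purely combinatorial condition on the three record sets, so it can be checked mechanically. The sketch below assumes n-bit values are represented as Python integers, RO records as (M, (U, X), num), and permutation records as (input, output, dir, num); the function name and these encodings are illustrative only.

```python
def find_chains(ro_set, pset1, pset2, tgened):
    """Return every chain (M, U, V, X, Y, T) with Y = V xor X, M not tagged."""
    p1 = {u: v for (u, v, _dir, _num) in pset1}   # P_1 records: U -> V
    p2 = {y: t for (y, t, _dir, _num) in pset2}   # P_2 records: Y -> T
    chains = []
    for (m, (u, x), _num) in ro_set:
        if m in tgened or u not in p1:
            continue
        v = p1[u]
        y = v ^ x
        if y in p2:
            chains.append((m, u, v, x, y, p2[y]))
    return chains


ro_set = [("m1", (1, 2), 0), ("m2", (3, 4), 1)]
pset1 = [(1, 7, "->", 0), (3, 9, "->", 1)]
pset2 = [(7 ^ 2, 11, "->", 0), (9 ^ 4, 12, "<-", 2)]
print(find_chains(ro_set, pset1, pset2, tgened={"m1"}))
```

In this toy transcript both messages complete a chain, but only "m2" is reported because "m1" was legitimately tagged; a reported chain is exactly a message-tag pair that would verify without having been produced by LTGen.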
For (C-1): for each such record (M, U‖X, ∗), right before it's added to ROSet, both U and X are uniform. There are at most q^2 "targets" ((U', V', ∗, ∗), (Y', T', ∗, ∗)). Therefore,

For (C-2): from the code it's easy to see that (U, V, →, ∗) is created during an LTGen^{RO,P}(M') query. It has to be
• M' ≠ M, otherwise the resulting record chain does not satisfy M ∉ TGened, and

Conditioned on ¬(B-3), the number of choices for the query (M, U‖X) that has a corresponding M' satisfying such requirements is ≤ √q. For each such (M, U‖X), right before (U, V, →, ∗) is added, V is uniform in ≥ 2^n − q values; and there are ≤ q queries (Y, T, ∗, ∗). Therefore, Pr[V = X ⊕ Y for (Y, T, ∗, ∗) ∈ PSet_2] ≤ q/(2^n − q), and

For (C-3): for each query P_1^{-1}(V), the number of pairs ((M, U‖X, ∗), (Y, T, ∗, ∗)) such that V = X ⊕ Y is at most β. This means the number of "target" values U does not exceed β either. For each P_1^{-1}(V) → U, U is uniform in ≥ 2^n − q values; taking a union bound over the ≤ q queries to P_1^{-1}

For (C-4): from the code, (Y, T, →, ∗) must be created during processing an LTGen^{RO,P}(M') query. It has to be M' ≠ M, otherwise M ∈ TGened. More importantly, right before (Y, T, →, ∗) is created, there exist (M, U‖X, ∗), (M', U'‖X', ∗), (U, V, ∗, ∗), (U', V', ∗, ∗) such that X ⊕ V = X' ⊕ V' and M ∉ TGened. Moreover, it has to be U ≠ U': otherwise U = U' ⇒ X ≠ X' by ¬(B-1), and X ⊕ V = X' ⊕ V' is not possible.
Vrfy^{H,E}_K(M, T) proceeds in three steps: 1. Forward Computation: computes U‖X = H(M) and

Under the "unbounded leakage" assumption, the provable bound 2^{2n/3}/n is tight, as we will justify in Appendix C. But we are not aware of any attack with low data and feasible time complexity, i.e., any attack cheaper than the naïve side-channel key recovery. Deeper characterization of the concrete side-channel security is left for future work.

Mode RHM and Its Security
Formally, the mode RHM^{H,E} along with the leakages is defined in Fig. 4. In the ideal cipher model, the MAL2 security of RHM is proved up to 2^n/n queries.
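As a reading aid, the rekeying structure of RHM can be sketched as U‖X = H(M), V = E_K(U), T = E_V(X): the first call derives a per-message key V, which then keys the second call. Everything below is a toy sketch under stated assumptions: `toy_encrypt`/`toy_decrypt` form a hypothetical 4-round Feistel stand-in for a protected AES (not secure, not the authors' implementation), SHA3-256 plays the 2n-bit hash for n = 128, and verification inverts the second call under V and compares with X, which matches the backward queries IC^{-1}(V, T) that appear in the proof but is otherwise an assumption.

```python
import hashlib

N_BYTES = 16          # n = 128 bits
HALF = N_BYTES // 2


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    return hashlib.sha3_256(key + bytes([i]) + half).digest()[:HALF]


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation; a placeholder for protected AES."""
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def rhm_tgen(key: bytes, msg: bytes) -> bytes:
    digest = hashlib.sha3_256(msg).digest()      # U || X, 2n bits
    u, x = digest[:N_BYTES], digest[N_BYTES:]
    v = toy_encrypt(key, u)                      # fresh per-message key
    return toy_encrypt(v, x)                     # T = E_V(X)


def rhm_vrfy(key: bytes, msg: bytes, tag: bytes) -> bool:
    digest = hashlib.sha3_256(msg).digest()
    u, x = digest[:N_BYTES], digest[N_BYTES:]
    v = toy_encrypt(key, u)
    return toy_decrypt(v, tag) == x              # inverse-based comparison


key = bytes(range(16))
tag = rhm_tgen(key, b"example message")
print(rhm_vrfy(key, b"example message", tag))
```

Only a single n-bit master key is stored; the second cipher call is keyed by V, so each distinct U is processed under its own key.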
Theorem 2. Let IC : {0,1}^n × {0,1}^n → {0,1}^n be an ideal cipher and RO : {0,1}^* → {0,1}^{2n} be a random oracle, and assume q ≤ 2^n/2. Then with the leakages L_{TGen} and L_{Vrfy} (Fig. 4), for any (q, t)-adversary A against the MAL2 security of RHM^{RO,IC}, it holds

Proof. Similarly to Theorem 1, the optimal security of RHM cannot be obtained via a modular approach, as the TBC \tilde{E}(tw, X) = E_{E_K(tw)}(X) is not BBB secure [Min09]. Therefore, we divide our analysis into an overview and two steps as below.

Proof Overview
The proof flow is very similar to that of Theorem 1. Concretely, we first idealize the scheme RHM^{RO,IC}_K as RHM^{RO,P,IC}, in which the first call to IC_K is replaced by a random permutation P that is never queried by the adversary A. The difference between Adv^{MAL2}_{RHM^{RO,IC}}(A) and Adv^{MAL2}_{RHM^{RO,P,IC}}(A) is reduced to the PRP security of the ideal cipher IC. Unless the adversary hits the key K in its q queries to IC, the two systems (IC_K, IC) and (P, IC) are indistinguishable. For each adversarial query to IC, the probability of such a "hit" is 1/2^n. Summing over the q adversarial queries, we reach

We then focus on bounding Adv^{MAL2}_{RHM^{RO,P,IC}}(A). We also describe the security game using pseudocode in Fig. 5. The sets, the query-records, the auxiliary variables (dir and qnum), and the abort mechanism are all similar to Fig. 3. At any time during the interaction, given the internal sets, we define three auxiliary sets

The remaining two steps also resemble subsection 3.1: first, we bound the probability that A^{RO, LTGen^{RO,P,IC}, LVrfy^{RO,P,IC}} aborts; second, we show that LVrfy always returns 0 if A^{RO, LTGen^{RO,P,IC}, LVrfy^{RO,P,IC}} doesn't abort.
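The key-hit argument above amounts to a union bound over the q adversarial queries to IC, each hitting K with probability 1/2^n. A sketch of the resulting step, assuming the two advantages are compared by their absolute difference (the exact form of the displayed inequality is an assumption):

```latex
\Bigl|\,\mathbf{Adv}^{\mathrm{MAL2}}_{\mathrm{RHM}^{\mathrm{RO},\mathrm{IC}}}(\mathcal{A})
 - \mathbf{Adv}^{\mathrm{MAL2}}_{\mathrm{RHM}^{\mathrm{RO},\mathrm{P},\mathrm{IC}}}(\mathcal{A})\,\Bigr|
\;\le\; \sum_{i=1}^{q} \frac{1}{2^{n}} \;=\; \frac{q}{2^{n}}.
```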

Probability of Abortion
Consider (C-1) first. Note that the set PSet defines a one-to-one correspondence. This means that for each $(V, X, T, \ast, \ast) \in \mathrm{ICSet}$, the number of $U$ such that $(U, V, \ast) \in \mathrm{PSet}$ is at most 1. Therefore, the number of "target" pairs $((U, V, \ast), (V, X, T, \ast, \ast))$ is $\le |\mathrm{ICSet}| \le q$, and Pr

(C-2) essentially states that RO-collisions occur. So $\Pr[\text{(C-2)}] \le \frac{q^2}{2^{2n+1}}$.

(C-3) essentially states that there exist $n$ queries $(M_1, U_1 \| X_1, \ast), \ldots, (M_n, U_n \| X_n, \ast)$ such that $U_1 = \ldots = U_n$. The number of choices for such $n$ queries is $\binom{q}{n}$; for any of them, the probability to have

Similarly for (C-4) by symmetry: $\Pr[\text{(C-4)}] \le \frac{2q}{2^n}$.

For (C-5), consider such a query $\mathrm{IC}(V, X)$. We distinguish two cases. In the first case, $\mathrm{IC}(V, X)$ is made during processing a tag query $\mathrm{LTGen}^{\mathrm{RO},P,\mathrm{IC}}(M)$. Let $U \| X = \mathrm{RO}(M)$. Then conditioned on ¬(C-2), for any $M' \ne M$ and $U' \| X' = \mathrm{RO}(M')$ such that $X' = X$, it necessarily holds that $U' \ne U$. Since PSet defines a bijection, this means $V' = P(U') \ne V$, i.e., in this case, (C-5) would not be triggered.

In the second case, $\mathrm{IC}(V, X)$ is made by the adversary. Then, conditioned on ¬(C-4), the number of $(M, U)$ such that $(M, U \| X, \ast) \in \mathrm{ROSet}$ is at most $n - 1$. For each such $U$, if there exists $V$ such that $(U, V, \ast) \in \mathrm{PSet}$, then since $V$ is never leaked to $A$, conditioned on the transcript of queries and answers (including leakages) obtained by $A$, $V$ remains uniform in at least $2^n - q$ possibilities (since it does not equal $V'$ for any $(V', X', T', \ast, \ast) \in \mathrm{ICSet}$ that is known to $A$). Therefore, for each query $\mathrm{IC}(V, X)$, the probability of abortion is at most $\frac{n-1}{2^n - q}$. Denote by $q_1$ the number of such forward queries to IC, then $\Pr[\text{(C-5)}] \le \frac{(n-1)\,q_1}{2^n - q}$.

For (C-6), consider such a query $\mathrm{IC}^{-1}(V, T)$. As PSet defines a bijection, the number of corresponding $U$ is at most 1. Then, conditioned on ¬(C-3), the number of $(M, X')$ such that $(M, U \| X', \ast) \in \mathrm{ROSet}$ is at most $n - 1$. For each such $(M, U \| X', \ast)$, since the newly sampled $2^n - q$. By these, denote by $q_2$ the number of such backward queries to $\mathrm{IC}^{-1}$, then

Finally consider (C-7). Consider the $i$-th such query $P(U_i)$ that samples $V_i$. For clearness, write Pr ∃$T$ and $j \in \{1, \ldots, \alpha_i\}$ :

Conditioned on ¬(C-2) and ¬(C-4), in the summation $\sum_{i=1}^{q} \sum_{j=1}^{\alpha_i} |\mathrm{IC}[X_{i,j}]|$, each specific $X$ appears at most $n - 1$ times. Therefore, there exists a set $\mathcal{X}$ of $n$-bit values such that

On the other hand, for any such set $\mathcal{X}$ we have $\sum_{X \in \mathcal{X}} |\mathrm{IC}[X]| \le |\mathrm{ICSet}| \le q$. Therefore,

By all the above, when $q \le 2^n/2$, $\Pr[A^{\mathrm{RO},\, \mathrm{LTGen}^{\mathrm{RO},P,\mathrm{IC}},\, \mathrm{LVrfy}^{\mathrm{RO},P,\mathrm{IC}}} \text{ aborts}] \le \frac{q^2}{2^{2n}} +$

Unforgeability Unless Abortion
Consider the Chain event: at any time during the interaction, there exist three query-records $(M, U \| X, n_1)$, $(U, V, n_2)$, and $(V, X, T, \mathrm{dir}, n_3)$ such that $M \notin \mathrm{TGened}$.
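The Chain event can be read as a join over the three record sets of the security game. The sketch below is only an illustration under assumed data layouts: RO records are stored as tuples `(M, U, X, n1)` with the hash output already split, P records as `(U, V, n2)`, and IC records as `(V, X, T, dir, n3)`. The function name `chain_exists` and these layouts are hypothetical and do not come from the game's pseudocode.

```python
def chain_exists(ro_set, p_set, ic_set, tgened):
    """Return True if RO, P and IC records link up for a message outside tgened."""
    # PSet defines a bijection, so each U maps to at most one V.
    p_map = {u: v for (u, v, _n2) in p_set}
    # Only the (V, X) input pair of an IC record matters for the chain.
    ic_inputs = {(v, x) for (v, x, _t, _dir, _n3) in ic_set}
    for (m, u, x, _n1) in ro_set:
        if m in tgened:
            continue  # messages already queried to tag generation do not count
        v = p_map.get(u)
        if v is not None and (v, x) in ic_inputs:
            return True
    return False
```

A chain for a message that was never submitted to tag generation is exactly the situation in which a full tag computation path exists in the records without a legitimate tag query.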

Performance Evaluations
In this section we report our implementation results. The blockcipher in our modes is naturally instantiated with AES-128. More specifically, we follow [GR17] (concretely, the "KHL method") and implement the masked AES with various orders in C. On the other hand, the hash functions are instantiated with the SHA3 variant built upon the 16-round Keccak-f[400] implementations of [DEM+19] rather than the standard Keccak-f[1600], to enable a fair comparison to IsapMacKA. We refer to these instances of LRWHM and RHM as AES-SHA3-LRWHM and AES-SHA3-RHM respectively.

For the sake of efficiency, we pre-expand the MAC keys and store them in memory in shared form. That is, for AES-SHA3-LRWHM we never execute the key schedule (since it only calls AES with fixed keys), and for AES-SHA3-RHM we only need to execute the (masked) key schedule once (due to the rekeying of the second AES-call). Based on all these, we benchmark the performance of our schemes on the 32-bit ARM Cortex-M3 processor, as depicted in Figs. 6 and 7. Unsurprisingly, AES-SHA3-RHM is slightly more costly than AES-SHA3-LRWHM due to the additional (masked) AES key schedule.
For comparison, we also consider the IsapMacK and IsapMacKA implementations in [DEM+19], the MAC functions underlying their Keccak-based AE variants Isap-K-128 and Isap-K-128A respectively. The main difference between Isap-K-128 and Isap-K-128A lies in the rate-1 duplex-based FILTG function: Isap-K-128 invokes 12-round Keccak-f[400] for more reliable security, while Isap-K-128A invokes 1-round Keccak-f[400] for better efficiency [DEM+19]. We also implement a simplified AES-CBC variant built upon our masked AES with pre-expanded secret keys, in order to obtain the performance upper bound of CBC (see Appendix D for the pseudocode of this CBC variant). Note that here we focus on performance comparison, and thus disregard their (different) security bounds. To ease understanding, we plot all the evaluation results in Figs. 6 and 7. Our source code has been submitted as separate supplementary material (a zip archive).
Among them, Fig. 6 illustrates the impact of message size on latency (in terms of the number of cycles for processing): the X axis represents the message size, and the two sub-figures depict the clock cycles of AES-SHA3-LRWHM, AES-SHA3-RHM, IsapMacK, IsapMacKA and AES-CBC on the Y axis, with the number of shares d = 2 and d = 5 respectively for the masked AES. The first observation is that AES-SHA3-LRWHM and AES-SHA3-RHM outperform IsapMacK when up to 5 shares are used, and are comparable to IsapMacKA when d = 5 (when d = 2, IsapMacKA is slightly inferior). Such gains stem from the relatively low performance of the rate-1 duplex in IsapMacK and IsapMacKA. However, as mentioned, the rate-1 duplex in IsapMacKA is extremely light, and thus our schemes do not have much advantage over it. Another interesting observation is that the latency of AES-CBC increases sharply with the message size, even forming "stairs", in sharp contrast to the smooth curves of our algorithms (and the two IsapMacs). This is because in our algorithms (and the IsapMacs), every additional message block only induces (roughly) one call to the efficient (unprotected) Keccak-f[400] permutation, the cost of which is negligible compared to the additional masked AES call in AES-CBC. Due to this, our algorithms outperform AES-CBC as long as the message contains more than 120 and 50 bytes for d = 2 and d = 5 shares respectively, and the gains further increase with the size.
On the other hand, Fig. 7 takes the number of shares as the X axis and reflects the impact of side-channel protection (in terms of the number of shares) on latency. The left sub-figure shows the cycles of the aforementioned MACs on the Y axis when processing messages of only 16 bytes, while the right shows those for longer messages of 160 bytes. It is natural to see that the performance gains of AES-SHA3-LRWHM and AES-SHA3-RHM over the IsapMacs decrease with the number of shares d, since the IsapMacs rely on the rate-1 duplex, whose cost is independent of the masking order. Still, with less than 10-share masked AES (which already corresponds to a significantly higher security level than actually deployed), our schemes are more efficient than IsapMacK; with less than 4-share masked AES, our schemes outperform IsapMacKA. While Fig. 7 (left) seems to indicate that AES-CBC is better, we stress that this comparison is made w.r.t. very short messages of only a single block. For such short inputs, our deficiency is expected, since our schemes make 2 AES-calls while the CBC variant makes only 1 AES-call. But once the message becomes longer, e.g., 160 bytes in Fig. 7 (right), our algorithms achieve performance gains that increase significantly with the number of shares.
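The trade-off described above (two masked AES calls plus cheap hashing, versus one masked AES call per block in CBC) can be captured by a rough call-counting model. The sketch below is illustrative only: the per-call costs `c_aes` and `c_perm` and the sponge rate `rate_bytes` are hypothetical parameters, not measured values, and the model ignores padding, squeezing, and key-schedule costs. The function names are likewise made up for this example.

```python
from math import ceil

AES_BLOCK_BYTES = 16  # AES block size in bytes

def hash_then_mac_cost(msg_len, c_aes, c_perm, rate_bytes):
    """Approximate cost: hash the message, then two masked AES calls."""
    perm_calls = max(1, ceil(msg_len / rate_bytes))
    return perm_calls * c_perm + 2 * c_aes

def cbc_cost(msg_len, c_aes):
    """Approximate cost of a simplified CBC: one masked AES call per block."""
    aes_calls = max(1, ceil(msg_len / AES_BLOCK_BYTES))
    return aes_calls * c_aes

def crossover_length(c_aes, c_perm, rate_bytes, max_len=1 << 16):
    """Smallest message length (bytes) where hash-then-MAC is cheaper than CBC."""
    for msg_len in range(1, max_len + 1):
        if hash_then_mac_cost(msg_len, c_aes, c_perm, rate_bytes) < cbc_cost(msg_len, c_aes):
            return msg_len
    return None  # no crossover within max_len under this model
```

In this model the CBC cost jumps by `c_aes` at every block boundary, which mirrors the "stairs" in Fig. 6, and the crossover moves to shorter messages as `c_aes` grows relative to `c_perm`, i.e., as more shares are used.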
In summary, AES-SHA3-LRWHM and AES-SHA3-RHM outperform AES-CBC as long as the message consists of more than 120 and 50 bytes for d = 2 and d = 5 shares respectively, and the performance gains increase with both the message size and the strength of side-channel protection. They also outperform the IsapMacK and IsapMacKA implementations with up to probing secure orders 9 and 3 respectively (corresponding to 10-share and 4-share masked AES). We remark again that the goals of our (masked) schemes and the IsapMacs are quite different, as discussed in the Introduction.
Note that we took AES-CBC as a representative of the "fully protected" classical MACs, and omitted others such as HMAC, KMAC, Wegman-Carter, and ZMAC [IMPS17]. As discussed in the Introduction, all of them consume a number of heavy protected executions (of permutations, field multiplications, etc.) that grows with the message length, similarly to AES-CBC. Therefore, their performance is expected to be similar to that of AES-CBC, i.e., the latency due to the masked primitives increases linearly with the message size. Also, compared to AES, a more significant performance loss is expected from protecting SHA2, due to the relatively high complexity (generally $O(d^2 \log k)$ for register size $k$) of higher-order conversion from Boolean to arithmetic masking [CGTV15, CGV14]. For MACs using multiplication-based universal hash functions, the latency could be decreased via parallel implementations (though this may be somewhat difficult in the ARM setting), but the energy consumption remains considerable. In summary, among the protected standards, the performance of AES-CBC is expected to be among the best, thus constituting a reasonable baseline.

Concluding Remarks
We propose two MA modes, LRWHM and RHM, that for the first time achieve provable beyond-birthday security when the protected blockciphers are leakage secure but most other intermediate values during tag generation and verification are leaked. The modes can be easily deployed. We benchmark the performance of their instances, which exhibit advantages over existing schemes or standards.

then we could have a satisfying bound for $\mathbf{Adv}^{\mathrm{MAL2}}_{\mathrm{SHM}}(A)$. Compared to Eq. (20), the suffix 1 in MAL1 indicates that the number of involved leaking oracles is one, and the verification oracle is the non-leaking $\mathrm{Vrfy}_K$ instead of the leaky $\mathrm{LVrfy}_K$.
The idea of summing is not new: it has appeared in many wide-pipe MACs.Particularly, a recent work of Datta et al. showed that the AXU-hash-based variant of SHM is a BBB secure PRF [DDNP18].Our results thus extend [DDNP18] and show that this paradigm remains highly secure to some extent (i.e., unpredictable) even if internal values are leaked completely.

A.1 Hashing Assumptions
In detail, in this appendix we use a slightly different model for the hash: we model it as a family of functions $H_s$ indexed by a public seed $s$. To complete the security proof for SHM, we need $H_s$ to be collision resistant. We also need another collision security notion, which we term "semi-collision-resistance (SCR)". The two advantages are defined as follows.

A.2 Security of SHM
The formal security claim is as follows.
During the sampling process, $u_i = 1$ implies $v_i = 1$, so that for any $r$,

This implies Eq. (24).

Figure 1: Our new MA modes.The hash H is modeled as a random oracle RO.Components in the red dashed squares are protected to be side-channel secure (modeled as "leak free"): these include all the 4 blockcipher calls and the red bold wire in the right part.(left) LRW1-based hash-then-MAC LRWHM; (right) rekeying hash-then-MAC RHM.

Figure 3: Security game capturing the interaction between A and the idealized tag generation and verification oracles of LRWHM.

Figure 4: Description of the $\mathrm{RHM}^{H,E}$ MA mode.