Improvement on a Masked White-box Cryptographic Implementation - Updated version -

White-box cryptography is a software technique to protect secret keys of cryptographic algorithms from attackers who have access to memory. By adapting techniques of differential power analysis to computation traces consisting of runtime information, Differential Computation Analysis (DCA) has recovered the secret keys from white-box cryptographic implementations. In order to thwart DCA, a masked white-box implementation was suggested. It was a customized masking technique that randomizes all the values in the lookup tables with different masks. However, the round output was only permuted by byte encodings, not protected by masking. This is the main reason behind the success of DCA variants on the masked white-box implementation. In this paper, we improve the masked white-box cryptography in such a way to protect against DCA variants by obfuscating the round output with random masks. Specifically, we introduce a white-box AES (WB-AES) implementation applying the masking technique to the key-dependent intermediate value and the several outer-round outputs computed by partial bits of the key. Our analysis and experimental results show that the proposed WB-AES can protect against DCA variants including DCA with a 2-byte key guess, collision, and bucketing attacks. This work requires approximately 3.7 times the table size and 0.7 times the number of lookups compared to the previous masked WB-AES.


Introduction
One of the most important issues in software implementations of cryptographic algorithms is to protect the secret key from various threats. White-box cryptography [3,17,20] is a software technique to protect the key from white-box attackers who can access and modify all resources in the device. In general, white-box cryptography precomputes a series of lookup tables for all input values of each operation and obfuscates the tables with linear and nonlinear transformations (i.e., encodings) to prevent the key from being analyzed [13,14]. Given these key-instantiated lookup tables, actual encryption or decryption consists of table lookups that replace most of the operations.
It is not possible to extract the key from white-box cryptographic implementations simply by observing the intermediate values in memory. Previously, key extraction from white-box cryptography was largely dependent on cryptanalysis [5,18,25,29,30,33], which requires detailed knowledge of the target implementations. Recent attacks, on the other hand, have adapted statistical techniques [9,32] of Differential Power Analysis (DPA) [21], and thus an in-depth understanding of the target implementation is not necessary. In particular, Differential Computation Analysis (DCA) [9] used Correlation Power Analysis (CPA) [11], which is a DPA variant, as a subroutine to calculate Pearson's correlation coefficient, but it improved the efficiency by using computation traces (also known as software execution traces) consisting of noise-free information such as memory accesses.
One of the most well-known techniques protecting against statistical side-channel analysis like CPA is masking [1,7,15,28], which randomizes every intermediate value for each execution of encryption. In [23], a customized version of masking on a white-box AES (WB-AES) implementation (with a 128-bit key) was proposed to prevent DCA. Unlike the existing masking, it used different masks for each value of the intermediate variable and generated a set of masked lookup tables; thus, there is no need to mask the entire tables every time an encryption operation is performed. However, it has been broken by two types of DCA variants. The first type extends the space of the hypothetical key to 16 bits [31], by which each subbyte of the first round output can be analyzed with a 2-byte key guess. The second type includes collision-based attacks using the bijective property of the encoding. A collision-based DCA attack [31] is similar to the 2-byte key guess, but the analysis method of computation traces is different. A bucketing attack [34], as a collision attack, can also be successful with chosen-plaintext sets, in which the plaintexts are divided into two sets based on four predefined bits of a hypothetical round output. The common cause of these vulnerabilities is that an attacker who correctly guessed one or two subkeys can predict the input value to the encoding of a round output using a set of subbytes in the plaintext.
In this paper, we improve masked WB-AES in such a way to protect against these vulnerabilities. The key point is to apply masking not only to the intermediate values but also to the round outputs computed with less than 128 bits of the key. Our evaluation shows that the proposed method provides protection against DCA and its variants, and the additional cost is a table size that is 3.7 times larger than the previous masked WB-AES. The rest of the paper is organized as follows. Section 2 briefly reviews the past design of masked WB-AES, and Section 3 explains its vulnerabilities to DCA-variant attacks. Afterwards, Section 4 presents a secure design of masked WB-AES, and Section 5 evaluates its security and performance. Finally, Section 6 concludes this paper and provides future work.

Past Design of Masked WB-AES
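This section provides a brief overview of the past design of masked WB-AES with a 128-bit key [23]. To do so, the principle of a customized masking technique on white-box cryptography is explained based on Chow's WB-AES [13]. By pushing the initial AddRoundKey into the first round, the AES-128 algorithm can be expressed as follows, with two round keys involved in the final round:

state ← plaintext
for r = 1, ..., 9:
    ShiftRows(state)
    AddRoundKey(state, $\hat{k}^{r-1}$)
    SubBytes(state)
    MixColumns(state)
ShiftRows(state)
AddRoundKey(state, $\hat{k}^{9}$)
SubBytes(state)
AddRoundKey(state, $k^{10}$)
ciphertext ← state

Here $k^r$ is the r-th round key and $\hat{k}^r$ is the result of applying ShiftRows to $k^r$. For a column vector $[x_0, x_1, x_2, x_3]^T$ of the intermediate state after the T-box lookups, the MixColumns multiplication decomposes as

$$MC \cdot [x_0\ x_1\ x_2\ x_3]^T = x_0 \cdot MC_0 \oplus x_1 \cdot MC_1 \oplus x_2 \cdot MC_2 \oplus x_3 \cdot MC_3, \qquad (1)$$

where $MC_{i \in \{0,1,2,3\}}$ is the i-th column vector of MC. We denote each term of the right-hand side by $y_0$, $y_1$, $y_2$, and $y_3$, respectively. The lookup table of decomposed MixColumns is then defined by $T_{y_i}$ as follows:

$$T_{y_i}(x_i) = x_i \cdot MC_i.$$

In the first WB-AES designed by Chow et al. [13], the TypeIII table (Fig. 1b) replaces the 32×32 linear transformations applied to the TypeII output with 8×8 linear transformations, and the TypeIV III table recombines the TypeIII output for computing the round output. By doing so, an input to the next round TypeII becomes 8 bits in length, thereby preventing the entire table size from becoming large. Finally, a lookup table with an input decoding for $T^{10}$ in the final round was named TypeV (Fig. 1d). Note that TypeI for the external encoding is not considered in this paper for the sake of interoperability. Fig. 1 briefly describes TypeII to TypeV.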
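The decomposition of MixColumns into four 8-bit to 32-bit lookups can be sketched as follows. This is a minimal illustration only: the names `gmul`, `T_y`, `TABLES`, and `mix_column_by_tables` are hypothetical, and the tables below contain neither the key-dependent T-boxes nor any encoding, so they are not white-box tables.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8) with the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


# AES MixColumns matrix; MC_COLS[i] is the i-th column vector MC_i.
MC = [[2, 3, 1, 1],
      [1, 2, 3, 1],
      [1, 1, 2, 3],
      [3, 1, 1, 2]]
MC_COLS = [[MC[row][i] for row in range(4)] for i in range(4)]


def T_y(i, x):
    """y_i = x * MC_i, returned as a list of 4 bytes (one 32-bit value)."""
    return [gmul(x, c) for c in MC_COLS[i]]


# One precomputed table per position in the column (unencoded sketch).
TABLES = [[T_y(i, x) for x in range(256)] for i in range(4)]


def mix_column_by_tables(col):
    """XOR the four looked-up 32-bit values y_0 .. y_3."""
    out = [0, 0, 0, 0]
    for i, x in enumerate(col):
        out = [o ^ b for o, b in zip(out, TABLES[i][x])]
    return out
```

For instance, `mix_column_by_tables([0xdb, 0x13, 0x53, 0x45])` reproduces the MixColumns result for that column using only table lookups and XORs, which is the property the XOR tables rely on.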
Unfortunately, linear transformations and nibble encodings were known to leave a correlation with intermediate values in the resulting values [2,22]; consequently, the correlation in the lookup values became one of the vulnerabilities in white-box cryptography [9,32]. For this reason, there were several approaches to preventing DPA and DPA variants on white-box cryptography. For example, applying masking [12], a standard countermeasure to DPA, was investigated [6,8]. However, masking is vulnerable to higher-order DPA attacks, and this is also the case with masking applied to white-box cryptography [8]. Another method is to use an additional set of lookup tables that store bits completely inverted from a given table set [24]. However, this method is only effective if the given table set shows a high correlation, and thus the protection is not always guaranteed. More importantly, these countermeasures require a run-time random source in order to generate masks uniformly at random and to select one of the two lookup table sets, respectively.
To prevent key leakage by statistical analysis without using run-time random number generators, two techniques were incorporated into Chow's WB-AES [23]. First, each byte at the right-hand side of Equation (1) was concealed by masks randomly picked for each value of $x_{i \in \{0,1,2,3\}}$. It is a customized masking technique that differs from the existing masking technique, which uses the same mask value. Therefore, the newly defined TypeII-M consists of the masked $T_{y_{i \in \{0,1,2,3\}}}$ values and the mask values as shown in Fig. 2. Next, TypeIV IIA combines the masked $T_{y_i}$ values, and TypeIV IIB computes the round output by XORing the output value of TypeIV IIA and the mask used. This is the outline of CASE 1 [27], which provides the basic requirements of the past design of masked WB-AES, and Fig. 3 describes the table lookup overview.

Fig. 2: TypeII-M in the past design of masked WB-AES.
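The idea of drawing one mask per table entry at generation time, so that no run-time randomness is needed, can be sketched as below. This is a simplified model under stated assumptions: `build_masked_table`, `perms`, and `round_output_byte` are hypothetical names, random byte permutations stand in for the key-dependent $T_{y_i}$ values, the tables are 8-bit instead of 32-bit, and all encodings are omitted.

```python
import random

rng = random.Random(2024)  # build-time randomness only; fixed seed for reproducibility


def build_masked_table(t_func):
    """Return a table x -> (t_func(x) ^ m_x, m_x) with a separate mask m_x per input x."""
    table = []
    for x in range(256):
        m = rng.randrange(256)
        table.append((t_func(x) ^ m, m))
    return table


# Hypothetical stand-ins for four key-dependent byte functions.
perms = []
for _ in range(4):
    p = list(range(256))
    rng.shuffle(p)
    perms.append(p)

tables = [build_masked_table(perms[i].__getitem__) for i in range(4)]


def round_output_byte(xs):
    """Combine four masked lookups, then remove the combined mask."""
    masked, mask = 0, 0
    for i, x in enumerate(xs):
        v, m = tables[i][x]
        masked ^= v  # role played by TypeIV IIA: XOR of masked values
        mask ^= m    # XOR of the masks used
    return masked ^ mask  # role played by TypeIV IIB: unmasking
```

Because the mask is removed when the round output is formed, the value returned by `round_output_byte` is unmasked, which is the property the DCA variants discussed later exploit.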
Second, the nibble encodings were replaced by byte encodings for some inner round outputs depending on the security requirement (CASE 2 or 3). This is because the mask completely disappears in the round output after the masked outputs of MixColumns are XORed. However, the next section will show that the previous version of masked WB-AES is not effective against DCA-variant attacks.

Vulnerability to DCA variants
The customized implementation of masked WB-AES explained in the previous section was shown to be resistant to DCA attacks using a one-byte key guess [23]. However, it was known to be vulnerable to DCA variants, including a 2-byte key guess and collision-based attacks. As pointed out previously, these attacks, unlike cryptanalysis, can be carried out without detailed information on the internal design of white-box cryptography. Before going on, we note that higher-order DCA [8] does not work on the customized version of the masked implementation, which applies a different random mask for each value of the target intermediate variable. In the case of Linear Decoding Analysis (LDA) [19], the key is analyzed by solving a system of linear equations in which a matrix of intermediate values, obtained from the corresponding computation traces, multiplied by a vector of unknown coefficients equals the hypothetical intermediate value. If the system is solvable for a hypothetical key, it is probably the correct key. Otherwise, if no solution is found for every hypothetical key, the attack fails. However, LDA does not apply to masked WB-AES because the matrix is randomized by the masks, which makes the system unsolvable.

DCA
Originally, CPA using Pearson's correlation coefficient is one of the power analysis methods to recover the key based on the fact that the power consumption is proportional or inversely proportional to the Hamming weight (HW) of the data currently being processed. Let us denote N power traces by $V_{1..N}[1..\kappa]$ and a hypothetical key by k, where κ is the number of sample points. For K different hypothetical keys, $E_{n,k}$ ($1 \le n \le N$, $0 \le k < K$) is the power estimate in the n-th trace. Then, the estimator r at the j-th sample point is defined as

$$r_{k,j} = \frac{\sum_{n=1}^{N} (E_{n,k} - \bar{E}_k)(V_n[j] - \bar{V}[j])}{\sqrt{\sum_{n=1}^{N} (E_{n,k} - \bar{E}_k)^2 \, \sum_{n=1}^{N} (V_n[j] - \bar{V}[j])^2}},$$

where $\bar{E}_k$ and $\bar{V}[j]$ are the means of $E_k$ and $V[j]$, respectively [26]. The hypothetical key that produces the highest peak in the correlation plot is supposed to be the correct key. This CPA attack was adapted to break white-box cryptography because the linear transformation and the nibble encoding are unable to eliminate correlation [2,22]. In the repository of public white-box cryptographic implementations and DCA attacks [16], DCA also adapted CPA using Daredevil [10], a software tool to perform CPA. The difference from classical power analysis is that DCA improved the efficiency of CPA by collecting noise-free computation traces instead of power traces collected by an oscilloscope. On average, DCA recovered 14.3 out of 16 subkeys from Chow's WB-AES using only 200 computation traces, whereas no key was recovered from masked WB-AES [23]. However, the CASE 1 implementation of the previous masked WB-AES cannot prevent DCA with a 2-byte key guess [31], which exploits the round output that is not masked but only protected by linear transformations and nibble encodings.
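A CPA subroutine of the kind described here can be sketched in a few lines. The example is a toy under loud assumptions: it is not Daredevil, the function names are hypothetical, a 4-bit S-box (the PRESENT S-box) replaces the AES S-box, the "trace" has only two noise-free samples, and no encoding is applied to the target bit.

```python
from math import sqrt

# 4-bit S-box of PRESENT, used only as a small stand-in for the AES S-box.
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def pearson(e, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(e)
    me, mv = sum(e) / n, sum(v) / n
    cov = sum((a - me) * (b - mv) for a, b in zip(e, v))
    var_e = sum((a - me) ** 2 for a in e)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_e == 0 or var_v == 0:
        return 0.0
    return cov / sqrt(var_e * var_v)


def cpa(estimates, traces):
    """estimates[k][n] is E_{n,k}; traces[n][j] is V_n[j]. Returns (best key, peaks)."""
    samples = range(len(traces[0]))
    peaks = [max(abs(pearson(e, [t[j] for t in traces])) for j in samples)
             for e in estimates]
    best = max(range(len(peaks)), key=peaks.__getitem__)
    return best, peaks


def simulate_traces(key):
    """Toy computation traces: sample 0 leaks bit 3 of S(p ^ key), sample 1 is unrelated."""
    return [[(SBOX4[p ^ key] >> 3) & 1, p & 1] for p in range(16)]


def estimates_for_all_keys():
    """Mono-bit estimates: bit 3 of S(p ^ k) for every hypothetical key k."""
    return [[(SBOX4[p ^ k] >> 3) & 1 for p in range(16)] for k in range(16)]
```

Calling `cpa(estimates_for_all_keys(), simulate_traces(0x9))` ranks the hypothetical keys by their highest absolute correlation peak over all sample points.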
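In order to reduce the key search space for a subbyte of the first round output from $2^{32}$ to $2^{16}$, two bytes in a column of the plaintext state were fixed. Let us denote by $(p_0, p_1, p_2, p_3)$ the first column of the plaintext state. By fixing $p_0$ and $p_1$ to 0, the first byte of the round output can be written as $s(p_2, p_3) = c \oplus S(p_2 \oplus k^0_{2,2}) \oplus S(p_3 \oplus k^0_{3,3})$, where c is a constant determined by the two fixed bytes and their subkeys. For a 2-byte key guess, the hypothetical value $S(p_2 \oplus k^0_{2,2}) \oplus S(p_3 \oplus k^0_{3,3})$ is correlated to s, which is in turn correlated to its encoded value.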
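The reason a 2-byte guess suffices can be checked on a toy model. The sketch below assumes that the unmasked output byte has the form $c \oplus S(p_2 \oplus k_2) \oplus S(p_3 \oplus k_3)$ with an unknown constant c; the 4-bit PRESENT S-box replaces the AES S-box, the encoding is left out, and all function names are hypothetical.

```python
# 4-bit S-box of PRESENT, used only as a small stand-in for the AES S-box.
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def round_byte(p2, p3, k2, k3, c):
    """Toy unmasked round-output value for fixed p0 = p1 = 0."""
    return c ^ SBOX4[p2 ^ k2] ^ SBOX4[p3 ^ k3]


def bit_correlation(k2_true, k3_true, c, g2, g3, bit):
    """Sum over all (p2, p3) of (-1)^(s_bit XOR h_bit) for the key guess (g2, g3)."""
    total = 0
    for p2 in range(16):
        for p3 in range(16):
            s = round_byte(p2, p3, k2_true, k3_true, c)
            h = SBOX4[p2 ^ g2] ^ SBOX4[p3 ^ g3]  # attacker's hypothetical value
            total += 1 - 2 * (((s ^ h) >> bit) & 1)
    return total
```

For the correct guess the hypothetical value differs from the real one only by the constant c, so every bit agrees or disagrees on all 256 inputs and the sum reaches its extreme value of plus or minus 256; the attacker never needs to know c.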

Collision Attack
Similarly, a collision-based DCA attack [31] can also be mounted with the $2^{16}$ key space by fixing two input bytes. This is based on the fact that if a hypothetical subbyte of the round output collides for a pair of inputs, so does its encoded value in the computation trace. For each pair of inputs and their computation traces, an attacker compares the values at each sample position in the two traces and writes 1 in a collision computation trace (CCT) if the two values are equal; otherwise, the attacker writes 0. To test for collisions between the CCT and the attacker's hypothetical values, the collision prediction is composed of 0 and 1, assigned in the same way by comparing two hypothetical subbytes of the round outputs computed from each pair of inputs and a hypothetical key. Thus, there is a perfect match between the target sample position in the CCT and the collision prediction for the correct hypothetical key. Contrary to a 2-byte key guess, the collision-based DCA attack is valid even if the byte encoding is used on the round output. Here, we do not take into account the improved mutual information analysis [31] because it is similar to the collision attack and succeeds if and only if the collision attack succeeds.
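The CCT construction and the perfect-match test can be sketched as follows. Everything here is an illustrative assumption: the target is a 4-bit toy value built from the PRESENT S-box, `ENC` is a hypothetical secret bijective encoding, the trace has two samples, and only a random sample of input pairs is used.

```python
import random

SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

rng = random.Random(1)
ENC = list(range(16))
rng.shuffle(ENC)      # hypothetical secret bijective encoding of the round output
K2, K3 = 0xA, 0xF     # secret subkeys of the toy implementation


def trace(p2, p3):
    """Toy computation trace: [encoded round-output value, unrelated sample]."""
    s = SBOX4[p2 ^ K2] ^ SBOX4[p3 ^ K3]
    return [ENC[s], p2 ^ p3]


def cct(trace_a, trace_b):
    """Collision computation trace: 1 where the two traces hold equal values."""
    return [int(a == b) for a, b in zip(trace_a, trace_b)]


def predict(a, b, g2, g3):
    """Collision prediction for inputs a, b under the key guess (g2, g3)."""
    ha = SBOX4[a[0] ^ g2] ^ SBOX4[a[1] ^ g3]
    hb = SBOX4[b[0] ^ g2] ^ SBOX4[b[1] ^ g3]
    return int(ha == hb)


def perfect_match_keys(pairs, target=0):
    """Return all key guesses whose predictions match the CCT at the target sample."""
    observed = [cct(trace(*a), trace(*b))[target] for a, b in pairs]
    return [(g2, g3) for g2 in range(16) for g3 in range(16)
            if all(predict(a, b, g2, g3) == o for (a, b), o in zip(pairs, observed))]


pairs = [((rng.randrange(16), rng.randrange(16)),
          (rng.randrange(16), rng.randrange(16))) for _ in range(300)]
```

Because `ENC` is a bijection, two encoded values are equal exactly when the underlying values are equal, so the correct guess `(K2, K3)` always appears in `perfect_match_keys(pairs)` regardless of which encoding was drawn.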

Bucketing Attack
Extended statistical bucketing analysis [34], as a variant of the collision attack, is based on the fact that if two correct hypothetical values computed by a pair of plaintexts do not collide, their corresponding encoded values should not collide either. Bucketing Computational Analysis (BCA) applied this principle to white-box cryptography using computation traces. For example, an attacker can divide the first subbyte of plaintexts into two sets with two distinct values according to the lower four bits of the S-box output. By fixing the remaining 15 bytes of the plaintext, the attacker can be sure that the two sets of plaintexts produce disjoint sets of the lower four bits of the first subbyte in the first round output. This attack works on CASE 1 of the previous masked WB-AES because the round output is not masked but only protected by the nibble encoding. Thus, the attacker can confirm or deny a hypothetical key by observing whether or not the first subbyte in the round output is disjoint depending on the chosen-plaintext set. Zero Difference Enumeration (ZDE) [4] may be considered similar to BCA. ZDE works by selecting special pairs of plaintexts that make a significant number of the intermediate values computed by the correct hypothetical key identical. However, this is known to be inefficient, taking $500 \times 2^{18}$ traces to recover a subkey of AES, and the selected pairs of plaintexts are also unable to produce identical intermediate values in masked WB-AES.
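The bucketing principle can be sketched for an unmasked output nibble protected only by a nibble encoding. The following is a toy model under stated assumptions: `DELTA` is a hypothetical secret nibble encoding, `C` is a hypothetical constant contributed by the fixed plaintext bytes, linear transformations are omitted, and the function names are not taken from any existing tool.

```python
import random


def gmul(a, b):
    """Multiply two bytes in GF(2^8) with the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return r


def build_sbox():
    """AES S-box: multiplicative inverse followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        r = inv
        for s in range(1, 5):
            r ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(r ^ 0x63)
    return sbox


SBOX = build_sbox()
rng = random.Random(3)
DELTA = list(range(16))
rng.shuffle(DELTA)      # hypothetical secret nibble encoding
KEY, C = 0xAA, 0x3C     # secret subkey and constant from the fixed plaintext bytes


def observe(p):
    """Encoded lower nibble of the (unmasked) toy round output."""
    return DELTA[(SBOX[p ^ KEY] ^ C) & 0xF]


def disjoint_count(k):
    """Number of bucket pairs (d0 < d1) whose observed value sets are disjoint."""
    buckets = {}
    for p in range(256):
        d = SBOX[p ^ k] & 0xF  # bucket chosen by the hypothetical S-box output
        buckets.setdefault(d, set()).add(observe(p))
    return sum(1 for d0 in range(16) for d1 in range(d0 + 1, 16)
               if buckets[d0].isdisjoint(buckets[d1]))
```

With the correct key each bucket maps to a single encoded nibble, so all 120 bucket pairs are disjoint in this simplified setting; an attacker would compare `disjoint_count(k)` across all hypothetical keys.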

New Design of Masked WB-AES
DCA-variant attacks on the previous masked WB-AES analyzed the round output, in which the masks are removed. In this section, we propose a new design of masked WB-AES in order to protect each byte of the round output before encoding. To do so, a subbyte of the round output computed by partial bits of the key is masked, and the input decoding phase of the next round is modified to unmask it. The following explains how to design the lookup tables depending on the presence or absence of masking on the input and output, and how to connect them to the other tables.
TypeII MO (Masked Out). This table adds random masks to the $T_{y_i}$ outputs and encodes both the masked values and the masks used. It is used in the first round because each subbyte of the first round output only involves 32 bits of the key. Note that all 128 bits of the key affect each subbyte of the round output after the output value of the second MixColumns multiplication is XORed. For the same reason, it is also used in the eighth round because each subbyte of the ninth round input needs to be protected by masking, as only six bytes of the key are associated with it in terms of decryption that goes back from ciphertexts.
The difference from TypeII-M used in the past design of masked WB-AES is the encoding on the masks. As shown in Fig. 2 and Fig. 3, the masked $T_{y_i}$ outputs were previously unmasked before the TypeIII lookup, and thus the intermediate value and the mask shared the same matrix for the linear transformation in order to take advantage of the distributive property of matrix multiplication over XOR. Now, the mask is not immediately combined with the masked $T_{y_i}$ outputs, but with the other mask values to provide the mask of the masked round output. Fig. 4a shows that 8×8 linear transformations are applied to the mask in TypeII MO because the masks are combined only with other masks. A mask is a random value generated from a uniform distribution and independent of the key, so there is no reason to apply a linear transformation with a large diffusion effect. For this reason, the masks do not require the process of replacing linear transformations by TypeIII and TypeIV III, thereby reducing the overall table size and the number of lookups. Let TypeIV IIM denote the TypeIV table used to combine the masks connected by dotted lines in Fig. 4a.
On the other hand, the TypeIV II table combines only the masked $T_{y_i}$ outputs, which keeps the round output secure as shown in Fig. 5a. After computing the masked round output above, TypeIII and TypeIV III replace the 32×32 linear transformation with 8×8 linear transformations as in Chow's WB-AES. Then we have two 4×4 state matrices, vs (value state) and ms (mask state), where vs is the masked round output and ms is the mask value. This lookup sequence is illustrated in Fig. 7a.
TypeII MIMO (Masked In Masked Out). Because the first round output is masked, the TypeII table in the second round takes each byte of vs and ms as input. Then an input to $T^2$ can be computed by decoding and XORing those two bytes. In the second round, masking is again applied to the $T_{y_i}$ outputs because not all key bits affect each intermediate value before combining the outputs of the second round MixColumns. Here, we call it TypeII MIMO, which takes the masked input and provides the masked $T_{y_i}$ outputs. TypeII MIMO is again divided into two types, depending on the linear transformation applied to the mask. If the masked round output is unmasked before looking up the TypeIII table, as in the past design of masked WB-AES shown in Fig. 3a, a 32×32 linear transformation is applied. Otherwise, if the masked round output and the mask values are separated into vs and ms and passed to the next round, an 8×8 linear transformation is applied.
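The input side of such a table, which decodes a byte of vs and a byte of ms and XORs them before the T-box, can be sketched as below. The sketch is an assumption-laden simplification: `enc_v` and `enc_m` are hypothetical byte encodings, a random permutation `T` stands in for the key-dependent T-box, and the output masking and output encodings of the real table are left out.

```python
import random

rng = random.Random(7)


def random_bijection(n):
    """Random permutation of range(n), used as a toy byte encoding."""
    p = list(range(n))
    rng.shuffle(p)
    return p


def invert(p):
    """Inverse of a permutation given as a list."""
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return inv


enc_v, enc_m = random_bijection(256), random_bijection(256)  # encodings of vs and ms bytes
dec_v, dec_m = invert(enc_v), invert(enc_m)
T = random_bijection(256)  # hypothetical stand-in for a key-dependent T-box

# Two-byte-input table: the unmasked value only exists inside the table generator.
table = [[T[dec_v[v] ^ dec_m[m]] for m in range(256)] for v in range(256)]


def lookup(x, mask):
    """Feed the table with the encoded masked byte and the encoded mask."""
    return table[enc_v[x ^ mask]][enc_m[mask]]
```

For any mask, `lookup(x, mask)` equals `T[x]`, while the table itself has $2^{16}$ entries per input position; this two-byte input is what drives the increase in table size discussed in the performance analysis.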
In the second round, unmasking is completed only after the XOR operations between the masked $T_{y_i}$ outputs are finished. For this reason, a 32×32 linear transformation is applied to the mask in the second round as plotted in Fig. 4b, and the unmasking is conducted with the TypeIV tables as shown in Fig. 5b. The overall sequence of table lookups in the second round is shown in Fig. 7b.
In addition, each subbyte of the ninth round output needs to be masked. This is because if the two subkeys hidden in $T^{10}$ of the final round are correctly guessed by the attacker, the hypothetical subbyte of the ninth round output computed inversely from the ciphertext will correlate with the corresponding subbyte of the encoded ninth round output. Thus, the masked $T_{y_i}$ outputs and the masks are XORed separately and passed to the input of TypeV MI in the final round as shown in Fig. 5c and Fig. 7c. By abuse of notation, we continue to use the same names for TypeII MIMO and TypeIV IIM in the second and the ninth rounds for simplicity, although they differ in the linear transformation applied to the mask and the number of copies of the TypeIV table, respectively. The size of each table and the number of lookups are analyzed in the next section.
Remark. We note that masks should be generated uniformly at random. For each different pair of a masked value and a mask of the round inputs, different seeds for generating the masks on $T_{y_i}$ should be applied to prevent collision-based attacks.
TypeII. The TypeII table (Fig. 1a) for the rest of the inner rounds (third to seventh) is used in the same way as in Chow's WB-AES, since masking is not applied to a byte computed by the entire key. The replacement of linear transformations is also processed in the same way with TypeIII and TypeIV III as depicted in Fig. 7d.
TypeV MI (Masked In). For the final round, the TypeV MI table is generated by decoding and XORing each byte of vs and ms to make an input byte to $T^{10}$ as shown in Fig. 6. Without the external encoding, each TypeV MI output becomes a subbyte of the ciphertext (Fig. 7e).

Evaluation
We evaluate the implementation of the proposed method in terms of security and performance. The security analysis includes the evaluation of protection against DCA and the DCA variants described in Section 3, and the performance analysis provides the table size and the number of lookups. To do so, we generated the lookup tables according to the proposed design of masked WB-AES and conducted various experiments. First, the correlation between the TypeII MO value and the hypothetical value of the SubBytes output in the first round is analyzed with the Walsh transform. In addition, the correlation between the masked round output and the hypothetical round output computed by a 2-byte key guess is also analyzed. Next, a perfect match for a collision attack is tested on the masked round output. Finally, we check whether the chosen plaintexts of the bucketing attacker can make disjoint sets on the masked round output when the hypothetical key is correct.

Security Analysis
We analyze and demonstrate hereafter the protection against the vulnerabilities explained in Section 3. Before going on, we first show protection against DCA on the TypeII MO outputs in the first round. In fact, the masked $T_{y_i}$ output in the first round is the same as in the past version of masked WB-AES [23], which was proven secure against DCA. Consider a DCA attacker who observes the target values by accessing memory while the encryption is performed. This attacker learns the intermediate values from the computation traces and runs a CPA attack as a subroutine to calculate Pearson's correlation coefficient with the hypothetical values. Here, the computation trace serves to provide noise-free information about intermediate values. If one can directly observe these noise-free values, the computation trace is not required, and the Walsh transform, which consists of simple operations, can be an alternative to CPA for calculating the correlation [23,32]. For this reason, the Walsh transform defined below [32] is used here because we can directly access noise-free intermediate values from the lookup table.
Definition 1. Let $x = (x_1, \ldots, x_n)$ and $\omega = (\omega_1, \ldots, \omega_n)$ be elements of $\{0,1\}^n$ and $x \cdot \omega = x_1\omega_1 \oplus \cdots \oplus x_n\omega_n$. Let $f(x)$ be a Boolean function of n variables. Then the Walsh transform of the function $f(x)$ is a real-valued function over $\{0,1\}^n$ that can be defined as

$$W_f(\omega) = \sum_{x \in \{0,1\}^n} (-1)^{f(x) \oplus x \cdot \omega}.$$

In Definition 1, let x be a hypothetical intermediate value to be analyzed and ω be an operand of the inner product with HW 1, selecting a specific bit of x.
The reason why the HW of ω is 1 is that it is difficult to analyze the key by HW or multi-bit based correlation analysis due to the encodings, whereas single-bit analysis is successful. On the other hand, $f(x)$ represents the real lookup values and provides the noise-free intermediate values like the computation trace. To indicate a particular bit of the n-bit lookup value, $f(x)$ is represented as n Boolean functions. In Definition 2, $W_{f_i} = 0$ means no correlation, whereas a large absolute value of $W_{f_i}$ means that there is a large correlation between the i-th bit of $f(x)$ and $x \cdot \omega$. Using this principle, the following shows that the proposed implementation can protect against DCA.
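The Walsh transform used for this single-bit analysis can be sketched directly from its definition. The helper names `dot`, `walsh`, `walsh_hyp`, and `bit_of_table` are hypothetical; `walsh_hyp` is a slight generalization, assumed here for convenience, in which the inner product is taken with a hypothetical value computed from the input rather than with the input itself.

```python
def dot(x, w):
    """Inner product x . w over GF(2): parity of the bits selected by w."""
    return bin(x & w).count("1") & 1


def walsh(f, n, w):
    """W_f(w) = sum over x in {0,1}^n of (-1)^(f(x) XOR x.w), with f returning 0 or 1."""
    return sum(1 - 2 * (f(x) ^ dot(x, w)) for x in range(1 << n))


def walsh_hyp(f, hyp, n, w):
    """Same sum, but correlating f(p) with a hypothetical value hyp(p) instead of p."""
    return sum(1 - 2 * (f(p) ^ dot(hyp(p), w)) for p in range(1 << n))


def bit_of_table(table, i):
    """Boolean function giving the i-th bit of a lookup table entry."""
    return lambda x: (table[x] >> i) & 1
```

A value of 0 indicates that the observed bit is uncorrelated with the selected bit of the hypothetical value, while a value of plus or minus $2^n$ indicates that the two bits are always equal or always complementary.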
For the first subbyte $p \in \{0,1\}^8$ of the plaintext and a hypothetical subkey k, the correlation between each bit of the hypothetical S-box output and its corresponding TypeII MO values can be quantified by

$$W_{f_i}(\omega) = \sum_{p \in \{0,1\}^8} (-1)^{f_i(p) \oplus S(p \oplus k) \cdot \omega},$$

where $f_i(p)$ is the i-th bit of the left 32-bit value of the TypeII MO output depicted in Fig. 4a. Because this equation tests all possible values of p, and we know $f_i(p)$, the correlation can be analyzed accurately, as if it were analyzed with a large number of random plaintexts in DCA. Fig. 8 is the result of the Walsh transforms for the first subkey and shows that key leakage did not occur when each bit of the SubBytes output was analyzed. A DCA attack using 10,000 computation traces also failed, as shown in Table 1.
Table 1: DCA ranking for the proposed method of masked WB-AES when conducting mono-bit CPA on the SubBytes output in the first round with 10,000 computation traces.
Second, the proposed method protects against a DCA attack with a 2-byte key guess. As explained previously, the first subbyte of the round output without masking can be represented by a function of $p_2$ and $p_3$ if the attacker fixes the first two bytes of the first column of the plaintext state to zero. In the case of the masked round output, this can be written, by abuse of notation, as a value $\hat{s}(p_2, p_3)$ in which $c_r$ is a fixed mask for c, and $r_2$ and $r_3$ are random bijections for choosing masks uniformly at random. Here, we note that $r_i(p)$ does not mean that a […]. Substituting the correct subkeys for $k^0_{2,2}$ and $k^0_{3,3}$, the first subbyte of the first round output obtained from TypeIV II can be expressed by $\varepsilon(\hat{s}(p_2, p_3))$, where $\varepsilon$ is an encoding of the round output. Let us assume that the attacker already knows the subkey $k^0_{2,2}$ = 0xAA, and that the target hypothetical value $h(p_2, p_3, k)$ is computed from $p_2$, $p_3$, this known subkey, and a hypothetical subkey k. Then, the correlation between $\varepsilon(\cdot)$ and $h(\cdot)$ can be quantified by the Walsh transform of $\varepsilon_i(\cdot)$, where $\varepsilon_i(\cdot)$ is the i-th bit of $\varepsilon(\cdot)$. Here, we know that $\hat{s}(\cdot)$ will no longer correlate to $h(\cdot)$ if the mask $r(p_2, p_3)$ is a random byte with a uniform distribution. Our experimental result shows that DCA with a 2-byte guess cannot succeed even if the attacker is able to correctly guess the remaining subkey k = 0xFF, as shown in Fig. 9. In other words, $\hat{s}(\cdot)$ is not correlated to $h(\cdot, \cdot, k^*)$ due to the random masks, where $k^*$ denotes the correct subkey.
Third, the collision attack also fails because the perfect match between the target sample in the CCT and the collision prediction computed from the correct hypothetical key is violated in the masked round output. Let us first demonstrate the perfect collision without masking on the round output. To do so, we collected a set of pairs $I_v$ and the corresponding vector $Z_v$ of length $|I_v|$. Let $Z^*$ denote a vector consisting of identical constants. The perfect match for a successful collision attack requires $Z_v$ to point in the same direction as $Z^*$, so the cosine similarity between $Z^*$ and $Z_v$ should be 1 because cos(0°) = 1. Indeed, Fig. 10a shows that the correct subkey yields the cosine similarity 1 when the round output is not masked. This implies the success of the collision attack.
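The cosine similarity used as the perfect-match criterion can be computed as in the short sketch below. The function name and the example vectors are illustrative assumptions and are not the data of the experiment.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_sq = sum(x * x for x in a) * sum(y * y for y in b)
    if norm_sq == 0:
        return 0.0
    return dot / sqrt(norm_sq)
```

A vector that is constant in every position has cosine similarity 1 with any other constant vector of the same sign, whereas a vector that mixes matches and mismatches, such as `[1, 0, 1, 0]` against `[1, 1, 1, 1]`, falls below 1.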
To evaluate the effect of masking the round output, we generated the vector $Z_v$ in the same way from the masked round output. Then, the cosine similarity between $Z^*$ and $Z_v$ for the correct subkey looks random, like that of the wrong hypothetical subkeys, as shown in Fig. 10b. This implies that the masked round output protects against the collision attack.
Finally, the bucketing attack is also prevented. Before going on, we begin with a demonstration of how it works on CASE 1 of the past implementation of masked WB-AES. For two bucket nibbles $d_0, d_1 \in \{0,1\}^4$ such that $d_0 \neq d_1$, a bucketing attacker defines two plaintext sets $J_{d_0}$ and $J_{d_1}$, where $J_{d_i}$ ($i \in \{0,1\}$) contains the plaintext subbytes whose S-box output under a hypothetical key k has $d_i$ as its lower four bits. Let $[0\ 0\ p\ 0]^T$ be the first column of the plaintext state. Then, the lower four bits of the first subbyte in the first round output of AES-128 can be written as a function of p only. The bucketing attack is based on the fact that a correct subkey guarantees that $B_{b_0} \cap B_{b_1} = \emptyset$, where $B_{b_i}$ is the set of these four-bit values obtained for the plaintexts in $J_{d_i}$. Consider only the nibble encoding, denoted by δ, on the round output without applying linear transformations. Then, one can easily see that $B^{\delta}_{b_0} \cap B^{\delta}_{b_1} = \emptyset$, where $B^{\delta}_{b_i}$ is the set of δ-encoded values obtained for $J_{d_i}$. For index = $d_0 d_1$ such that $d_0 < d_1$ (for removing duplicated bucket nibbles), our experimental result depicted in Fig. 11a shows that the correct key always guarantees that $B^{\delta}_{b_0}$ and $B^{\delta}_{b_1}$ are disjoint. This is in contrast to the sets obtained when the linear transformation is also applied, shown in Fig. 11b, which have a number of intersection elements due to the diffusion effect of the linear transformation. Here, the bucketing attacker can still find the key that most frequently makes the two sets disjoint, because the wrong hypothetical keys never produced an empty intersection. Fig. 11c shows that the correct key (0xAA) has 96 indexes (out of 120) that lead to a disjoint set, and the wrong hypothetical keys never make one.
To evaluate the effect of the masked round output against the bucketing attack, we define $\hat{g}$ for the lower four bits of the first subbyte in the masked round output, which includes the mask $r(p)$. For each plaintext set $J_{d_i}$, we collected the target four bits into the set $\hat{B}_{b_i}$. Because $r(p)$ generates random numbers, our experimental result shows that $\hat{B}_{b_0}$ and $\hat{B}_{b_1}$ are never disjoint for any pair of $(d_0, d_1)$, where $d_0 < d_1$ (Fig. 12). Thus, the bucketing attack does not work on the proposed method.

Performance Analysis
Summing the sizes of all lookup tables of the proposed implementation, the total size is 18,395,136 bytes (approximately 17.5 MB). The increased table size compared to the previous masked WB-AES is due to the use of tables that take a two-byte input. This total size is roughly 35.3 times and 3.7 times larger than Chow's WB-AES and the CASE 3 implementation of the previous masked WB-AES, respectively, but there are differences in the range of target attacks and protected rounds. Note that we do not compare with CASE 1 and CASE 2 in the previous version of the masked implementation because these provide incomplete protection. Counting the table lookups in the same way gives 2,512 lookups in total. This is 1.2 times and 0.7 times the number of lookups of Chow's WB-AES and the CASE 3 implementation, respectively. As a result, there is little difference in the number of lookups. Because of the relatively large size of the tables, the available memory space on the target device should be considered.

Conclusion and Future Work
Previously, a white-box cryptographic implementation combined the masking technique to protect against DCA attacks. This implementation eliminated all masks from the round output and applied byte encodings in some outer rounds, which resulted in vulnerabilities to DCA-variant attacks. In this paper, we also applied masking to the round output in order to defend against existing DCA variants. Based on the previous masked WB-AES, the several round outputs computed by partial bits of the key were masked, and each mask was removed in the input decoding of the next round. Our security evaluation showed that this method can protect against the known vulnerabilities. The downside of this work is the memory requirement, which is nearly four times larger than that of the previous masked WB-AES. Therefore, it would be expensive for low-cost devices with only a few hundred KB of memory, but it could be used for smart devices with enough memory space. The attacks counteracted in this study were carried out using plaintexts and computation traces. Instead of stripping the encoding applied to white-box cryptography, they exploited either the correlation of the intermediate values remaining before and after the encoding or the bijectiveness of the encoding. Thus, a countermeasure to cryptanalysis that strips the encoding away by using internal design information should be adopted in order to offer a more reliable software implementation. Memory requirements also increase when defending against various types of attacks, consuming a lot of resources in target devices. Therefore, future work is to design a high-efficiency software countermeasure. For this purpose, it seems necessary to find a new encoding for white-box cryptography.
Fig. 4: (a) TypeII MO. No input decoding is performed for the first round because there is no external encoding. (b) TypeII MIMO in the second round. (c) TypeII MIMO in the ninth round.
it is called a balanced d-th order correlation immune function or a d-resilient function.

Fig. 8: The Walsh transforms on the TypeII MO outputs (except the mask) in the first round. Black: correct key; gray: wrong key.

Fig. 9: The Walsh transforms on the masked round output in the first round. Black: correct key; gray: wrong key.

Fig. 10: Cosine similarity without and with masking on the round output. Black: correct key; gray: wrong key. (a) Between $Z^*$ and $Z_v$ without round output masking. (b) Between $Z^*$ and $Z_v$ with round output masking.

Fig. 11: (a) With only the nibble encoding. (b) With the nibble encoding and the linear transformation. (c) Number of indexes making a disjoint set for each key.

Fig. 12: (a) No disjoint sets for any pair of $(d_0, d_1)$, where $d_0 < d_1$. Black: correct key; gray: wrong key. (b) Number of indexes making a disjoint set for each key. All are 0.
AddRoundKey(state, $k^{10}$); ciphertext ← state, where $k^r$ is a 4 × 4 matrix of the r-th round key, and $\hat{k}^r$ is the result of applying ShiftRows to $k^r$. As the first step to implement the above algorithm as a series of lookup tables, T-boxes combining SubBytes and AddRoundKey are defined as

$$T^r_{i,j}(p) = S(p \oplus \hat{k}^{r-1}_{i,j}), \quad \text{for } i, j \in [0, 3] \text{ and } r \in [1, 9],$$
$$T^{10}_{i,j}(p) = S(p \oplus \hat{k}^{9}_{i,j}) \oplus k^{10}_{i,j}, \quad \text{for } i, j \in [0, 3],$$

where S and p denote the AES S-box and a subbyte of the plaintext, respectively. From the first to the ninth rounds, column vectors in the MixColumns matrix MC are multiplied with lookup values from T-boxes. Let $[x_0, x_1, x_2, x_3]^T$ be a column vector of the intermediate state after mapping the round input to T-boxes. Then, we have the decomposition given in Equation (1). Chow et al. applied the encoding consisting of 32×32 linear transformations (mixing bijections) and two concatenated 4-bit nonlinear transformations (nibble encodings) to the $T_{y_i}(\cdot)$ values in order to obfuscate key-dependent intermediate values. This encoded lookup table was commonly named TypeII (Fig. 1a). When generating an XOR table to combine the outputs of the decomposed MixColumns, no inverse linear transformation is involved because of the distributive property of matrix multiplication over bitwise XOR. The nibble encoding, on the other hand, enables the XOR table to take two 4-bit inputs, preventing the overall size of the table from becoming large. This XOR table was named TypeIV II. Next, the TypeIII table (Fig.
Fig. 3 describes the table lookup overview.