Low Randomness Masking and Shuffling: An Evaluation Using Mutual Information

Abstract. Side-channel countermeasure designers often face severe performance overheads when trying to protect a device. Widely applied countermeasures such as masking and shuffling entail generating a large amount of random numbers, which can result in a computational bottleneck. To mitigate the randomness cost, this work evaluates low-randomness versions of both masking and shuffling, namely Recycled Randomness Masking (RRM) and Reduced Randomness Shuffling (RRS). These countermeasures employ memory units to store generated random numbers and reuse them in subsequent computations, making them primarily suitable for implementation on devices with sufficient memory. Both RRM and RRS are evaluated using the MI-based framework in the context of horizontal attacks. The evaluation exhibits the tradeoff between the randomness cost and the noisy leakage security level offered by the countermeasures, enabling the designer to fine-tune a masking or shuffling scheme and maximize the security level achieved for a certain cost.


Introduction
The continuously growing Internet of Things (IoT) is rapidly changing modern infrastructure. Several industrial sectors, including construction, IT, agriculture, energy and automotive manufacturing, are already harnessing the transformative impact of IoT on their products. The price drop of IoT devices has enhanced everyday objects with data processing capabilities and network connectivity. Still, the two most important challenges to IoT adoption are the high cost of investment, as well as current concerns about security and privacy [2]. When combined, these challenges exacerbate the need for devices that offer an adequate level of protection at a reasonable cost, thus motivating the current line of research.
For instance, power/electromagnetic side-channel attacks (SCA) allow adversaries to recover sensitive data by observing and analyzing the physical characteristics and emanations of a cryptographic implementation [29]. Naturally, these attacks have led to countermeasures that perform noise amplification to impede potential adversaries. Two of the most widely deployed countermeasures in cryptographic implementations are masking and shuffling, and they are often combined in order to enhance the security level [14,34,38]. In order to hinder the attacker, masking applies secret-sharing techniques that randomize intermediate values, while shuffling randomizes the order of the cryptographic blocks and/or the implementation's instructions. As a result, both countermeasures require random numbers to function, making on-chip random number generation (RNG) a useful addition to the device.

Notation. The identity leakage function is stated as L_id(.). Observable leakages of a certain intermediate value V (or its instance v) are denoted using subscript L_V (or L_v respectively). Leakages observed after a specific cipher layer are denoted using superscript L^{layer}. The average of leakage variables from set S is denoted as L̄, i.e. L̄ = (1/|S|) · Σ_{v∈S} L_v.
Boolean Masking. Chari et al., Goubin et al. and Messerges [14,21,30] were the first to suggest randomizing intermediate values with a secret sharing scheme, forcing the adversary to analyze higher-order statistical moments. In detail, a dth-order secure Boolean masking scheme splits a sensitive value x into d+1 shares (x_0, x_1, ..., x_d) such that x = x_0 ⊕ x_1 ⊕ ... ⊕ x_d. The shares (x_0, x_1, ..., x_d) are also referred to as the (d+1)-family of shares corresponding to x [33]. Assuming sufficient noise, it has been shown that the number of traces required for a successful attack grows exponentially w.r.t. the security order d [14,32], i.e. masking performs noise amplification. Several definitions have been used to specify the formal security properties of a masking scheme, and we revisit the most relevant below.
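The (d+1)-share Boolean encoding can be sketched in a few lines of Python (a minimal illustration of the splitting and recombination described above; the function names are ours, not from any referenced scheme):

```python
import secrets

def mask(x: int, d: int, nbits: int = 8) -> list[int]:
    """Split a sensitive value x into a (d+1)-family of Boolean shares."""
    shares = [secrets.randbelow(1 << nbits) for _ in range(d)]
    last = x
    for s in shares:
        last ^= s          # choose the last share so the XOR of all shares is x
    return shares + [last]

def unmask(shares: list[int]) -> int:
    """Recombine the family: x = x_0 XOR x_1 XOR ... XOR x_d."""
    x = 0
    for s in shares:
        x ^= s
    return x
```

Any d of the d+1 shares are uniformly random and independent of x, which is what forces the adversary towards higher-order statistical moments.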
Probing-secure scheme. We refer to a scheme that uses certain families of shares as t-probing secure iff any set of at most t intermediate variables is independent of the sensitive values [27].
Non-interfering scheme. We refer to a scheme as t-non-interfering (t−NI) iff any set of at most t intermediate variables can be perfectly simulated with at most t shares of each input [6].
Strongly non-interfering scheme. We refer to a scheme as t-strongly non-interfering (t−SNI) iff any set of at most t = t_1 + t_2 intermediate variables, where t_1 are on the internal variables and t_2 on the output variables, can be perfectly simulated with at most t_1 shares of each input [6].
Multiplying two families of shares under an ISW Boolean masking scheme consists of the computation of all partial products, as well as a compression algorithm that produces the final result while injecting randomness [8,27]. Several implementation techniques and evaluation strategies have been suggested in the context of masking. With respect to implementation aspects, the techniques proposed range from lookup-table techniques [15,39] to GF-based circuits [13,22,33]. Regarding the evaluation strategies, recent advances by Battistello et al. [7] and Grosso et al. [25] suggest that masked multiplications are prone to horizontal attacks, i.e. attacks that exploit several noisy intermediate values that are computed during the scheme. In this work, we put specific emphasis on the impact of horizontal exploitation on the noisy leakage security level of the scheme.
In the application of Boolean masking schemes, secure multiplications require quadratic data complexity w.r.t. randomness, in order to ensure the refreshing of partial products. Initially, Rivain et al. [33] extended the ISW scheme [27] and put forward a d-private compression algorithm (RP) that can compute dth-order secure multiplications in GF(2^n) using d(d+1)/2 n-bit elements. Subsequently, Belaïd, Benhamouda, Passelègue et al. [8] suggested an improved d-private compression (BBP) that performs partial product refreshing using d^2/4 + d random numbers for security orders d > 4. In addition, they derived optimal compression algorithms for security orders d = 2, 3 and 4, which have a data complexity of 2, 4 and 5 random elements per multiplication respectively.
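The randomness complexities above can be collected in a small helper (a sketch; we assume integer division for the d^2/4 term of the generic BBP construction):

```python
def rand_elements_per_mult(d: int, scheme: str = "RP") -> int:
    """Random field elements needed per dth-order secure multiplication."""
    if scheme == "RP":                      # Rivain-Prouff / ISW-style compression
        return d * (d + 1) // 2
    if scheme == "BBP":                     # Belaid-Benhamouda-Passelegue et al.
        optimal = {2: 2, 3: 4, 4: 5}        # proven-optimal small orders
        if d in optimal:
            return optimal[d]
        return d * d // 4 + d               # generic construction, d > 4
    raise ValueError(f"unknown scheme {scheme!r}")
```

As a consistency check, with the standard 4-multiplication decomposition of the masked AES Sbox (16 Sboxes, 10 rounds, i.e. 640 multiplications of 8-bit elements), a 2nd-order BBP encryption consumes 640 · 2 · 8 = 10240 random bits, matching the figure discussed in this section.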
Despite recent efforts, it is notable that higher-order masking implies a severe RNG overhead. For instance, a 2nd-order secure AES implementation with optimal compression (2nd-order secure BBP) requires 10240 random bits per block encryption. Generating this amount of random bits with a pseudo AES-based random number generator on ATmega microcontrollers implies an optimistic cost of roughly 20k clock cycles (2-round AES generator) and a pessimistic cost of 100k clock cycles (10-round AES generator) [1,23].
We observe that the pessimistic case is fairly close to the computational cost of the 2nd-order secure AES on AVR devices [3], i.e. it amounts to approximately 38% of the clock cycles. Similarly, a 2nd-order secure PRESENT implementation on an ARM Cortex-M device spends 25% of its execution time on TRNG [18]. The severe overhead of RNG in masking countermeasures can render the implementation cost prohibitive for small embedded devices and has led countermeasure designers towards lightweight alternatives. Low-entropy masking schemes [9,24] reduce the randomness requirements by using masks chosen from a subset of all possible masks, yet if the leakage function is not linear, they may reduce the security order. Schemes that amortize randomness [20] can achieve similar goals without this shortcoming. In a similar line of work, threshold implementations examined techniques that reduce or even minimize the fresh randomness required to achieve uniformity [10,17]. Still, we stress that several of these schemes need to be evaluated in a fair manner, i.e. by using horizontal leakage exploitation, such as the analysis carried out by Battistello et al. [7] and Grosso et al. [25].
Shuffling. The shuffling countermeasure spreads information over n different points in time, according to a random permutation P_n [34]. The permutation P_n is defined as a vector (P_1, ..., P_n), where P_i represents the new position of element i, and P_n is drawn from the set of all possible n-dimensional permutations. For instance, assume two independent variables X = (X_1, X_2) that leak L = (L_{X_1}, L_{X_2}) at different points in time. The shuffling scheme will generate a 2-dimensional permutation P_2 such that L_{X_1} = L_id(X_{P_1}) + noise and L_{X_2} = L_id(X_{P_2}) + noise. Veyrat-Charvillon et al. [38] have analyzed the security provided by shuffling, in addition to investigating several implementation techniques. Motivated by the increased cost of RNG, Veshchikov et al. [36] investigated cheaper shuffling methods. We will refer to a permutation that shuffles n independent operations of a specific cipher layer as P_n^{o_1,...,o_n}, where o_i is the ith operation in the layer. Similar to masking, applying the shuffling countermeasure implies a non-negligible randomness cost. Specifically, generating a permutation for shuffling k independent operations of the same type requires k · ⌈log_2(k)⌉ random bits, using a slightly-biased version of the Knuth shuffle algorithm [28,38]. In a practical scenario, shuffling only the 16 AES Sboxes requires 640 random bits in total. In order to deal with this RNG overhead, previous work on the shuffling countermeasure opted to reduce the amount of possible permutations (random start index), to shuffle only in selected rounds (partial shuffling) or to use non-homogeneous shuffle patterns, where the amount of possible permutations varies between cipher layers [26,34].
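The bit accounting of the slightly-biased Knuth shuffle can be illustrated as follows (a sketch: drawing a fixed ⌈log_2(k)⌉ bits per step and reducing modulo the remaining range is what introduces the slight bias, in exchange for a fixed cost of k · ⌈log_2(k)⌉ bits):

```python
import secrets
from math import ceil, log2

def biased_knuth_shuffle(k: int) -> tuple[list[int], int]:
    """Generate a k-element permutation with a slightly biased Knuth
    (Fisher-Yates) shuffle, returning the permutation and the bits consumed."""
    perm = list(range(k))
    nbits = ceil(log2(k))
    bits_used = 0
    for i in range(k - 1, -1, -1):
        r = secrets.randbits(nbits) % (i + 1)   # modulo reduction -> slight bias
        bits_used += nbits
        perm[i], perm[r] = perm[r], perm[i]
    return perm, bits_used
```

For k = 16 Sboxes this costs 16 · 4 = 64 bits per permutation; one permutation per round over 10 AES rounds gives the 640 bits quoted above.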

Recycled Randomness Masking -RRM
This section puts forward a low-randomness version of the standard Boolean masking schemes, namely it introduces Recycled Randomness Masking (RRM). The novelty of RRM lies in considering two or more masked multiplications simultaneously and sharing randomness between their compression layers. Using this approach, we develop t−NI gadgets that reduce the RNG overhead of masked ciphers and enable side-channel protection at a modest budget. We commence with two elementary examples that will be used throughout this section to illustrate the core recycling idea, and we introduce additional notation to describe generic RRM schemes (Section 3.1). We continue with Section 3.2, which searches for optimized t−NI randomness-recycling gadgets using formal verification techniques and applies them to the AES cipher. Finally, Section 3.3 analyzes the noise amplification stage of RRM schemes and demonstrates the impact of recycling in the noisy leakage model.

Recycling Randomness in Masking
We illustrate the application of RRM using two masked ISW multiplications z = xy and c = ab. The multiplications are protected by ISW of security order d = 1 (Figure 1) or ISW of security order d = 2 (Figure 2). Both examples assume 4 independent families of shares (x_i)_{0≤i≤d}, (y_i)_{0≤i≤d}, (a_i)_{0≤i≤d} and (b_i)_{0≤i≤d} in GF(2). Values t_0, t_1, t_2, w_0, w_1, w_2 are random elements in GF(2) that are necessary to maintain probing security.
Figure 1: RRM scheme applied on two 1st-order secure ISW multiplications, generating random element w 0 in multiplication 1 and recycling it in multiplication 2.
Figure 2: RRM scheme applied on two 2nd-order secure ISW multiplications, generating random elements w 0 and w 1 in multiplication 1 and recycling them in multiplication 2.
In Figures 1 and 2, red-annotated variables are fresh random elements and green-annotated variables are recycled random elements. The left-arrow assignment describes the recycling of a random element in a different multiplication. For instance, in the 1st-order secure ISW-based scheme of Figure 1, the element w_0 is generated in multiplication 1 and it is subsequently recycled in multiplication 2 (t_0 ← w_0). Likewise, the 2nd-order secure example of Figure 2 showcases two ISW multiplications, which originally require 6 random elements: (w_0, w_1, w_2) and (t_0, t_1, t_2). To tackle the RNG overhead, RRM generates 3 fresh random elements (w_0, w_1, w_2) during multiplication 1 and recycles w_0 in t_0 and w_1 in t_1. Thus, the amount of random elements required in multiplication 2 is reduced from three to a single random element (t_2).
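The recycling of Figure 2 can be emulated with the standard ISW compression in GF(2). Below is a sketch (we map the labels (w_0, w_1, w_2) onto the pairwise randomness (r_{0,1}, r_{0,2}, r_{1,2}); the exact wiring of the figures may differ): correctness holds for any injected random values, so recycling never affects functionality, only security.

```python
import secrets

def isw2_mult(xs, ys, rnd):
    """2nd-order secure ISW multiplication in GF(2).
    xs, ys: 3-share inputs; rnd = (r01, r02, r12): injectable randomness."""
    r01, r02, r12 = rnd
    z0 = (xs[0] & ys[0]) ^ r01 ^ r02
    z1 = (xs[1] & ys[1]) ^ (r01 ^ (xs[0] & ys[1]) ^ (xs[1] & ys[0])) ^ r12
    z2 = (xs[2] & ys[2]) ^ (r02 ^ (xs[0] & ys[2]) ^ (xs[2] & ys[0])) \
                         ^ (r12 ^ (xs[1] & ys[2]) ^ (xs[2] & ys[1]))
    return [z0, z1, z2]

def share3(x):
    """Random 3-share Boolean encoding of bit x."""
    m0, m1 = secrets.randbits(1), secrets.randbits(1)
    return [m0, m1, x ^ m0 ^ m1]

def unshare(shares):
    out = 0
    for s in shares:
        out ^= s
    return out

# Multiplication 1 draws fresh (w0, w1, w2); multiplication 2 recycles
# w0 and w1 and draws only the fresh element t2.
w = [secrets.randbits(1) for _ in range(3)]
t2 = secrets.randbits(1)
z = isw2_mult(share3(1), share3(1), w)                  # z = 1*1
c = isw2_mult(share3(1), share3(0), (w[0], w[1], t2))   # c = 1*0
assert unshare(z) == 1 and unshare(c) == 0
```

The XOR of the three output shares always equals the product, since every random element is injected an even number of times across the compression.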
The proposed recycling technique can be generalized to more than two multiplications of any order, and to describe such generic RRM schemes we introduce the following notation. We assume a gadget consisting of n dth-order secure masked multiplications, where every masked multiplication requires s random elements to maintain probing security, e.g. multiplication i requires random elements (r_{i,1}, r_{i,2}, ..., r_{i,s}). We assume all the inputs to the masked multiplications to be independent families of shares, which may require fresh randomness in our implementation. In addition, we assume that at least one out of the n masked multiplications will generate fresh randomness. Subsequently, we define the recycle set R = {R_1, R_2, ..., R_n}. Every set R_i describes all the fresh or recycled random elements r_{i,j}, 1 ≤ i ≤ n, 1 ≤ j ≤ s, that multiplication i uses to maintain probing security. Figure 1 for instance has R = {{r_{1,1}}, {r_{2,1}}} = {{w_0}, {w_0}}, since the single random element w_0 is generated and used in multiplication 1 and it is reused (recycled) in multiplication 2. Similarly, in Figure 2, R = {{r_{1,1}, r_{1,2}, r_{1,3}}, {r_{2,1}, r_{2,2}, r_{2,3}}} = {{w_0, w_1, w_2}, {w_0, w_1, t_2}}, since random elements w_0 and w_1 are generated in multiplication 1 and they are recycled in multiplication 2, while element t_2 is generated in multiplication 2. If only a single multiplication generates fresh random elements and all the other multiplications recycle them, then it holds that R_1 = R_2 = ... = R_n. Symmetrically, if no randomness gets recycled (a.k.a. standard masking), then it holds that R_i ∩ R_j = ∅ for all 1 ≤ i, j ≤ n and i ≠ j. To specify the RNG overhead when recycling, we define the randomness cost of an RRM gadget with n multiplications as the total amount of fresh random elements generated. E.g.
in Figure 1 the randomness cost is 1 and in Figure 2 the cost is 4, while in general the cost of an RRM scheme with recycle set R is |R_1 ∪ R_2 ∪ ... ∪ R_n|. The cost of standard masking of n multiplications (without recycling randomness) is equal to n · s. In addition, we define the masking recycle factor f_rm of every random element in the RRM scheme as the number of times it has been used in any multiplication. In the example of Figure 1, f_rm(w_0) = 4, since it occurs twice in every multiplication. Similarly, in the example of Figure 2, f_rm(w_0) = f_rm(w_1) = 4, while f_rm(w_2) = f_rm(t_2) = 2. It is noteworthy that the recycling of random numbers is similar to the repeated access to shares observed by Battistello et al. [7], where the recycle factor of a share in a dth-order secure scheme is shown to be equal to d + 1. We will henceforth refer to a dth-order secure masking gadget with n multiplications and recycle set R as RRM(d, n, R). It is important to stress that RRM necessitates storing and fetching the recycled random elements. Let g = n · s − |R_1 ∪ ... ∪ R_n| be the reduction in randomness cost achieved by an RRM(d, n, R) scheme. RRM requires g fewer random elements and at most g extra storage units, depending on how many times the elements are recycled. In addition, RRM requires at most g extra store and fetch instructions when recycling. For example, in Figure 2 the gain is g = 2 · 3 − |{w_0, w_1, w_2} ∪ {w_0, w_1, t_2}| = 2 and it implies 2 extra storage units (w_0 and w_1), 2 extra store instructions and 2 extra fetch instructions.
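The cost, gain and recycle-factor bookkeeping above can be expressed directly over recycle sets (a sketch; the `uses_per_mult=2` default encodes our assumption, matching the Figure 1 and 2 examples, that an ISW-style compression touches each random element twice within one multiplication):

```python
def randomness_cost(R):
    """Fresh random elements an RRM gadget generates: |R_1 u ... u R_n|."""
    return len(set().union(*R))

def gain(R, s):
    """Reduction g = n*s - cost compared to standard masking."""
    return len(R) * s - randomness_cost(R)

def recycle_factor(R, v, uses_per_mult=2):
    """f_rm(v): total occurrences of element v across the gadget."""
    return uses_per_mult * sum(v in Ri for Ri in R)

# Figure 1: R = {{w0}, {w0}}; Figure 2: R = {{w0, w1, w2}, {w0, w1, t2}}
R1 = [{"w0"}, {"w0"}]
R2 = [{"w0", "w1", "w2"}, {"w0", "w1", "t2"}]
```

Evaluating these on the two examples reproduces the numbers in the text: cost 1 and f_rm(w_0) = 4 for Figure 1; cost 4, gain 2, f_rm(w_1) = 4 and f_rm(w_2) = 2 for Figure 2.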

Efficient RRM Multiplication Gadgets
As demonstrated, the core contribution of RRM is to reduce the randomness cost of n multiplications below the n · s random elements required by standard masking. Notably, both ISW and BBP schemes already reuse random elements during the compression layer of a single multiplication, while maintaining dth-order probing security. Still, excessive recycling between multiplications can lead to RRM gadgets that are no longer probing-secure. For instance, assume the ISW-based RRM(2, 2, R) gadget of Section 3.1, Figure 2 with recycle set R = {{w_0, w_1, w_2}, {w_0, w_1, w_2}}, i.e. 3 fresh elements are generated in multiplication 1 and they are all recycled in multiplication 2. Then the tuple (z_2, c_2) depends on the sensitive values x, y, a and b simultaneously, because the shared random elements cancel out in z_2 ⊕ c_2, leaving a value that depends only on the input shares. Since there exists such a tuple (z_2, c_2), the particular RRM gadget is not 2nd-order probing-secure.
As a result, this section proposes optimized t−NI multiplication gadgets that are capable of recycling a large amount of randomness. Analytically, for an RRM(d, n, R) gadget, we search for recycle sets R that minimize the randomness cost while the gadget remains t−NI. We focus on small orders (d = 1, 2, 3) and two multiplications per gadget (n = 2) due to their practical relevance in implementations. To detect potential security flaws, we use the Lisp-based formal verification tool suggested by Coron [16]. The tool generates all possible tuples of intermediate values (with dimension less than or equal to d) that stem from the RRM(d, n, R) gadget and verifies the t−NI property using circuit transformations. This process is repeated for all recycle sets R that ensure the correctness of the scheme, rejecting the insecure choices and identifying the optimized recycle set that minimizes the randomness cost.
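Coron's tool is far more general, but the flavor of such verification can be conveyed by an exhaustive single-probe (t = 1) check of the RRM(1, 2, {{w_0}, {w_0}}) gadget of Figure 1: every single intermediate value must have a distribution independent of the secrets. This is a toy sketch only; real t−NI verification reasons about tuples and simulatability, not just single-probe distributions.

```python
from itertools import product

def isw1_intermediates(xs, ys, r):
    """Single intermediate values of a 1st-order ISW mult z = x*y in GF(2)."""
    x0, x1 = xs
    y0, y1 = ys
    p01, p10 = x0 & y1, x1 & y0
    return [x0, x1, y0, y1, x0 & y0, x1 & y1, p01, p10,
            r, r ^ p01, r ^ p01 ^ p10,
            (x0 & y0) ^ r,                      # output share z0
            (x1 & y1) ^ r ^ p01 ^ p10]          # output share z1

def single_probe_secure() -> bool:
    """Exhaustively check that each single intermediate of the recycling
    gadget RRM(1, 2, {{w0}, {w0}}) is independent of (x, y, a, b)."""
    n_vals = 26                                 # 13 intermediates per mult
    for idx in range(n_vals):
        seen = set()
        for x, y, a, b in product((0, 1), repeat=4):
            hist = [0, 0]
            # enumerate the sharing randomness and the single fresh element w0
            for mx, my, ma, mb, w0 in product((0, 1), repeat=5):
                vals = (isw1_intermediates((x ^ mx, mx), (y ^ my, my), w0) +
                        isw1_intermediates((a ^ ma, ma), (b ^ mb, mb), w0))
                hist[vals[idx]] += 1            # t0 <- w0 is recycled here
            seen.add(tuple(hist))
        if len(seen) != 1:                      # distribution depends on secrets
            return False
    return True
```

Single probes alone cannot flag the over-recycled RRM(2, 2, ·) gadget discussed above; that requires enumerating d-tuples, which is exactly what the formal tool automates.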
The brute-force search of Algorithm 3a is carried out for both ISW-based and BBP-based schemes, and Figures 3b to 3f demonstrate the optimized t−NI gadgets. The randomness and storage requirements of the proposed RRM gadgets are shown in Tables 1 and 2, which confirm that RRM is capable of reducing the randomness cost substantially when compared to standard masking. It remains an open research question to quantify how much the lack of composability (strong non-interference) affects the efficiency, i.e. how many additional refresh layers will be required in the scheme in order to provide a fair comparison with the work of Faust et al. [20]. As an example, consider the 2nd-order secure BBP-based RRM gadget (Figure 3b) applied to the masked multiplications of the AES Sbox. Thus, the Sbox-related RNG cost of two AES executions is reduced from 10240 to 5120, i.e. RRM achieves a 50% reduction of the RNG overhead, at the penalty of 5120 element storage, 5120 store and 5120 fetch instructions. In a similar fashion, we can apply the 3rd-order secure BBP-based RRM scheme.

Figure 3: (a) Brute-force search algorithm; (b)-(f) optimized t−NI RRM gadgets.

RRM Noise Amplification
The previous section (Section 3.2) pinpointed the first pitfall of RRM schemes, i.e. how excessive recycling can result in gadgets that are not probing-secure. Having tackled this issue for low-order gadgets with formal methods, we proceed towards the second pitfall of RRM. Namely, excessive recycling is hazardous to the noise amplification stage of masking, even when the gadget is probing-secure. Specifically, this section analyzes the noise amplification stage of several t−NI RRM gadgets of Section 3.2, using the mutual information metric suggested by Standaert et al. [35]. In other words, we evaluate the proposed "recycling" countermeasure in the noisy leakage model and compare it to standard masking schemes. The effectiveness of the noise amplification stage of RRM largely depends on the adversary's capability to observe multiple noisy intermediate values during the gadget's execution. We refer to this capability as horizontal exploitation and we consider the following cases (C1-C3), in ascending order of adversarial strength:

C1 Naive-tuple attack. The adversary exploits a single noisy (d+1)-tuple of the RRM gadget and disregards any repetition of noisy intermediate values. This scenario is equivalent to an attack against a non-recycling scheme that disregards intra-multiplication repetitions.
C2 Chosen-tuple attack. First, the adversary observes the noisy leakage of share repetitions (noted also by Battistello et al. [7]) and the noisy leakage of random element repetitions (noted in this work as "randomness recycling") in the gadget. Second, he averages the observed noisy leakages in order to denoise the side-channel emission. Finally, he exploits a chosen leakage (d+1)-tuple of the RRM gadget that takes advantage of the denoising.
C3 Full-state attack. First, the adversary observes the noisy leakage of share repetitions and random element repetitions in the gadget. Second, he averages the observed leakages in order to denoise the side-channel emission. Finally, he exploits the full state, i.e. all leaky intermediate values of the RRM gadget.
For our information-theoretic analysis (cases C1-C3), we introduce the following notation to describe the leaky intermediate values and the noisy leakage of RRM gadgets. In a given (d+1)-tuple of intermediate values, let random variable S be the sensitive (key-dependent) intermediate value under attack and let random variables M_0, ..., M_{d-1} be the masks used to protect the sensitive value. The leakage of a (d+1)-tuple is described using the random vector L = L_id(S ⊕ M_0 ⊕ ... ⊕ M_{d-1}, M_0, ..., M_{d-1}) + N, where L_id is applied component-wise and N is a (d+1)-dimensional random vector representing Gaussian noise. We assume independent and equal noise σ^2 in every sample, i.e. a diagonal noise covariance matrix Σ = σ^2 · I_{d+1}.

In the naive-tuple case C1, the adversary disregards the multiple accesses to the family shares (due to the structure of the masking scheme) and also disregards the random element repetition (due to recycling), thus he cannot observe any repeated leakages. In other words, the noise amplification stage in the C1 case is equivalent to that of standard Boolean masking. This naive case is only applicable if the evaluator cannot identify and locate the sample positions of the repeated leakages.
Contrary to C1, the chosen-tuple case C2 assumes that the adversary can locate the leakage sample positions of repeated shares and recycled random elements, yet he is still limited to exploiting a single (d+1)-tuple of leaky intermediate values for his attack. The number of repetitions of a specific random element or share v in the RRM gadget is equal to its recycle factor f_rm(v). Averaging all available samples that leak value v results in substantial noise reduction, i.e. L̄_v ∼ N(µ_v, σ^2/f_rm(v)), which the adversary can use in order to diminish the noise amplification effect of masking. Specifically, he can target a carefully chosen (d+1)-tuple of leaky intermediate values, whose leakages have been noise-reduced beforehand. For instance, going back to the example of Section 3.1, Figure 2, a sufficient (yet not efficient) attack tuple for RRM(2, 2, {{w_0, w_1, w_2}, {w_0, w_1, t_2}}) is (x_0y_0, x_1y_1, x_2y_2). Since all the intermediate values of the tuple appear only once, it holds that L_{x_iy_i} ∼ N(µ_{x_iy_i}, σ^2) for 0 ≤ i ≤ 2 and the noise amplification is the same as for standard masking. A more efficient choice is the tuple (z_2, w_1, w_2), where L_{z_2} ∼ N(µ_{z_2}, σ^2), yet L̄_{w_1} ∼ N(µ_{w_1}, σ^2/4) and L̄_{w_2} ∼ N(µ_{w_2}, σ^2/2), because f_rm(w_1) = 4 and f_rm(w_2) = 2.
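The σ^2/f_rm noise reduction from averaging repeated leakages is easy to reproduce numerically (a simulation sketch with illustrative parameters of our choosing):

```python
import random
import statistics

def averaged_leakage_variance(mu, sigma2, f_rm, n=20000, seed=1):
    """Empirical variance of the mean of f_rm repeated noisy leakages of the
    same value; it approaches sigma^2 / f_rm as n grows."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    means = [sum(rng.gauss(mu, sigma) for _ in range(f_rm)) / f_rm
             for _ in range(n)]
    return statistics.variance(means)
```

With f_rm = 4 the empirical variance settles near σ^2/4, which is precisely the denoising that the chosen-tuple adversary applies to w_1 above.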
To highlight the effects of recycling on the noisy leakage model, we performed an MI-based evaluation for the 1st and 2nd-order secure ISW RRM gadgets that are proposed in Section 3.2. We make various choices w.r.t. the recycle factor (f_rm ranges from 1 to 10) and the strength of horizontal exploitation (we consider both naive-tuple C1 and chosen-tuple C2 adversaries). The experiments are described in Table 3. Naturally, the evaluation depends on the aforementioned parameters, yet we stress that it is adaptable to all RRM choices made by the countermeasure designer. Concretely, computing the MI metric for a (d+1)-tuple requires summing over the randomness vector M = (M_0, ..., M_{d-1}) and computing (d+1)-dimensional integrations [25]. The resulting MI vs. noise variance plot is visible in Figure 4 (left). In addition to the MI metric, we use the conjecture of Duc et al. [19] in order to approximate the number of traces required to perform a key recovery in the high-noise regime. Analytically, we use the bound #traces ≥ c / MI(S; L), for a constant c that depends on the target success rate. The evaluation results of 1st and 2nd-order secure RRM (cases C1 and C2) are visible in Figure 4 (left), from which we derive three core observations. First, we note that the intermediate values used by the attacker directly affect the RRM evaluation, i.e. the attacker can reduce the security level only by including the average leakage of the repeated random elements or shares in his attack. If the noise-reduced leakages are disregarded (Figure 4, solid red and solid blue lines), then the noise amplification remains intact and equivalent to standard masking of the same order. Second, assuming the right tuple is chosen, we observe that increasing the total recycle factor shifts the MI-curve to the right, i.e.
the amount of recycling (modest or excessive) indeed damages the noise amplification stage of the scheme. This shift is visible between the dashed blue line (modest recycling) and the dotted blue line (excessive recycling). Note also that excessive recycling may increase the MI of a 2nd-order secure gadget above the MI of a 1st-order secure scheme. Third, we conclude that the RRM technique is in fact a tradeoff between the mutual information level achieved and the randomness cost required. This fact solidifies it as a lightweight alternative to standard masking that can be used by countermeasure designers when the randomness cost becomes prohibitive in a certain application context. Naturally, the designer always needs to be aware of the device's noise level in order to adapt the RRM order and recycle set accordingly.

Figure 4: MI evaluation and no. of traces bound for 1st and 2nd-order secure RRM schemes with 2 multiplications, assuming naive-tuple (C1, equivalent to standard masking) and chosen-tuple (C2) adversaries. The evaluation considers gadgets with modest and excessive recycling. Blue lines denote a 2nd-order attack (vs. 1st-order RRM) and red lines denote a 3rd-order attack (vs. 2nd-order RRM).
Table 3: t−NI RRM gadgets analyzed in the noisy leakage model assuming naive-tuple (C1) and chosen-tuple (C2) adversaries. The attacks exploit a large amount of the available recycling (case C2, excessive recycling), a small amount of recycling (case C2, modest recycling), or they disregard recycling (case C1).
To incorporate the noise reduction in our evaluation, we consider the worst-case scenario where the adversary is able to reduce the noise of all intermediate values by a recycle factor f_rm^max. The factor f_rm^max is the maximum recycle factor observed in any random number or share, e.g. in the gadget RRM(2, 2, {{r_1, r_2, r_3}, {r_1, r_2, r_4}}), the maximum recycling is observed on random numbers r_1 and r_2, thus f_rm^max = 4. Note that f_rm^max may stem from either repetitions of random numbers or repetitions of shares. The bound constructed is conservative, since we assume an adversary that can average every noisy intermediate value of the encoding by the maximum recycle factor, i.e. L̄_{X_i} ∼ N(µ_{X_i}, σ^2/f_rm^max), 0 ≤ i ≤ d. Still, it provides an efficient alternative to the direct computation of the MI formula and demonstrates the evaluation trend for RRM schemes in the high-noise regime. It remains open whether tighter bounds can be derived for such scenarios. In Figure 5 (left) we demonstrate the MI evaluation of 2nd and 3rd-order secure RRM schemes, with known recycle factor f_rm^max shown in Table 2, using the conservative bound which raises MI(X_i; L_{X_i}) to the security order. In Figure 5 (right) we demonstrate the no. of traces bound.
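For reference, the MI metric for a 1st-order tuple with identity leakage can be estimated by Monte-Carlo integration over the mask (a sketch under the independent-Gaussian-noise assumption above; substituting σ^2/f_rm for a repeated element reproduces the rightward curve shift discussed in this section):

```python
import math
import random

def mi_first_order(sigma2: float, n: int = 20000, seed: int = 0) -> float:
    """Monte-Carlo estimate of MI(S; L) for 1st-order Boolean masking in
    GF(2) with identity leakage L = (S xor M + N0, M + N1), N ~ N(0, sigma2)."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)

    def pdf(x, mu):                       # unnormalized Gaussian density
        return math.exp(-(x - mu) ** 2 / (2 * sigma2))

    acc = 0.0
    for _ in range(n):
        s, m = rng.randrange(2), rng.randrange(2)
        l0 = (s ^ m) + rng.gauss(0, sigma)
        l1 = m + rng.gauss(0, sigma)
        # likelihood of the leakage pair per hypothesis, summed over the mask
        lik = [sum(pdf(l0, sh ^ mm) * pdf(l1, mm) for mm in (0, 1)) / 2
               for sh in (0, 1)]
        acc += math.log2(lik[s] / ((lik[0] + lik[1]) / 2))
    return acc / n
```

At low noise the estimate approaches the full 1 bit of S, while increasing σ^2 drives it towards zero with the slope that the MI curves of Figures 4 and 5 exhibit.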
Figure 5: MI evaluation for 2nd and 3rd-order secure RRM schemes comparing naive-tuple (C1) with full-state attack (C3). Blue lines denote a 3rd-order attack (vs. 2nd-order RRM) and red lines denote a 4th-order attack (vs. 3rd-order RRM).

On practical attacks and realistic leakage models. The non-trivial data complexity of the optimal attacks for large order d and a large amount of integration dimensions has led to the development of heuristic attacks that combine the horizontal information of several leaking instructions in a sub-optimal, yet efficient manner. To demonstrate this, we provide two types of heuristic horizontal attacks on simulated leakages of a 1st-order secure gadget with 16 multiplications, namely RRM(1, 16, {{r_1}, ..., {r_1}}). First, we employ the chosen-tuple attack (C2), where the attacker chooses a (d+1)-tuple of leakages whose values have been sufficiently denoised by averaging the respective repetitions. The horizontal exploitation of this heuristic attack implies a small overhead for the adversary, namely he needs to perform an averaging pre-processing step. Subsequently, the adversary employs the noise-reduced tuple in order to attack using Correlation Power Analysis (CPA) [12]. The second heuristic attack that we use in order to horizontally exploit the simulated traceset is a Soft Analytical Side-Channel Attack (SASCA) [37], applied in the context of masking [25]. The SASCA performs the same averaging as the pre-processing step of the first heuristic attack. Subsequently, it exploits the full state of a multiplication by constructing a factor graph and running a belief propagation algorithm. The horizontal exploitation of SASCA implies an overhead depending on the factor graph. The results of the two heuristic attacks and the results of the naive-tuple CPA attack without noise averaging (C1) are visible in Figure 6a. As expected, the additional effort w.r.t. horizontal exploitation of the SASCA attack improves the success rate compared to C1 and
C2. Moreover, throughout this work we assumed an idealized noisy leakage model, namely independent shares that leak according to the identity function. In practice, several devices showcase order reduction due to various device effects such as glitches, distance-based leakages and coupling [31]. We demonstrate this effect in Figure 6b, using a 3rd-order secure RRM scheme and the order-reduction theorem of Balasch et al. [4], which states that distance-based leakages can reduce the security order from d to ⌊(d−1)/2⌋. The red and blue lines of Figure 6b give the lower and upper security bounds caused by a large class of real-world leakage flaws.
(a) Success rate of naive, chosen-tuple and SASCA attacks on simulated traces of 1st-order secure RRM with f_rm = 16.
(b) Security of 3rd-order secure RRM scheme under ideal (blue line) and order-reduced leakage (red line). The order-reduced line is equivalent to 1st-order secure RRM.

Figure 6: Practical attacks and realistic leakage models
On the necessity of a noise-based analysis. We conclude this section by showcasing the importance of a noise-oriented analysis of RRM using the following custom scenario. Assume an RRM gadget that recycles a single random number between n 1st-order secure ISW multiplications with mutually independent inputs, where n is large. Trivially, an ISW-based proof shows such a case to be secure, because the adversary can probe only a single intermediate value and thus cannot view multiple recyclings. Thus, we can recycle a random number indefinitely while the scheme remains probing-secure. In practice, however, the noise on the leaking random number can be eliminated by averaging the recyclings. A noise-based analysis such as the MI metric or the SASCA can exploit recycling horizontally and is essential to quantify the security damage. If instead the attack remains naive, it may lure the evaluator into a false sense of security.

Reduced Randomness Shuffling -RRS
Motivated by the recycling ideas of Section 3, we apply a similar approach to the popular shuffling countermeasure against side-channel analysis. Analytically, we put forward the Reduced Randomness Shuffling (RRS) countermeasure, which consists of three shuffling variants that can alleviate the randomness cost involved. In Section 4.1 we analyze how RRS reduces the randomness cost compared to standard shuffling. Section 4.2 analyzes the susceptibility of RRS to horizontal/multivariate attacks in the noisy leakage model.

Reducing Randomness in Shuffling
To achieve the goal of RNG reduction, we explore the following three variants: partitioned, merged and recycled shuffling. We demonstrate these three variants using a generic structure of layers and independent operations. In particular, we assume that the cipher we want to shuffle can be described by the layer set L = {L_1, L_2, ..., L_n} that consists of sets L_i, 1 ≤ i ≤ n. Every set L_i describes s independent operations that constitute this layer, e.g. L_i = {o_{i,1}, o_{i,2}, ..., o_{i,s}}. The partitioning of a cipher into layers and of layers into independent operations rests upon the countermeasure designer and is closely related to the cipher implementation. For instance, the independent operations may range from whole cipher parts (e.g. shuffling Sboxes) to individual assembly operations (e.g. shuffling key-dependent instructions). We will refer to a Reduced Randomness Shuffling scheme that shuffles independent operations according to the layer set L as RRS(L). In addition, we specify the randomness cost of the RRS countermeasure as the total RNG overhead required to shuffle the cipher according to layer set L. Figures 7a-7d illustrate the application of RRS on a layered structure.
The example of Figure 7a commences with an RRS scheme that shuffles a cipher structure with n = 2 layers and s = 4 independent operations per layer. To scale down the randomness cost, partitioned shuffling splits a set of independent operations L_i vertically into two or more smaller subsets that are cheaper to shuffle. For instance, in Figure 7b, instead of shuffling a single set of 4 independent operations L_1 = {o_{1,1}, o_{1,2}, o_{1,3}, o_{1,4}}, we opt to partition L_1 into two subsets of 2 independent operations each, namely L_1' = {o_{1,1}, o_{1,2}} and L_1'' = {o_{1,3}, o_{1,4}}. An analogous partitioning is done in L_2, resulting in L_2' = {o_{2,1}, o_{2,2}} and L_2'' = {o_{2,3}, o_{2,4}}. We define the granularity of this vertical partitioning as the partition factor f_p, where f_p = 1 implies no partitioning. Performing partitioned shuffling with factor f_p on |L_i| independent operations reduces the randomness cost of layer i to |L_i| * log2(|L_i|/f_p) bits. In the example of Figure 7b, we use f_p = 2 on both cipher layers and replace P^{L_1}_4 and P^{L_2}_4 with permutations on 2 elements, reducing the cost of a single execution from 16 to 8 bits. To similar ends, the merged shuffling variant combines several cipher layers horizontally in order to permute them together. The example of Figure 7c views L_1 and L_2 as a single layer and shuffles them using the same permutation, i.e. it merges P^{L_1}_4 and P^{L_2}_4 into a single permutation P^L_4. Last, recycled shuffling opts for the "external" recycling of the generated permutation, i.e. we reuse a permutation between different executions or rounds of the cipher structure. In Figure 7d, the layer L_1 of cipher execution no. 1 and the layer L_1 of cipher execution no. 2 are independent, yet they are shuffled with the same permutation P^{L_1}_4. We define the recycle factor of shuffling f_rs as the number of repetitions of a permutation in different cipher iterations, i.e.
f_rs = 1 implies no recycling. Recycled shuffling can reduce the randomness cost of a certain layer i from (#executions) * |L_i| * log2(|L_i|) to (#executions / f_rs) * |L_i| * log2(|L_i|) bits. We note that only recycled shuffling implies an overhead due to storage units and store/fetch instructions, while partitioned and merged shuffling simply use less randomness. The overhead relates to the recycle factor of shuffling, i.e. reusing the same permutation results in f_rs extra store/fetch instructions and a memory unit to store the random number.

On the application of RRS to the AES cipher. Assume that the countermeasure designer focuses on the first two layers of the AES cipher, namely KeyAddition (ka) and Sbox (sb). The standard way to shuffle them would require two permutations on 16 independent operations, i.e. P^{KA}_16 and P^{S}_16, costing 128 random bits per round and thus 1280 bits for 10 rounds of AES. Alternatively, the designer can opt to partition both layers with partition factor f_p = 4, i.e. split {ka_1, ..., ka_16} and {sb_1, ..., sb_16} into {ka_1, ..., ka_4}, {ka_5, ..., ka_8}, {ka_9, ..., ka_12}, {ka_13, ..., ka_16} and {sb_1, ..., sb_4}, {sb_5, ..., sb_8}, {sb_9, ..., sb_12}, {sb_13, ..., sb_16} respectively. Thus, the cost is reduced to 640 bits (permutations P^{KA}_4 and P^{S}_4, a 50% RNG reduction). In a similar fashion, the designer can merge the KeyAddition and Sbox layers into a single layer L = {kasb_1, ..., kasb_16}, again reducing the cost to 640 bits (permutation P^{KA,S}_16, a 50% RNG reduction). Finally, any generated permutations on KeyAddition and Sbox can be recycled in subsequent AES executions, reducing the RNG cost even further at the penalty of extra storage.
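The AES cost figures above can be reproduced with a small calculator that implements the per-round cost expression (num_layers / f_m) * ops_per_layer * log2(ops_per_layer / f_p); this is an illustrative sketch, with the parameter names being ours.

```python
import math

def rrs_cost_per_round(num_layers, ops_per_layer, f_p=1, f_m=1):
    """Bits per round to shuffle `num_layers` layers of `ops_per_layer`
    independent operations, with partition factor f_p and merge factor f_m:
    (num_layers / f_m) * ops_per_layer * log2(ops_per_layer / f_p)."""
    return int((num_layers // f_m) * ops_per_layer
               * math.log2(ops_per_layer // f_p))

rounds = 10
standard    = rounds * rrs_cost_per_round(2, 16)          # P^KA_16 and P^S_16
partitioned = rounds * rrs_cost_per_round(2, 16, f_p=4)   # four size-4 subsets
merged      = rounds * rrs_cost_per_round(2, 16, f_m=2)   # single P^{KA,S}_16

assert standard == 1280
assert partitioned == 640 and merged == 640   # 50% RNG reduction in both cases
```

Partitioning and merging arrive at the same 640-bit total through different routes: the former shrinks the permutations, the latter shares one permutation across layers.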

RRS Noise Amplification
As expected, reducing the randomness cost of shuffling has a direct impact on the noise amplification effect of the countermeasure, offering an interesting randomness-security tradeoff for the designer. Similarly to Section 3.3, we evaluate the variants of RRS via the mutual information framework and consider an adversary that can horizontally exploit more than a single cipher layer. We perform our evaluation on the layered cipher structure used previously, where the adversary attempts to recover any part of the key k = (k_0, k_1, k_2, k_3) that is related to the 4 independent operations of L_1 and L_2. To that end, he may exploit the leakage of both layers as well as direct leakage from the permutations used to shuffle them. Below, we introduce the random variable notation that describes shuffling in the noisy leakage model.
• The adversary can observe the leakage vector after every cipher layer, namely L^{L_1} and L^{L_2}. The leakage variables depend on the layer permutations P^{L_1}_n and P^{L_2}_n, i.e. it holds that L^{L_1}_i = L_id(o_{1,P^{L_1}(i)}) + noise and L^{L_2}_i = L_id(o_{2,P^{L_2}(i)}) + noise, where noise represents additive Gaussian noise N(0, σ²).
• The adversary can observe the direct permutation leakage of every shuffled layer. For layer permutations P^{L_1}_n and P^{L_2}_n, it holds that L^{L_1}_i = L_id(P^{L_1}_i) + noise and L^{L_2}_i = L_id(P^{L_2}_i) + noise, where noise represents additive Gaussian noise N(0, σ²).
To analyze the tradeoff between the MI level and the randomness cost, we perform the MI-based evaluation for several versions of RRS and several attack options; the cases are summarized in Table 5. The evaluation uses the formula of Veyrat-Charvillon et al. [38], which we update in order to account for partitioned, merged and recycled shuffling with factors f_p, f_m and f_rs respectively. In the formula, we assume that the adversary attacks a certain key part K_t, where t ∈ {0, 1, 2, 3}. We also note that the adversary in general exploits η-dimensional leakage vectors and performs summations over the set of θ-dimensional permutations; in the following analysis we show how the parameters η and θ relate to the particular RRS variant. The corresponding curves lie to the right of curve D2, showing that merged shuffling can improve the effectiveness of multi-layer horizontal attacks and is detrimental to the MI level. Last, we compare partitioned, merged and recycled shuffling (case D4) with an equivalent scheme in which the adversary can also observe the repeated direct permutation leakage (case D5). Specifically, in both cases the adversary exploits horizontally two partitioned layers that use the same permutation, i.e. η = 4 and θ = 2. Note however that in case D5 the adversary can additionally observe the repeated direct permutation leakage, i.e. he has access to L^{L_1}_j for all executions j = 1, ..., f_rs, while D4 assumed equiprobable permutations. As a result, in case D5 the adversary can reduce the noise level of the direct permutation leakage by averaging, computing L̄^{L_1} ∼ N(µ_{L_1}, (1/f_rs) * Σ), where Σ is a diagonal covariance matrix. Figure 8b shows how exploiting the direct permutation leakage enhances the attack.
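The variance reduction claimed for case D5 is a standard property of averaged Gaussian observations and can be checked empirically; the sketch below (our illustration, with arbitrary parameter values) estimates the variance of the averaged permutation leakage over many experiments.

```python
import random
import statistics

rng = random.Random(1)
sigma, f_rs = 1.0, 16   # noise std deviation and shuffling recycle factor
mu = 3.0                # mean leakage of the fixed, recycled permutation word

# Average the f_rs repeated leakages of the same recycled permutation;
# repeat the experiment many times to estimate the variance of the average.
means = [statistics.fmean(mu + rng.gauss(0.0, sigma) for _ in range(f_rs))
         for _ in range(50_000)]

var_of_mean = statistics.pvariance(means)
# The averaged leakage behaves like N(mu, sigma^2 / f_rs).
assert abs(var_of_mean - sigma**2 / f_rs) < 0.005
```

With f_rs = 16 the effective noise variance drops by a factor of 16, which is exactly why recycling a permutation across executions helps the D5 adversary.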

Conclusions & Future Directions
In this work, we have performed an in-depth investigation of low-randomness alternatives to standard masking and shuffling, namely RRM and RRS. The first core outcome is that RRM and RRS can offer effective tradeoffs between randomness cost and security. A designer of side-channel countermeasures can now rely on the MI-based evaluation and provide optimized and flexible protection that reduces the randomness cost.
The second core outcome of this work is demonstrating the importance of horizontal exploitation in masking and shuffling. We have shown that univariate (or partially horizontal) evaluations provide only a part of the whole picture and may lure the evaluator into a false sense of security. By examining the multivariate adversarial model, we exploit a larger quantity of the available leakage and provide a more complete security evaluation.
Last, this work has demonstrated the necessity of noise-based analysis as a complement to formal methods.We maintain that a sound evaluation approach is to start from a provably secure scheme and enhance it with a noise-based analysis in order to provide a more holistic view.
With regards to future work, we note that multivariate evaluation techniques are still at a nascent stage when it comes to real-world devices. In fact, research efforts concentrate on a fairly high abstraction layer, i.e. they only consider leaky cipher operations, disregarding many peculiarities of the hardware and physical layers. Future research needs to strive towards closing the gap between theoretical and applied evaluations and towards improving the attacks that are able to horizontally exploit RRM and RRS.
Moreover, a long-term vision is research towards unifying several side-channel and fault injection countermeasures under the MI framework. Based on this unification, the countermeasure designer will possess a plethora of countermeasure options, which he can combine and tweak in order to maximize side-channel and fault injection security w.r.t. a given budget in clock cycles, silicon area or power/energy consumption.

Merged shuffling combines P^{L_1}_4 and P^{L_2}_4 into a single permutation P^L_4, s.t. L = {{o_{1,1}, o_{2,1}}, {o_{1,2}, o_{2,2}}, {o_{1,3}, o_{2,3}}, {o_{1,4}, o_{2,4}}}. We define the granularity of this horizontal combination as the merge factor f_m, where f_m = 1 implies no merging, and observe that merged shuffling can reduce the randomness cost of a single iteration to (|L|/f_m) * |L_i| * log2(|L_i|) bits. Naturally, merging and partitioning can be combined, resulting in a randomness cost of (|L|/f_m) * |L_i| * log2(|L_i|/f_p) bits per iteration. Still, different cipher layers can present a different number of independent operations for partitioned/merged shuffling and may thus need to be homogenized by shuffling additional dummy operations.

Figure 7: Initial, partitioned, merged and recycled shuffling applied to the layered cipher structure in Figures (a)-(d). Dashed-line boxes indicate the operations and layers that are shuffled with the same permutation. The arrows indicate the information flow between layers.

Table 1: Randomness cost of optimized RRM schemes for n = 2 multiplications.

Table 2: Storage cost of optimized RRM schemes for n = 2 multiplications, assuming no storage is needed when recycling within a single multiplication (large register file).

On the application of RRM gadgets to the AES Sbox. Applying the novel randomness-recycling gadgets in the AES cipher is extremely straightforward. Assume a 1st-order secure masked AES cipher that uses the Boyar-Peralta decomposition [11] in the Sbox implementation, i.e. the Sbox requires 32 multiplications in GF(2). During the first execution of the AES cipher, we generate all the necessary random elements without any recycling, i.e. for the first full Sbox layer execution we need 16 * 32 * 1 = 512 random elements, resulting in 10 * 512 = 5120 random elements for 10 rounds of Sbox executions. During the second independent execution of the AES cipher, every Sbox multiplication can recycle the randomness generated in the respective multiplication of the first execution, since the gadget RRM(1, 2, {{r_1}, {r_1}}) is t-NI (Figure 11).
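The randomness bookkeeping of this example can be written out explicitly; the sketch below simply tabulates the counts stated above (constant names are ours).

```python
# Randomness bookkeeping for the masked AES Sbox example:
# Boyar-Peralta decomposition -> 32 GF(2) multiplications per Sbox,
# one fresh random element per 1st-order ISW multiplication.
SBOXES_PER_ROUND = 16
MULTS_PER_SBOX = 32
RANDOMS_PER_MULT = 1   # 1st-order ISW
ROUNDS = 10

per_round = SBOXES_PER_ROUND * MULTS_PER_SBOX * RANDOMS_PER_MULT
first_exec = ROUNDS * per_round
assert per_round == 512 and first_exec == 5120   # fresh randomness, exec no. 1

# A second execution that recycles every multiplication's randomness via
# RRM(1, 2, {{r1}, {r1}}) requires no fresh random elements at all.
second_exec = 0
assert first_exec + second_exec == 5120
```

Thus, the second execution halves the amortized RNG cost at the price of storing the 5120 recycled elements.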
Attack description RRM(d, n, R)