1 Introduction

Lightweight cryptography studies the deployment of cryptographic primitives in resource-constrained environments. This research direction is driven by a demand for cost-effective, small-scale communicating devices such as RFID tags that are a cornerstone in the Internet of Things. Most often the constrained resource is taken to be the chip-area but other performance metrics such as latency [7], code-size [2] and ease of side-channel protection [12] have been considered as well. Some of these criteria were already treated in Noekeon [9].

The increased importance of lightweight cryptography and its applications has lately been reflected in the NSA publishing two dedicated lightweight cipher families: Simon and Speck [5]. Considering that this is only the third time in four decades that the NSA has published a block cipher, this is quite remarkable. Especially as NIST began shortly after this publication to investigate the possibility of standardising lightweight primitives, Simon and Speck certainly deserve a thorough investigation. This is emphasised by the fact that, in contrast to common practice, neither a security analysis nor a justification of the design choices were published by the NSA. This lack of openness necessarily gives rise to curiosity and caution.

In this paper we focus on the Simon family of block ciphers; an elegant, innovative and very efficient set of block ciphers. There already exists a large variety of papers, mainly focussed on evaluating Simon's security with regard to linear and differential cryptanalysis. Most of the methods used therein are rather ad hoc, often using only approximate values for the differential round probability and in particular for the linear square correlation of one round.

Our Contribution. With this study, we complement the existing work threefold. Firstly, we develop an exact closed-form expression for the differential probability and a \(\log (n)\)-time algorithm for determining the square correlation over one round. Their accuracy is proven rigorously. Secondly, we use these expressions to implement a model of differential and linear characteristics for SAT/SMT solvers which allows us to find the provably best characteristics for different instantiations of Simon. Furthermore we are able to shed light on how differentials in Simon profit from the collapse of many differential characteristics. Thirdly, by generalising the probability expressions and the SAT/SMT model, we are able to compare the quality of different parameter sets with respect to differential and linear cryptanalysis.

As a basis for our goal to understand both the security of Simon as well as the choice of its parameter set, we rigorously derive formulas for the differential probabilities and the linear square correlations of the Simon-like round function that can be evaluated in constant time and time linear in the word size respectively. More precisely, we study differential probabilities and linear correlations of functions of the form

$$\begin{aligned} S^a(x) \odot S^b(x) + S^c(x)\end{aligned}$$

where \(S^i(x)\) corresponds to a cyclic left shift of x and \(\odot \) denotes the bitwise AND operation.

We achieve this goal by first simplifying this question by considering equivalent descriptions both of the round function as well as the whole cipher (cf. Sect. 2.4). These simplifications, together with the theory of quadratic boolean functions, result in a clearer analysis of linear and differential properties (cf. Sects. 3 and 4). Importantly, the derived simple equations for computing the probabilities of the Simon round function can be evaluated efficiently and, more importantly maybe, are conceptually very easy. This allows them to be easily used in computer-aided investigations of differential and linear properties over more rounds. It should be noted here that the expression for linear approximations is more complex than the expression for the differential case. However, with respect to the running time of the computer-aided investigations this difference is negligible.

We used this to implement a framework based on SAT/SMT solvers to find the provably best differential and linear characteristics for various instantiations of Simon (cf. Sect. 5, in particular Table 1). Furthermore we are able to shed light on how differentials in Simon profit from the collapse of many differential characteristics by giving exact distributions of the probabilities of these characteristics for chosen differentials. The framework is open source and publicly available to encourage further research [13].

In Sect. 6 we apply the developed theory and tools to investigate the design space of Simon-like functions. In particular, using the computer-aided approach, we find that the standard Simon parameters are not optimal with regard to the best differential and linear characteristics.

As a side result, we improve the probabilities for the best known differentials for several variants and rounds of Simon. While this might well lead to (slightly) improved attacks, those improved attacks are out of the scope of our work.

Interestingly, at least for Simon32 our findings indicate that the choices made by the NSA are good but not optimal under our metrics, leaving room for further investigations and questions. To encourage further research, we propose several alternative parameter choices for Simon32. Here, we are using the parameters that are optimal when restricting the criteria to linear, differential and dependency properties. We encourage further research on those alternative choices to shed more light on the undisclosed design criteria.

We also like to point out that the Simon key-scheduling was not part of our investigations. Its influence on the security of Simon is left as an important open question for further investigations. In line with this, whenever we investigate multi-round properties of Simon in our work, we implicitly assume independent round keys in the computation of probabilities.

Finally, we note that most of our results can be applied to more general constructions, where the involved operations are restricted to AND, XOR, and rotations.

Related Work. There are various papers published on the cryptanalysis of Simon [1, 3, 6, 17–19]. The most promising attacks so far are based on differential and linear cryptanalysis; however, a clear methodology for deriving the differential probabilities and square correlations seems to be missing in most cases. Biryukov, Roy and Velichkov [6] derive a correct, but rather involved method to find the differential probabilities. Abed, List, Lucks and Wenzel [1] state an algorithm for the calculation of the differential probabilities, but without further explanation. For the calculation of the square correlations, an algorithm seems to be missing altogether.

Previous work also identifies various properties, such as the strong differential effect, and gives estimates of the probability of differentials.

The concept behind our framework was previously also applied to the ARX cipher Salsa20 [14] and the CAESAR candidate NORX [4]. In addition to the applications proposed in previous work, we extend it to linear cryptanalysis, examine the influence of the rotation constants and use it to compute the distribution of characteristics corresponding to a differential.

2 Preliminaries

In this section, we start by defining our notation and giving a short description of the round function. We recall suitable notions of equivalence of Boolean functions that allow us to simplify our investigations of Simon-like round functions. Most of this section is generally applicable to AND-RX constructions, i.e. constructions that only make use of the bitwise operations AND, XOR, and rotations.

2.1 Notation

We denote by \({{\mathrm{\mathbb {F}}}}_2\) the field with two elements and by \({{\mathrm{\mathbb {F}}}}_2^n\) the n-dimensional vector space over \({{\mathrm{\mathbb {F}}}}_2\). By \({{\mathrm{\mathbf {0}}}}\) and \({{\mathrm{\mathbf {1}}}}\) we denote the vectors of \({{\mathrm{\mathbb {F}}}}_2^n\) with all 0s and all 1s respectively. The Hamming weight of a vector \(a\in {{\mathrm{\mathbb {F}}}}_2^n\) is denoted as \({{\mathrm{wt}}}(a)\). By \({{\mathrm{\mathbb {Z}}}}_n\) we denote the integers modulo n.

The addition in \({{\mathrm{\mathbb {F}}}}_2^n\), i.e. bit-wise XOR, is denoted by \(+\). By \(\odot \) we denote the AND operation in \({{\mathrm{\mathbb {F}}}}_2^n\), i.e. multiplication over \({{\mathrm{\mathbb {F}}}}_2\) in each coordinate:

$$\begin{aligned} x \odot y =(x_iy_i)_i.\end{aligned}$$

By \(\vee \) we denote the bitwise OR operation. By \(\overline{x}\) we denote the bitwise negation of x, i.e. \(\overline{x} := (x + {{\mathrm{\mathbf {1}}}})\). We denote by \(S^i:{{\mathrm{\mathbb {F}}}}_2^n \rightarrow {{\mathrm{\mathbb {F}}}}_2^n\) the left circular shift by i positions. We also note that any arithmetic of bit indices is always done modulo the word size n.
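These operations are cheap to experiment with in software. The following minimal Python sketch mirrors the notation above; the names `rotl` and `wt` are ours, not the paper's:

```python
def rotl(x, i, n):
    """S^i: left circular shift of the n-bit word x by i positions (mod n)."""
    i %= n
    return ((x << i) | (x >> (n - i))) & ((1 << n) - 1)

def wt(x):
    """Hamming weight wt(x) of a word."""
    return bin(x).count("1")
```

Note that bit-index arithmetic is done modulo n, so negative shift amounts (i.e. right rotations \(S^{-i}\)) are handled by the initial reduction.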

In this paper we are mainly concerned with functions of the form

$$\begin{aligned} f_{a,b,c}(x)= & {} S^a(x) \odot S^b(x) + S^c(x) \end{aligned}$$
(1)

and we identify such a function with its triple \((a,b,c)\) of parameters.

For a vectorial Boolean function on n bits, \(f:{{\mathrm{\mathbb {F}}}}_2^n \rightarrow {{\mathrm{\mathbb {F}}}}_2^n\), we denote by

$$\begin{aligned} {\widehat{f}}(\alpha ,\beta ) = \sum _x \mu \left( \langle \beta , f\rangle +\langle \alpha , x \rangle \right) \end{aligned}$$

the Walsh (or Fourier) coefficient with input mask \(\alpha \) and output mask \(\beta \). Here we use \(\mu (x)=(-1)^x\) to simplify the notation.

The corresponding squared correlation of f is given by

$$\begin{aligned} C^2(\alpha \rightarrow \beta ) = \left( \frac{ {\widehat{f}}(\alpha ,\beta )}{2^n}\right) ^2.\end{aligned}$$
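For word sizes small enough to enumerate, both quantities can be computed directly from these definitions. A brute-force sketch in Python (our code, exponential in n and only meant for experiments):

```python
def walsh(f, alpha, beta, n):
    """Walsh coefficient of f: F_2^n -> F_2^n by direct summation."""
    par = lambda v: bin(v).count("1") & 1  # parity <u, v> via popcount
    return sum((-1) ** (par(beta & f(x)) ^ par(alpha & x)) for x in range(1 << n))

def sq_corr(f, alpha, beta, n):
    """Squared correlation C^2(alpha -> beta) = (walsh / 2^n)^2."""
    return (walsh(f, alpha, beta, n) / 2 ** n) ** 2
```

For the identity map, for instance, the Walsh coefficient is \(2^n\) exactly when \(\alpha = \beta \) and 0 otherwise.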

For differentials we similarly denote by \(\Pr (\alpha \xrightarrow []{~~} \beta )\) the probability that a given input difference \(\alpha \) results in a given output difference \(\beta \), i.e.

$$\begin{aligned} \Pr (\alpha \xrightarrow []{~~} \beta ) = \frac{ | \{ x \ | \ f(x)+f(x+\alpha ) = \beta \} | }{2^n}. \end{aligned}$$
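Equivalently, for small n this probability can be obtained by exhaustive counting over all inputs; a didactic Python sketch (our code, not the paper's tool):

```python
from fractions import Fraction

def diff_prob(f, alpha, beta, n):
    """Pr(alpha -> beta) over f: F_2^n -> F_2^n by counting all 2^n inputs."""
    count = sum(1 for x in range(1 << n) if f(x) ^ f(x ^ alpha) == beta)
    return Fraction(count, 1 << n)
```

For a fixed input difference the probabilities over all output differences sum to 1, which is a convenient sanity check for such experiments.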

Furthermore, \(\mathsf {Dom}(f)\) denotes the domain of a function f and \(\mathsf {Img}(f)\) its image.

2.2 Description of SIMON

Simon is a family of lightweight block ciphers with block sizes 32, 48, 64, 96, and 128 bits. The constructions are Feistel ciphers using a word size n of 16, 24, 32, 48 or 64 bits, respectively. We denote the variants as Simon2n. The key consists of 2, 3, or 4 n-bit words, depending on the variant. The round function of Simon is composed of AND, rotation, and XOR operations on the complete word (see Fig. 1). More precisely, the round function in Simon corresponds to

$$\begin{aligned} S^8(x) \odot S^1(x) + S^2(x),\end{aligned}$$

that is to the parameters (8, 1, 2) for f as given in Eq. (1). As we are not only interested in the original Simon parameters, but in investigating the entire design space of Simon-like functions, we denote by

$$\begin{aligned} \textsc {Simon}[a, b, c] \end{aligned}$$

the variant of Simon where the original round function is replaced by \(f_{a,b,c}\) (cf. Eq. (1)).

As it is out of scope for our purpose, we refer to [5] for the description of the key-scheduling.
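To make the description concrete, one Feistel round of Simon[a, b, c] can be sketched in Python as follows. This is our illustration; the round key k is taken as an input, since the key-scheduling of [5] is out of scope here:

```python
def simon_round(x, y, k, n=16, a=8, b=1, c=2):
    """One Feistel round of Simon[a,b,c] on the state (x, y) with round key k.
    The defaults (n, a, b, c) = (16, 8, 1, 2) correspond to Simon32."""
    mask = (1 << n) - 1
    rotl = lambda v, i: ((v << (i % n)) | (v >> (n - i % n))) & mask
    f = (rotl(x, a) & rotl(x, b)) ^ rotl(x, c)  # f_{a,b,c} applied to the left word
    return (y ^ f ^ k) & mask, x
```

The Feistel structure makes the round invertible regardless of f: given the output (l, r), the input is recovered as (r, l ^ f(r) ^ k).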

Fig. 1. The round function of Simon

2.3 Affine Equivalence of Boolean Functions

Given two (vectorial) Boolean functions \(f_1\) and \(f_2\) on \({{\mathrm{\mathbb {F}}}}_2^n\) related by

$$\begin{aligned} f_1(x)=(A \circ f_2 \circ B)(x) + C(x)\end{aligned}$$

where A and B are affine permutations and C is an arbitrary affine mapping on \({{\mathrm{\mathbb {F}}}}_2^n\) we say that \(f_1\) and \(f_2\) are extended affine equivalent (cf. [8] for a comprehensive survey).

With respect to differential cryptanalysis, if \(f_1\) and \(f_2\) are extended affine equivalent then the differential \(\alpha \xrightarrow []{f_1} \beta \) over \(f_1\) has probability p if and only if the differential

$$\begin{aligned}B(\alpha ) \xrightarrow []{f_2} A^{-1}\left( \beta + C(\alpha )\right) \end{aligned}$$

over \(f_2\) has probability p as well.

For linear cryptanalysis, a similar relation holds for the linear correlation. If \(f_1\) and \(f_2\) are related as defined above, it holds that

$$\begin{aligned} {\widehat{f}}_1(\alpha ,\beta )={\widehat{f}}_2\left( \left( C\circ B^{-1}\right) ^{T} \beta + \left( B^{-1}\right) ^{T}\alpha ,A^{T} \beta \right) .\end{aligned}$$

Thus up to linear changes we can study \(f_2\) instead of \(f_1\) directly. Note that, for an actual attack, these changes are usually critical and can certainly not be ignored. However, tracing the changes is, again, simple linear algebra.

For differential and linear properties of Simon-like functions of the form

$$\begin{aligned}f_{a,b,c}(x) = S^a(x) \odot S^b(x) + S^c(x)\end{aligned}$$

this implies that it is sufficient to find the differential and linear properties of the simplified variant

$$\begin{aligned}f(x) = x \odot S^{d}(x)\end{aligned}$$

and then transfer the results back by simply using linear algebra.

2.4 Structural Equivalence Classes in AND-RX Constructions

AND-RX constructions, i.e. constructions that make only use of the operations AND (\(\odot \)), XOR (\(+\)), and rotations (\(S^r\)), exhibit a high degree of symmetry. Not only are they invariant under rotation of all input words, output words and constants, they are furthermore structurally invariant under any affine transformation of the bit-indices. As a consequence of this, several equivalent representations of the Simon variants exist.

Let T be a permutation of the bits of an n-bit word that corresponds to an affine transformation of the bit-indices. Thus there are \(s\in {{\mathrm{\mathbb {Z}}}}_n^*\) and \(t\in {{\mathrm{\mathbb {Z}}}}_n\) such that bit i is renamed to \(s\cdot i + t\). As the AND and XOR operations are bitwise, T clearly commutes with these:

$$\begin{aligned} Tv \odot Tw&= T(v\odot w)\\ Tv + Tw&= T(v+ w) \end{aligned}$$

where v and w are n-bit words. A rotation to the left by r can be written bitwise as \( S^r(v)_i = v_{i-r}\). For a rotation, we thus get the following bitwise relation after transformation with T

$$\begin{aligned}S^r(v)_{s\cdot i+t} = v_{s\cdot (i-r)+t} = v_{s\cdot i+t-s\cdot r}.\end{aligned}$$

Substituting \(s\cdot i + t\) with j this is the same as

$$\begin{aligned} S^r(v)_j = v_{j - s\cdot r}.\end{aligned}$$

Thus the rotation by r has been changed to a rotation by \(s\cdot r\). Thus we can write

$$\begin{aligned} TS^rv = S^{s\cdot r} T v.\end{aligned}$$

Commuting the linear transformation of the bit-indices with a rotation thus only changes the rotation constant by a factor. In the special case where all input words, output words and constants are rotated, which corresponds to the case \(s=1\), the rotation constants are left untouched.

To summarize the above, when applying such a transformation T to all input words, output words and constants in an AND-RX construction, the structure of the constructions remains untouched apart from a multiplication of the rotation constants by the factor s.

This means for example for Simon32 that changing the rotation constants from (8, 1, 2) to \((3\cdot 8, 3\cdot 1, 3\cdot 2) = (8,3,6)\) and adapting the key schedule accordingly gives us the same cipher apart from a bit permutation. As s has to be coprime to n, all s with \(\gcd (s,n)=1\) are allowed, giving \(\varphi (n)\) equivalent tuples of rotation constants in each equivalence class where \(\varphi \) is Euler’s phi function.
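These equivalence classes are easy to enumerate. The following small helper (our code, not part of the paper's tool) lists all rotation-constant tuples structurally equivalent to a given one:

```python
from math import gcd

def equivalent_constants(a, b, c, n):
    """All tuples (s*a, s*b, s*c) mod n with gcd(s, n) = 1, i.e. the rotation
    constants structurally equivalent to (a, b, c) under i -> s*i + t."""
    return sorted({(s * a % n, s * b % n, s * c % n)
                   for s in range(1, n) if gcd(s, n) == 1})
```

For Simon32 this confirms that (8, 1, 2) and (8, 3, 6) describe the same cipher up to a bit permutation, and that the class of (8, 1, 2) has \(\varphi (16) = 8\) members.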

Together with the result from Sect. 2.3, this implies the following lemma.

Lemma 1

Any function \(f_{a,b,c}\) as defined in Eq. (1) is extended affine equivalent to a function

$$\begin{aligned} f(x) = x \odot S^{d} (x) \end{aligned}$$

where d|n or \(d=0\).

When looking at differential and square correlations of Simon-like round functions this means that it is sufficient to investigate this restricted set of functions. The results for these functions can then simply be transferred to the general case.

3 Differential Probabilities of SIMON-like Round Functions

In this section, we derive a closed expression for the differential probability for all Simon-like round functions, i.e. all functions as described in Eq. (1). The main ingredients here are the derived equivalences and the observation that any such function is quadratic. Being quadratic immediately implies that its derivative is linear and thus the computation of differential probabilities basically boils down to linear algebra (cf. Theorem 1). However, to be able to efficiently study multiple-round properties and in particular differential characteristics, it is important to have a simple expression for the differential probabilities. Those expressions are given for \(f(x)=x\odot S^1(x)\) in Theorem 2 and for the general case in Theorem 3.

3.1 A Closed Expression for the Differential Probability

The following statement summarises the differential properties of the f function.

Theorem 1

Given an input difference \(\alpha \) and an output difference \(\beta \) the probability p of the corresponding differential (characteristic) for the function \(f(x) = x \odot S^a(x)\) is given by

$$\begin{aligned} p_{\alpha ,\beta }= {\left\{ \begin{array}{ll} 2^{-(n-d)} &{} \text {if } \beta +\alpha \odot S^a(\alpha ) \in \mathsf {Img}(L_{\alpha }) \\ 0 &{} \text {else } \end{array}\right. }\end{aligned}$$

where

$$\begin{aligned} d=\dim \ker (L_{\alpha }) \end{aligned}$$

and

$$\begin{aligned} L_{\alpha }(x)= x\odot S^a(\alpha ) + \alpha \odot S^a(x)\end{aligned}$$

Proof

We have to count the number of solutions to the equation

$$\begin{aligned} f(x) + f(x+\alpha ) = \beta .\end{aligned}$$

This simplifies to

$$\begin{aligned} L_{\alpha }(x)=x\odot S^a(\alpha ) + \alpha \odot S^a(x) = \beta + \alpha \odot S^a(\alpha )\end{aligned}$$

As this is an affine equation, it either has zero solutions or the number of solutions equals the kernel size, i.e. the number of elements in the subspace

$$\begin{aligned} \{ x \ | \ x\odot S^a(\alpha ) + \alpha \odot S^a(x) = {{\mathrm{\mathbf {0}}}} \}.\end{aligned}$$

Clearly, the equation has solutions if and only if \(\beta + \alpha \odot S^a(\alpha )\) is in the image of \(L_{\alpha }\). \(\square \)
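Theorem 1 reduces the differential probability to linear algebra over \({{\mathrm{\mathbb {F}}}}_2\). As a didactic sketch (our code), the image of \(L_\alpha \) can be enumerated exhaustively for small n; the closed formulas below avoid this exponential cost:

```python
def diff_prob_theorem1(alpha, beta, n, a):
    """Differential probability via Theorem 1: enumerate Img(L_alpha) and use
    p = 2^{-(n-d)} with d = dim ker(L_alpha). Exponential in n, small n only."""
    mask = (1 << n) - 1
    S = lambda x, i: ((x << (i % n)) | (x >> (n - i % n))) & mask
    L = lambda x: (x & S(alpha, a)) ^ (alpha & S(x, a))
    img = {L(x) for x in range(1 << n)}
    # |Img(L_alpha)| = 2^rank and p = 2^{-(n-d)} = 2^{-rank}
    rank = len(img).bit_length() - 1
    return 2.0 ** -rank if (beta ^ (alpha & S(alpha, a))) in img else 0.0
```

Comparing this against exhaustive counting of \(f(x) + f(x+\alpha ) = \beta \) for small word sizes reproduces Theorem 1 exactly.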

Next we present a closed formula to calculate the differential probability in the case where \(a=1\). Furthermore we restrict ourselves to the case where n is even.

Theorem 2

Let

$$\begin{aligned}\mathtt {varibits} = S^1(\alpha ) \vee \alpha \end{aligned}$$

and

$$\begin{aligned}\mathtt {doublebits} = \alpha \odot \overline{S^1(\alpha )} \odot S^{2}(\alpha ).\end{aligned}$$

Then the probability that difference \(\alpha \) goes to difference \(\beta \) is

$$\begin{aligned} P(\alpha \rightarrow \beta ) = {\left\{ \begin{array}{ll} 2^{-n+1} &{} \text {if } \alpha = {{\mathrm{\mathbf {1}}}} \text { and } {{\mathrm{wt}}}(\beta ) \equiv 0 \mod 2 \\ 2^{-{{\mathrm{wt}}}(\mathtt {varibits} + \mathtt {doublebits})} &{} \text {if } \alpha \ne {{\mathrm{\mathbf {1}}}} \text { and } \beta \odot \overline{\mathtt {varibits}} = {{\mathrm{\mathbf {0}}}} \\ &{} \text {and } (\beta + S^{1}(\beta )) \odot \mathtt {doublebits} = {{\mathrm{\mathbf {0}}}} \\ 0 &{} \text {else} \end{array}\right. } \end{aligned}$$

Proof

According to Theorem 1, we need to prove two things. Firstly we need to prove that the rank of \(L_\alpha \) (i.e. \(n-\dim \ker L_\alpha \)) is \(n-1\) when \(\alpha ={{\mathrm{\mathbf {1}}}}\), and \({{\mathrm{wt}}}(\mathtt {varibits} + \mathtt {doublebits})\) otherwise. Secondly we need to prove that \( \beta +\alpha \odot S^1(\alpha ) \in \mathsf {Img}(L_{\alpha })\) iff \({{\mathrm{wt}}}(\beta ) \equiv 0 \mod 2\) when \(\alpha = {{\mathrm{\mathbf {1}}}}\), and that \( \beta +\alpha \odot S^1(\alpha ) \in \mathsf {Img}(L_{\alpha })\) iff \(\beta \odot \overline{\mathtt {varibits}} = {{\mathrm{\mathbf {0}}}}\) and \((\beta + S^{1}(\beta )) \odot \mathtt {doublebits} = {{\mathrm{\mathbf {0}}}}\) when \(\alpha \ne {{\mathrm{\mathbf {1}}}}\).

We first consider the first part. Let us write \(L_\alpha (x)\) in matrix form and let us take x to be a column vector. \(S^1(\alpha )\odot x\) can be written as \(M_{S^1(\alpha )\odot }x\) with

$$\begin{aligned} M_{S^1(\alpha )\odot } = \begin{pmatrix} \alpha _{n-1} &{} \dots &{} \dots &{} 0 \\ \vdots &{} \alpha _{0} &{} &{} \vdots \\ \vdots &{} &{} \ddots &{} \vdots \\ 0 &{} \dots &{} \dots &{} \alpha _{n-2} \end{pmatrix}. \end{aligned}$$
(2)

Equivalently we can write \(\alpha \odot x\) and \(S^1(x)\) with matrices as \(M_{\alpha \odot }x\) and \(M_{S^1}x\) respectively where

$$\begin{aligned} M_{\alpha \odot } = \begin{pmatrix} \alpha _{0} &{} \dots &{} \dots &{} 0 \\ \vdots &{} \alpha _{1} &{} &{} \vdots \\ \vdots &{} &{} \ddots &{} \vdots \\ 0 &{} \dots &{} \dots &{} \alpha _{n-1} \end{pmatrix} \quad \text {and}\quad M_{S^1} = \begin{pmatrix} 0_{1, n-1} &{} I_{1, 1} \\ I_{{n-1, n-1}} &{} 0_{n-1, 1} \end{pmatrix}, \end{aligned}$$
(3)

i.e. \(M_{S^1}\) consists of two identity and two zero submatrices. The result of \(M_{S^1(\alpha )\odot } + M_{\alpha \odot }M_{S^1}\) can now be written as

$$\begin{aligned} \begin{pmatrix} \alpha _{n-1} &{} 0 &{} 0 &{} \dots &{} \alpha _0 \\ \alpha _{1} &{} ~\alpha _{0}~ &{} 0 &{} \dots &{} 0 \\ 0 &{} \alpha _{2} &{} ~\alpha _{1}~ &{} &{} \vdots \\ \vdots &{} &{} \ddots &{} \ddots &{} 0 \\ 0 &{} \dots &{} 0 &{} \alpha _{n-1} &{} \alpha _{n-2} \end{pmatrix} \end{aligned}$$
(4)

Clearly the rank of the matrix is \(n-1\) when all \(\alpha _i\) are 1. Suppose now that not all \(\alpha _i\) are 1. In that case, a set of non-zero rows is linearly dependent iff there exist two identical rows in the set. Thus to calculate the rank of the matrix, we need to calculate the number of unique non-zero rows.

By associating the rows in the above matrix with the bits in \(\mathtt {varibits}\), we can clearly see that the number of non-zero rows in the matrices corresponds to the number of 1s in \(\mathtt {varibits} = S^1(\alpha ) \vee \alpha \).

To count the number of non-unique rows, first notice that a non-zero row can only be identical to the row directly above or below it. Suppose now that a non-zero row i is identical to the row \((i-1)\) above it. Then \(\alpha _{i-1}\) has to be 0 while \(\alpha _i\) and \(\alpha _{i-2}\) have to be 1. But then row i cannot simultaneously be identical to row \((i+1)\) below it. To find the rank of the matrix, it is thus sufficient to subtract from the number of non-zero rows the number of rows that are identical to the row above them. Note that row i is non-zero iff \(\alpha _i \vee \alpha _{i-1} = 1\), and that \(\alpha _i \overline{\alpha _{i-1}} \alpha _{i-2}\) equals 1 exactly when row i is non-zero and identical to the row above it. Calculating the number of indices i for which

$$\begin{aligned} (\alpha _i \vee \alpha _{i-1}) + \alpha _i \overline{\alpha _{i-1}} \alpha _{i-2}\end{aligned}$$

equals 1 thus gives us the rank of \(L_\alpha \). This corresponds to calculating \({{\mathrm{wt}}}(\mathtt {varibits} + \mathtt {doublebits})\).

For the second part of the proof, we need to prove the conditions that check whether \(\beta +\alpha \odot S^1(\alpha ) \in \mathsf {Img}(L_{\alpha })\). First notice that \(\alpha \odot S^1(\alpha )\) is in the image of \(L_\alpha \) (consider for x the vector with bits alternately set to 0 and 1). Thus it suffices to test whether \(\beta \in \mathsf {Img}(L_\alpha )\). Let \(y = L_\alpha (x)\) and let us first look at the case \(\alpha = {{\mathrm{\mathbf {1}}}}\). Then \(L_\alpha (x) = x+S^1(x)\), so from bit \(y_i\) we can deduce whether \(x_i=x_{i-1}\) or \(x_i\ne x_{i-1}\). The bits in y thus create a chain of equalities/inequalities in the bits of x which can only be fulfilled if the number of inequalities is even. Hence in that case \(\beta \in \mathsf {Img}(L_\alpha )\) iff \({{\mathrm{wt}}}(\beta ) \equiv 0 \mod 2\).

For the case that \(\alpha \ne {{\mathrm{\mathbf {1}}}}\), we first note that \(y_i\) has to be zero if the corresponding row i in the matrix of Eq. (4) is all zeroes. Furthermore following our discussion of this matrix earlier, we see that \(y_i\) is independent of the rest of y if the corresponding row is linearly independent of the other rows and that \(y_i\) has to be the same as \(y_{i-1}\) if the corresponding rows are identical. Thus we only need to check that the zero-rows of the matrix correspond to zero bits in \(\beta \) and that the bits in \(\beta \) which correspond to identical rows in the matrix are equal. Thus \(\beta \) is in the image of \(L_\alpha \) iff \(\beta \odot \overline{\mathtt {varibits}} = {{\mathrm{\mathbf {0}}}}\) and \((\beta + S^{1}(\beta )) \odot \mathtt {doublebits} = {{\mathrm{\mathbf {0}}}}\). \(\square \)

3.2 The Full Formula for Differentials

Above we treated only the case of the simplified function \(f(x)=x\odot S^1(x)\). As mentioned earlier, the general case with \(\gcd (a-b,n) = 1\) can be deduced from this with linear algebra. When \(d = a-b\) with \(\gcd (d,n) \ne 1\) though, the function \(f(x) = x \odot S^d(x)\) partitions the output bits into independent classes. This not only raises the differential probabilities (worst case \(d=0\)), it also makes the notation for the formulas more complex and cumbersome, though not difficult. We thus restrict ourselves to the most important case, \(\gcd (a-b,n) = 1\). The general formula is then given by the following theorem.

Theorem 3

Let \(f(x) = S^a(x) \odot S^b(x) + S^c(x)\), where \(\gcd (n,a-b)=1\), n even, and \(a > b\) and let \(\alpha \) and \(\beta \) be an input and an output difference. Then with

$$\begin{aligned}\mathtt {varibits} = S^a(\alpha ) \vee S^b(\alpha )\end{aligned}$$

and

$$\begin{aligned}\mathtt {doublebits} = S^b(\alpha ) \odot \overline{S^a(\alpha )} \odot S^{2a-b}(\alpha )\end{aligned}$$

and

$$\begin{aligned}\gamma = \beta + S^c(\alpha ) \end{aligned}$$

we have that the probability that difference \(\alpha \) goes to difference \(\beta \) is

$$\begin{aligned} P(\alpha \rightarrow \beta ) = {\left\{ \begin{array}{ll} 2^{-n+1} &{} \text {if } \alpha = {{\mathrm{\mathbf {1}}}} \text { and } {{\mathrm{wt}}}(\gamma ) \equiv 0 \mod 2 \\ 2^{-{{\mathrm{wt}}}(\mathtt {varibits} + \mathtt {doublebits})} &{} \text {if } \alpha \ne {{\mathrm{\mathbf {1}}}} \text { and } \gamma \odot \overline{\mathtt {varibits}} = {{\mathrm{\mathbf {0}}}} \\ &{} \text {and } (\gamma + S^{a-b}(\gamma )) \odot \mathtt {doublebits} = {{\mathrm{\mathbf {0}}}} \\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}$$

For a more intuitive approach and some elaboration on the differential probabilities, we refer to the ePrint version of this paper.
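Theorem 3 translates directly into a procedure using only a handful of word operations. The following Python sketch is ours (the paper's own tool [13] should be preferred) and returns the base-2 logarithm of the probability, under the same assumptions \(\gcd (a-b,n)=1\), n even, \(a>b\):

```python
def simon_diff_prob_log2(alpha, beta, n, a=8, b=1, c=2):
    """log2 of P(alpha -> beta) per the formula of Theorem 3, or None if the
    probability is 0. Assumes gcd(a - b, n) = 1, n even, and a > b."""
    mask = (1 << n) - 1
    S = lambda x, i: ((x << (i % n)) | (x >> (n - i % n))) & mask
    wt = lambda x: bin(x).count("1")
    varibits = S(alpha, a) | S(alpha, b)
    doublebits = S(alpha, b) & ~S(alpha, a) & S(alpha, 2 * a - b) & mask
    gamma = beta ^ S(alpha, c)
    if alpha == mask:  # the all-ones difference
        return -(n - 1) if wt(gamma) % 2 == 0 else None
    if (gamma & ~varibits & mask) == 0 and ((gamma ^ S(gamma, a - b)) & doublebits) == 0:
        return -wt(varibits ^ doublebits)  # doublebits is a subset of varibits
    return None
```

For small word sizes this can be checked exhaustively against the definition of the differential probability, which is how we convinced ourselves of the implementation.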

4 Linear Correlations of SIMON-like Round Functions

As in the differential case, for the study of linear approximations we also build on the results from Sects. 2.3 and 2.4. We will thus start with studying linear approximations for the function \(f(x) = x \odot S^a(x)\). Again, the key point here is that all those functions are quadratic and thus their Fourier coefficients, or equivalently their correlations, can be computed by linear algebra (cf. Theorem 4). Theorem 5 is then, in analogy to the differential case, the explicit expression for the linear correlations. It basically corresponds to an explicit formula for the dimension of the involved subspace.

The first result is the following:

Theorem 4

$$\begin{aligned}{\widehat{f}}(\alpha ,\beta )^2 = {\left\{ \begin{array}{ll} 2^{n+d} &{} \text {if } \alpha \in U_{\beta }^{\perp } \\ 0 &{} \text {else } \end{array}\right. } \end{aligned}$$

where

$$\begin{aligned} d=\dim U_{\beta } \end{aligned}$$

and

$$\begin{aligned} U_{\beta }=\{ y ~|~ \beta \odot S^a(y) + S^{-a}(\beta \odot y) = {{\mathrm{\mathbf {0}}}} \} \end{aligned}$$

Proof

We compute

$$\begin{aligned} {\widehat{f}}(\alpha ,\beta )^2= & {} \sum _{x,y} \mu \left( \langle \beta , f(x) + f(y) \rangle +\langle \alpha , x+ y \rangle \right) \\= & {} \sum _{x,y} \mu \left( \langle \beta , f(x) + f(x+ y) \rangle +\langle \alpha , y \rangle \right) \\= & {} \sum _{x,y} \mu \left( \langle \beta , x \odot S^a(x)+ (x+ y) \odot S^a(x+ y) \rangle +\langle \alpha , y \rangle \right) \\= & {} \sum _{y} \mu \left( \langle \beta , f(y) \rangle +\langle \alpha , y \rangle \right) \sum _x \mu \left( \langle \beta , x \odot S^a(y)+ y \odot S^a(x) \rangle \right) \\= & {} \sum _{y} \mu \left( \langle \beta , f(y) \rangle +\langle \alpha , y \rangle \right) \sum _x \mu \left( \langle x, \beta \odot S^a(y) + S^{-a}(\beta \odot y) \rangle \right) . \end{aligned}$$

Now for the sum over x only two outcomes are possible, \(2^n\) or zero. More precisely, it holds that

$$\begin{aligned} \sum _x \mu \left( \langle x, \beta \odot S^a(y) + S^{-a}(\beta \odot y) \rangle \right) = {\left\{ \begin{array}{ll} 2^n &{} \text {if } \beta \odot S^a(y) + S^{-a}(\beta \odot y) = {{\mathrm{\mathbf {0}}}} \\ 0 &{} \text {else.} \end{array}\right. }\end{aligned}$$

Thus, defining

$$\begin{aligned} U_{\beta }=\{ y ~|~ \beta \odot S^a(y) + S^{-a}(\beta \odot y) = {{\mathrm{\mathbf {0}}}} \} \end{aligned}$$

we get

$$\begin{aligned}{\widehat{f}}(\alpha ,\beta )^2 = 2^n \sum _{y\in U_{\beta }} \mu \left( \langle \beta , f(y) \rangle +\langle \alpha , y \rangle \right) .\end{aligned}$$

Now note that

$$\begin{aligned} \langle \beta , f(y) \rangle =&\langle \beta , y\odot S^a(y) \rangle \end{aligned}$$
(5)
$$\begin{aligned} =&\langle {{\mathrm{\mathbf {1}}}}, y\odot \beta \odot S^a(y) \rangle \end{aligned}$$
(6)
$$\begin{aligned} =&\langle {{\mathrm{\mathbf {1}}}}, y\odot S^{-a}(\beta \odot y) \rangle \end{aligned}$$
(7)

Now, the function \(f_{\beta }:=\langle \beta ,f(y)\rangle \) is linear over \(U_\beta \), as can easily be seen from the definition of \(U_\beta \). Moreover, as \(f_{\beta }\) is unbalanced for all \(\beta \), it follows that \(f_{\beta }\) is in fact constant zero on \(U_{\beta }\). We thus conclude that

$$\begin{aligned}{\widehat{f}}(\alpha ,\beta )^2 = 2^n \sum _{y\in U_{\beta }} \mu \left( \langle \alpha , y \rangle \right) .\end{aligned}$$

With a similar argument as above, it follows that \({\widehat{f}}(\alpha ,\beta )^2\) is non-zero if and only if \(\alpha \) is contained in \(U_{\beta }^{\perp }\). \(\square \)
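For small word sizes, Theorem 4 can be cross-checked exhaustively. The sketch below (our code, for \(f(x) = x \odot S^a(x)\)) compares the directly summed Walsh coefficients against \(2^{n+d}\):

```python
def check_theorem4(n, a=1):
    """Verify Theorem 4 for f(x) = x AND S^a(x) by exhaustive search (small n)."""
    mask = (1 << n) - 1
    S = lambda x, i: ((x << (i % n)) | (x >> (n - i % n))) & mask
    par = lambda v: bin(v).count("1") & 1
    f = lambda x: x & S(x, a)
    for beta in range(1 << n):
        # U_beta = { y | beta . S^a(y) + S^{-a}(beta . y) = 0 }; S^{-a} = S^{n-a}
        U = [y for y in range(1 << n)
             if ((beta & S(y, a)) ^ S(beta & y, n - a)) == 0]
        d = len(U).bit_length() - 1  # U is a subspace, so |U| = 2^d
        for alpha in range(1 << n):
            w = sum((-1) ** (par(beta & f(x)) ^ par(alpha & x))
                    for x in range(1 << n))
            in_perp = all(par(alpha & y) == 0 for y in U)  # alpha in U_beta^perp?
            assert w * w == (2 ** (n + d) if in_perp else 0)
    return True
```

The check runs over all \(2^n \cdot 2^n\) mask pairs, so it is only feasible for toy word sizes, but it exercises every case of the theorem.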

Let us now restrict ourselves to the case where \(f(x) = x\odot S^1(x)\). The general case can be deduced analogously to the differential probabilities. For simplicity we also restrict ourselves to the case where n is even.

First we need to introduce some notation. Let \(x\in {{\mathrm{\mathbb {F}}}}_2^n\) with not all bits equal to 1. We now look at blocks of consecutive 1s in x, including potentially a block that “wraps around” the ends of x. Let the lengths of these blocks, measured in bits, be denoted as \(c_0,\dots , c_m\). For example, the bitstring 100101111011 has blocks of length 1, 3, and 4. With this notation define \(\theta (x) := \sum \limits _{i=0}^{m} \lceil \frac{c_i}{2} \rceil .\)
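The quantity \(\theta (x)\) is easily computed by walking the cyclic bit-string once; a small sketch of ours, assuming \(x \ne {{\mathrm{\mathbf {1}}}}\) as above:

```python
def theta(x, n):
    """theta(x) = sum of ceil(c_i / 2) over the cyclic blocks of 1s in x."""
    assert x != (1 << n) - 1, "theta is only defined for x != all-ones"
    bits = [(x >> i) & 1 for i in range(n)]
    z = bits.index(0)            # rotate so that no block wraps around
    bits = bits[z:] + bits[:z]
    blocks, run = [], 0
    for b in bits:
        if b:
            run += 1
        else:
            if run:
                blocks.append(run)
            run = 0
    if run:
        blocks.append(run)
    return sum((c + 1) // 2 for c in blocks)
```

For the example above, theta(0b100101111011, 12) returns 1 + 2 + 2 = 5; the cyclic rotation at the start makes the wrap-around block of length 3 count correctly.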

Noting that the linear square correlation of f is \(\frac{{\widehat{f}}(\alpha ,\beta )^2}{2^{2n}}\), we then have the following theorem:

Theorem 5

With the notation from above it holds that the linear square correlation of \(\alpha \mathop {\rightarrow }\limits ^{f} \beta \) can be calculated as

$$\begin{aligned} C(\alpha \rightarrow \beta ) = {\left\{ \begin{array}{ll} 2^{-n+2} &{} \text {if } \beta = {{\mathrm{\mathbf {1}}}} \text { and } \alpha \in U_\beta ^\perp \\ 2^{-\theta (\beta )} &{} \text {if } \beta \ne {{\mathrm{\mathbf {1}}}} \text { and } \alpha \in U_\beta ^\perp \\ 0 &{} \text {else.} \end{array}\right. }\end{aligned}$$

Proof

Define \(L_\beta (x) :=\beta \odot S^1(x) + S^{-1}(\beta \odot x)\). Clearly \(L_\beta \) is linear. Also \(U_\beta = \ker L_\beta (x)\). Let us determine the rank of this mapping. Define the matrices \(M_{\beta \cdot }\), \(M_{S^1}\), and \(M_{S^{-1}}\) as

$$\begin{aligned} M_{\beta \cdot } = \begin{pmatrix} \beta _{0} &{} \dots &{} \dots &{} 0 \\ \vdots &{} \beta _{1} &{} &{} \vdots \\ \vdots &{} &{} \ddots &{} \vdots \\ 0 &{} \dots &{} \dots &{} \beta _{n-1} \end{pmatrix} \quad \begin{matrix} M_{S^1} = \begin{pmatrix} 0_{1, n-1} &{} I_{1, 1} \\ I_{{n-1, n-1}} &{} 0_{n-1, 1} \end{pmatrix} \\ \\ M_{S^{-1}}= \begin{pmatrix} 0_{n-1, 1} &{} I_{n-1, n-1} \\ I_{1, 1} &{} 0_{1, n-1} \end{pmatrix} \end{matrix} \end{aligned}$$
(8)

We can then write \(L_\beta \) in matrix form as

$$\begin{aligned} \begin{pmatrix} 0 &{} \beta _{1} &{} 0 &{} \dots &{} 0 &{} \beta _0 \\ \beta _{1} &{} 0 &{} \beta _{2} &{} 0 &{} \dots &{} 0 \\ 0 &{} \beta _{2} &{} 0 &{} \beta _{3} &{} \ddots &{} \vdots \\ \vdots &{} &{} \ddots &{} \ddots &{} \ddots &{} 0 \\ 0 &{} 0 &{} 0 &{} \ddots &{} 0 &{} \beta _{n-1} \\ \beta _{0} &{} 0 &{} \dots &{} 0 &{} \beta _{n-1} &{} 0 \end{pmatrix} \end{aligned}$$
(9)

Clearly, if \(\beta \) is all 1s, the rank of the matrix is \(n-2\), as n is even. Let us therefore now assume that \(\beta \) is not all 1s and look at a block of 1s in \(\beta \), i.e., \(\beta _{i-1}=0\), \(\beta _{i}=\beta _{i+1}=\dots =\beta _{i+l-1}=1\), and \(\beta _{i+l}=0\). Clearly the l rows \(i,i+1,\dots ,i+l-1\) are linearly independent when l is even. When l is odd though, the sum of rows i, \(i+2\), \(i+4\), up to row \(i+l-3\) equals row \(i+l-1\); in that case there are thus only \(l-1\) linearly independent rows. As the blocks of 1s in \(\beta \) generate independent blocks of rows, we can summarise that the rank of the matrix is exactly \(\theta (\beta )\). \(\square \)

Analogously to the differential probabilities, the linear probabilities in the general case can be derived from this. It is likewise straightforward to determine whether \(\alpha \in U^\perp _\beta \). As an explicit formulation of this is rather tedious, we refer instead to the Python implementation in Appendix A, which achieves both in the case where \(\gcd (a-b,n) = 1\) and n is even.

For a more intuitive approach and some elaboration on the linear probabilities, we refer to the ePrint version of this paper.

5 Finding Optimal Differential and Linear Characteristics

While there are various methods for finding good characteristics, determining optimal differential or linear characteristics remains a hard problem in general. The formulas derived for both differential and linear probabilities enable us to apply an algebraic approach to finding the best characteristics. A similar technique has been applied to the ARX cipher Salsa20 [14] and the CAESAR candidate NORX [4]. For finding the optimal characteristics for Simon we implemented an open source tool [13] based on the SAT/SMT solvers CryptoMiniSat [15] and STP [11].

In the next section we will show how Simon can be modeled to find both the best differential and linear characteristics in this framework and how this can be used to solve cryptanalytic problems.

5.1 Model for Differential Cryptanalysis of SIMON

First we define the variables used in the model of Simon. We use two n-bit variables \(x_i\), \(y_i\) to represent the XOR-difference in the left and right halves of the state for each round and an additional variable \(z_i\) to store the XOR-difference of the output of the AND operation.

For representing the \(\log _2\) probability of the characteristic we introduce an additional variable \(w_i\) for each round. The sum of the \(w_i\) over all rounds then gives the \(\log _2\) probability of the corresponding differential characteristic. The values \(w_i\) are computed according to Theorem 3 as

$$\begin{aligned} w_i = {{\mathrm{wt}}}(\mathtt {varibits} + \mathtt {doublebits}) \end{aligned}$$
(10)

where \({{\mathrm{wt}}}(x)\) is the Hamming weight of x and

$$\begin{aligned} \mathtt {varibits}&= S^a(x_i) \vee S^b(x_i)\\ \mathtt {doublebits}&= S^b(x_i) \odot \overline{S^a(x_i)} \odot S^{2a-b}(x_i) \end{aligned}$$

Therefore, for one round of Simon we get the following set of constraints:

$$\begin{aligned} \begin{alignedat}{1} y_{i+1}&= x_i\\ 0&= (z_i \odot \mathtt {varibits})\\ 0&= (z_i + S^{a-b}(z_i)) \odot \mathtt {doublebits}\\ x_{i+1}&= y_i + z_i + S^c(x_i)\\ w_i&= {{\mathrm{wt}}}(\mathtt {varibits} + \mathtt {doublebits}) \end{alignedat} \end{aligned}$$
(11)
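For concreteness, the weight computation (10) amounts to a few word operations on the input difference. The sketch below is our own (the helper names are not part of the tool [13]); it treats Python integers as n-bit vectors and, per our reading of Theorem 3, handles the all-ones difference as a special case with weight \(n-1\):

```python
def mask(n):
    """All-ones n-bit word."""
    return (1 << n) - 1

def rotl(x, r, n):
    """Left-rotate the n-bit word x by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & mask(n)

def round_weight(x, a=8, b=1, n=16):
    """Weight w (probability 2**-w) of the XOR-difference x through
    the AND layer of one Simon round, following Eq. (10)."""
    if x == mask(n):  # all-ones difference: assumed special case, weight n-1
        return n - 1
    varibits = rotl(x, a, n) | rotl(x, b, n)
    doublebits = rotl(x, b, n) & ~rotl(x, a, n) & rotl(x, 2 * a - b, n) & mask(n)
    return bin(varibits ^ doublebits).count("1")
```

For the single-bit difference `x = 1` with the standard Simon32 rotations this gives weight 2, i.e. a one-round probability of \(2^{-2}\).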

A model for linear characteristics, though slightly more complex, can be implemented in a similar way. A description of this model can be found in the implementation of our framework. Despite the increase in complexity, we could not observe any significant impact on the solving time for the linear model.

5.2 Finding Optimal Characteristics

We can now use the previous model for Simon to search for optimal differential characteristics. This is done by formulating the decision problem of whether a valid characteristic, with respect to our constraints, exists for a given probability w. Fixing the probability limits the search space, and it makes sense as we are usually more interested in differential characteristics with a high probability, since they are more likely to lead to attacks with a lower complexity. We therefore start with a high probability and check whether such a characteristic exists; if not, we lower the probability.

The procedure can be described in the following way:

  • For each round of the cipher add the corresponding constraints as defined in (11). This system of constraints then exactly describes the form of a valid characteristic for the given parameters.

  • Add a condition which accumulates the probabilities of each round as defined in (10) and check if it is equal to our target probability w.

  • Query if there exists an assignment of variables which is satisfiable under the constraints.

  • Decrement the probability w and repeat the procedure.
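The loop above can be sketched in a few lines. Here `has_characteristic` is a hypothetical callback standing in for the SAT/SMT query against the constraint system (11), and w is treated as the (positive) weight, so lowering the probability means increasing w:

```python
def find_optimal_weight(has_characteristic, w_start=1, w_limit=128):
    """Return the smallest weight w for which a valid characteristic
    exists, i.e. the optimal probability is 2**-w.

    has_characteristic(w): hypothetical SAT/SMT query 'is the system
    (11), with the round weights (10) summing to w, satisfiable?'
    (standing in for the STP/CryptoMiniSat call in the actual tool)."""
    for w in range(w_start, w_limit + 1):
        if has_characteristic(w):
            # Unsatisfiability for all weights below w proves optimality.
            return w
    return None
```

Each unsatisfiable query on the way down doubles as a proof that no characteristic with that (better) probability exists.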

One of the main advantages compared to other approaches is that we can prove an upper bound on the probability of characteristics for a given cipher and number of rounds. If the solver determines that the set of conditions is unsatisfiable, we know that no characteristic with the specified probability exists. We used this approach to determine the characteristics with optimal probability for different variants of Simon. The results are given in Table 1.

Table 1. Overview of the optimal differential (on top) and linear characteristics for different variants of Simon. The probabilities are given as \(\log _2(p)\); for linear characteristics the squared correlation is used.

Upper Bound for the Characteristics. During our experiments we observed that proving the absence of differential characteristics above \(w_{\text {max}}\) seems to be an easy problem for the SAT/SMT solver. This can be used to get a lower bound on the probability of characteristics contributing to the differential. The procedure is similar to that for finding the optimal characteristics.

  • Start with a very low initial probability \(w_i\).

  • Add the same system of constraints which were used for finding the characteristic.

  • Add a constraint fixing the variables \((x_0, y_0)\) to \(\varDelta _{\text {in}}\) and \((x_r, y_r)\) to \(\varDelta _{\text {out}}\).

  • Query if there is a solution for this weight.

  • Increase the probability \(w_i\) and repeat the procedure until a solution is found.

5.3 Computing the Probability of a Differential

Given a differential characteristic, it is of interest to determine the probability of the associated differential \(\Pr (\varDelta _{\text {in}} \xrightarrow {f^r} \varDelta _{\text {out}})\), as it may have a much higher probability than the single characteristic. It is often assumed that the probability of the best characteristic can be used to approximate the probability of the best differential. However, in the case of Simon this assumption gives only an inaccurate estimate.

Similarly to the previous approach for finding the characteristic, we can formalise the problem of finding the probability of a given differential in the following way:

  • Add the same system of constraints which were used for finding the characteristic.

  • Add a constraint fixing the variables \((x_0, y_0)\) to \(\varDelta _{\text {in}}\) and \((x_r, y_r)\) to \(\varDelta _{\text {out}}\).

  • Use a SAT solver to find all solutions \(s_i\) for the probability w.

  • Decrement the probability w and repeat the procedure.

The probability of the differential is then given by

$$\begin{aligned} \Pr (\varDelta _{\text {in}} \xrightarrow {f^r} \varDelta _{\text {out}}) = \sum \limits _{i=w_{\text {min}}}^{w_{\text {max}}} s_i \cdot 2^{-i} \end{aligned}$$
(12)

where \(s_i\) is the number of characteristics with a probability of \(2^{-i}\).
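Given the enumerated counts, evaluating (12) is a plain sum; a minimal sketch:

```python
def differential_probability(counts):
    """counts maps a weight w to the number of characteristics s_w
    (each of probability 2**-w) connecting Delta_in to Delta_out;
    returns the accumulated probability of the differential, Eq. (12)."""
    return sum(s * 2.0 ** -w for w, s in counts.items())
```

For example, one characteristic of weight 36 plus two of weight 38 accumulate to \(2^{-36} + 2 \cdot 2^{-38} = 3 \cdot 2^{-37}\).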

We used this approach to compute better estimates for the probability of various differentials (see Table 2). In the case of Simon32 we were able to find all characteristics contributing to the differentials for 13 and 14 rounds. The distribution of the characteristics and the accumulated probability of the differential are given in Fig. 2. It is interesting to see that the distribution of w in the range [55, 89] is close to uniform; the probability of the corresponding differential therefore improves only negligibly and converges quickly towards the measured probability.

The performance of the whole process is very competitive compared to dedicated approaches. Enumerating all characteristics up to probability \(2^{-46}\) for the 13-round Simon32 differential takes around 90 seconds on a single CPU core and already gives a better estimate compared to the results in [6]. A complete enumeration of all characteristics for 13-round Simon32 took close to one core month using CryptoMiniSat4 [15]. The computational effort for other variants of Simon is comparable given the same number of rounds. However, for these variants we can use differentials with a lower probability covering more rounds due to the increased block size. In this case the running time increases due to the larger interval \([w_{\text {min}}, w_{\text {max}}]\) and higher number of rounds.

For Simon48 and Simon64 we are able to improve the estimate given in [16]. Additionally we found differentials which can cover 17 rounds for Simon48 and 22 rounds for Simon64 which might have potential to improve previous attacks. Our results are also closer to the experimentally obtained estimates given in [10] but give a slightly lower probability. This can be due to the limited number of characteristics we use for the larger Simon variants or the different assumptions on the independence of rounds.

Our results are limited by the available computing power, and in general it seems to be difficult to count all characteristics for weights in \([w_{\text {min}}, w_{\text {max}}]\), especially for the larger variants of Simon. However, the whole process is embarrassingly parallel, as one can split up the computation for each probability \(w_i\). Furthermore, the improvement that one gains decreases quickly: for all differentials we observed that the distribution of differential characteristics becomes flat after a certain point.

Fig. 2. Distribution of the number of characteristics for the differential \((0, 40) \rightarrow (4000, 0)\) for 13-round Simon32 and the accumulated probability. A total of \(\approx 2^{25.21}\) characteristics contribute to the probability.

Table 2. Overview of the differentials and the range \([w_{\text {min}},w_{\text {max}}]\) of the \(\log _2\) probabilities of the characteristics contributing to the differential. For computing the lower bound \(\log _2(p)\) of the probability of the differentials, we used all characteristics with probabilities in the range from \(w_{\text {min}}\) up to the values in brackets in the \(w_{\text {max}}\) column.

6 Analysis of the Parameter Choices

The designers of Simon have so far given no justification for their choice of the rotation constants. Here we evaluate the space of rotation parameters with regard to different metrics for the quality of the parameters. Our results are certainly not a definite answer but are rather intended as a starting point for evaluating the design space and reverse engineering the design choices. We considered all possible sets of rotation constants \((a, b, c)\) and checked them for their diffusion properties and their optimal differential and linear characteristics.

6.1 Diffusion

As a simple measure of the quality of the rotation constants, we count the number of rounds needed to reach full diffusion. Full diffusion is reached when every state bit depends on all input bits. Compared to computing linear and differential properties, determining this dependency is an easy task.
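This dependency computation can be sketched by propagating, for each state bit, a bitmask of the input bits it depends on. The code below is our own sketch, assuming the round structure of (11), where the new left bit j depends on right bit j and on left bits \(j-a\), \(j-b\) and \(j-c\):

```python
def full_diffusion_rounds(n, a, b, c, max_rounds=64):
    """Rounds of a Simon-like cipher with rotations (a, b, c) until
    every bit of the 2n-bit state depends on every input bit."""
    # Dependency masks over the 2n input bits:
    # left-half inputs are bits 0..n-1, right-half inputs bits n..2n-1.
    left = [1 << i for i in range(n)]
    right = [1 << (n + i) for i in range(n)]
    full = (1 << (2 * n)) - 1
    for r in range(1, max_rounds + 1):
        # Bit j of S^t(x) is x_{j-t}, hence the offsets below.
        new_left = [right[j]
                    | left[(j - a) % n]
                    | left[(j - b) % n]
                    | left[(j - c) % n] for j in range(n)]
        left, right = new_left, left
        if all(d == full for d in left + right):
            return r
    return None
```

Under this model, the standard Simon32 parameters `full_diffusion_rounds(16, 8, 1, 2)` reach full diffusion after 7 rounds.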

In Table 3 we compare how well the standard Simon rotation parameters fare within the distribution of all possible parameter sets. The exact distributions for all Simon variants can be found in Table 8 in the appendix.

Table 3. The number of rounds after which full diffusion is reached for the standard Simon parameters in comparison to the whole possible set of parameters.

6.2 Differential and Linear

As a second criterion for our parameters, we computed, for all \(a > b\) with \(\gcd (a-b, n) = 1\), the optimal differential and linear characteristics for 10 rounds of Simon32, Simon48 and Simon64. A list of the parameters which are optimal for all three variants of Simon can be found in Appendix C.

It is important to note here that there are also many parameter sets, including the standard choice, for which the best 10-round characteristics of Simon32 have a probability of \(2^{-25}\), compared to the optimum of \(2^{-26}\). However, this difference by a factor of 2 does not seem to occur for more than 10 rounds, nor for any of the larger variants of Simon.

6.3 Interesting Alternative Parameter Sets

As one result of our investigation we chose three exemplary sets of parameters that surpass the standard parameters with regards to some metrics. Those variants are Simon[12, 5, 3], Simon[7, 0, 2] and Simon[1, 0, 2].

Simon[12, 5, 3] has the best diffusion amongst the parameters which have optimal differential and linear characteristics for 10 rounds. The two other choices are both restricted by setting \(b = 0\), as this would allow a more efficient implementation in software. Among those, Simon[7, 0, 2] has the best diffusion, and its characteristics behave similarly to those of the standard parameters. Ignoring diffusion, Simon[1, 0, 2] also seems an interesting choice, as it is optimal with regard to the differential and linear characteristics.

However, if we look at the differentials corresponding to the best differential characteristics of Simon[7, 0, 2] and Simon[1, 0, 2], we can see that the number of characteristics contributing to them is significantly higher than for the standard parameters (see Table 6 in the Appendix). For Simon[12, 5, 3], by contrast, the differential shows a surprisingly different behaviour: the probability of the differential is much closer to the probability of the characteristic. On the other hand, its characteristics seem to be worse for the larger variants, as can be seen in Table 7. Furthermore, it might be desirable to have at least one rotation constant that corresponds to a byte length, something that the standard parameter set features.

7 Conclusion and Future Work

In this work we analysed the general class of functions underlying the Simon block cipher. First we rigorously derived efficiently computable and easily implementable expressions for the exact differential and linear behaviour of Simon-like round functions.

Building upon this, we used those expressions for a computer aided approach based on SAT/SMT solvers to find both optimal differential and linear characteristics for Simon. Furthermore, we were able to find all characteristics contributing to the probability of a differential for Simon32 and gave better estimates for the probability for other variants.

Finally, we investigated the space of Simon variants using different rotation constants with respect to diffusion and the optimal differential and linear characteristics. Interestingly, the default parameters do not always seem to be optimal.

This work opens up several directions for further investigation. In particular, the choice of and justification for the NSA's parameters for Simon remain unclear. Beyond our first progress concerning the round function, the design of the key schedule also remains largely unexplained, and further investigation is needed here.