Deep Learning-Based Cryptanalysis of Lightweight Block Ciphers

Most traditional cryptanalytic techniques require a great amount of time, known plaintexts, and memory. This paper proposes a generic cryptanalysis model based on deep learning (DL), where the model tries to find the key of block ciphers from known plaintext-ciphertext pairs. We show the feasibility of DL-based cryptanalysis by attacking lightweight block ciphers such as simplified DES, Simon, and Speck. The results show that DL-based cryptanalysis can successfully recover the key bits when the keyspace is restricted to 64 ASCII characters. Traditional cryptanalysis is generally performed without the keyspace restriction, yet only reduced-round variants of Simon and Speck have been successfully attacked. Although a text-based key is applied, the proposed DL-based cryptanalysis can successfully break the full rounds of Simon32/64 and Speck32/64. The results indicate that DL can be a useful tool for the cryptanalysis of block ciphers when the keyspace is restricted.


Introduction
Cryptanalysis of block ciphers has persistently received great attention, and many cryptanalytic techniques have emerged recently. The cryptanalysis based on the algebraic structure of the algorithm can be categorized as follows: differential cryptanalysis, linear cryptanalysis, differential-linear cryptanalysis, the meet-in-the-middle (MITM) attack, and the related-key attack [1,2]. Differential cryptanalysis, the first general cryptanalytic technique, analyses how differences of plaintext pairs evolve into differences of the resultant ciphertext pairs during encryption [3]. Differential cryptanalysis has evolved into various types, such as integral cryptanalysis, which is sometimes known as a multiset attack, the boomerang attack, impossible differential cryptanalysis, and improbable differential cryptanalysis [1,2]. Linear cryptanalysis is also a general cryptanalytic technique, which analyses linear approximations between plaintext bits, ciphertext bits, and key bits; it is a known plaintext attack. The work in [4] showed that the efficiency of linear cryptanalysis can be improved by the use of chosen plaintexts. The authors in [5] proposed a zero-correlation linear cryptanalysis, which is a key recovery technique. The MITM attack, which employs a space-time tradeoff, is a generic attack that weakens the security benefits of using multiple encryptions [6]. The biclique attack, a variant of the MITM attack, utilizes a biclique structure to extend the number of rounds attackable by the MITM attack [6]. In a related-key attack, an attacker can observe the operation of a cipher under several different keys whose values are initially unknown, but where some mathematical relationship connecting the keys is known to the attacker [7].
However, conventional cryptanalysis can be impractical or difficult to generalize. First, most conventional cryptanalytic techniques require a great amount of time, known plaintexts, and memory. Second, although traditional cryptanalysis is generally performed without the keyspace restriction, only reduced-round variants of recent block ciphers have been successfully attacked. For example, no successful attack on full-round Simon or full-round Speck, a family of lightweight block ciphers, is known [8][9][10]. Third, we need an automated and generalized test tool for checking the safety of various lightweight block ciphers for the Internet of Things [11]. There are various automated techniques that can be used to build distinguishers against block ciphers [12][13][14]. Because resistance against differential cryptanalysis is an important design criterion for modern block ciphers, most designs rely on finding some upper bound on the probability of differential characteristics [12]. The authors in [13] proposed a truncated searching algorithm that identifies differential characteristics as well as high-probability differential paths. The authors in [14] applied mixed integer linear programming (MILP) to search for differential characteristics and linear approximations in ARX ciphers. However, most automated techniques have endeavoured to search for differential characteristics and linear approximations. Hence, machine learning-(ML-) based cryptanalysis can be a candidate to solve the above problems.
This paper proposes a generic deep learning-(DL-) based cryptanalysis model that finds the key from known plaintext-ciphertext pairs and shows the feasibility of DL-based cryptanalysis by applying it to lightweight block ciphers. Specifically, we utilize deep neural networks (DNNs) to find the key from known plaintexts. The contribution of this paper is two-fold: first, we develop a generic and automated cryptanalysis model based on DL. The proposed DL-based cryptanalysis is a promising step towards a more efficient and automated test for checking the safety of emerging lightweight block ciphers. Second, we perform DL-based attacks on lightweight block ciphers such as S-DES, Simon, and Speck. To our knowledge, this is the first attempt to successfully break the full rounds of Simon32/64 and Speck32/64, although we apply a text-based key to the block ciphers. The remainder of this paper is organized as follows: Section 2 presents the related work; Section 3 describes the attack model for cryptanalysis; Section 4 introduces the DL-based approach for the cryptanalysis of lightweight block ciphers and presents the structure of the DNN model; Section 5 describes how to train and evaluate the model; in Section 6, we apply the DL-based cryptanalysis to lightweight block ciphers and evaluate its performance; finally, Section 7 concludes this paper.
Notations: we give some notations, which will be used in the rest of this paper. A plaintext and a ciphertext are denoted by p = (p_0, p_1, ..., p_{n-1}) and c = (c_0, c_1, ..., c_{n-1}), respectively, where n is the block size, p_i is the ith bit of the plaintext, c_i is the ith bit of the ciphertext, and p_i, c_i ∈ {0, 1}. A key is denoted by k = (k_0, k_1, ..., k_{m-1}), where m is the key length and k_i is the ith bit of the key, k_i ∈ {0, 1}. Let k|_i^j denote the key bits from the ith bit to the jth bit of the key, that is, k|_i^j = (k_i, k_{i+1}, ..., k_j). A block cipher is specified by an encryption function, E(p, k), that is, c = E(p, k).

Related Work
ML has been successfully applied in a wide range of areas with significant performance improvements, including computer vision, natural language processing, speech, and games [15]. The development of ML technologies provides a new direction for cryptanalysis [16]. The idea of a relationship between the fields of cryptography and ML was introduced in [17] in 1991. Since then, many researchers have endeavoured to apply ML technologies to the cryptanalysis of block ciphers. The studies on ML-based cryptanalysis can be classified as follows: first, some studies focused on finding the characteristics of block ciphers by using ML technologies. The authors in [18] used a recurrent neural network to find the differential characteristics of block ciphers, where the recurrent neural network represents the substitution functions of a block cipher. The author in [19] applied an artificial neural network to automate attacks on the classical ciphers, a Caesar cipher, a Vigenère cipher, and a substitution cipher, by exploiting known statistical weaknesses. They trained a neural network to recover the key by providing the relative frequencies of ciphertext letters. Recent work [20] experimentally showed that CipherGAN, a tool based on a generative adversarial network (GAN), can crack language data enciphered using shift and Vigenère ciphers.
Second, some studies used ML technologies to classify encrypted traffic or to identify the cryptographic algorithm from ciphertexts. In [21], an ML-based traffic classification was introduced to identify SSH and Skype encrypted traffic. The authors in [22] constructed three ML-based classification protocols to classify encrypted data. They showed that the three protocols (hyperplane decision, Naïve Bayes, and decision trees) efficiently perform classification when running on real medication data sets. The authors in [23] used a support vector machine (SVM) technique to identify five block cryptographic algorithms (AES, Blowfish, 3DES, RC5, and DES) from ciphertexts. The authors in [24] proposed an unsupervised learning cost function for a sequence classifier without labelled data and showed how it can be applied to break the Caesar cipher.
Third, other researchers have endeavoured to find the mapping relationship between plaintexts, ciphertexts, and the key, but there are few scientific publications. The work in [25] reported astonishing results for attacking DES and Triple DES, where a neural network was used to find the plaintexts from the ciphertexts. The authors in [26] used a neural network to find the mapping relationship between plaintexts, ciphertexts, and the key in simplified DES (S-DES). The author in [27] developed a feedforward neural network that discovers the plaintext from the ciphertext without the key in the AES cipher. The authors in [28] attacked the round-reduced Speck32/64 by using deep residual neural networks, where they trained the neural networks to distinguish the output of Speck with a given input difference based on the chosen plaintext attack. The attack in [28] is similar to classical differential cryptanalysis. However, the previous work failed to attack the full rounds of lightweight block ciphers and, moreover, failed to develop a generic deep learning-(DL-) based cryptanalysis model.

System Model
We consider (n, m) lightweight block ciphers such as S-DES, Simon, and Speck, where n is the block size and m is the key length. Our objective is to find the key, k, where the attacker has access to M pairs, [p(j), c(j)], of known plaintexts and their resultant ciphertexts encrypted with the same key, that is, c(j) = E(p(j), k), j = 1, 2, ..., M. Hence, the cryptanalytic model is a known plaintext attack model. Because the algorithms of block ciphers are publicly released, we assume that they are known to the attacker.

DNN Learning Framework.
The modern term "DL" refers to learning multiple levels of composition, where multiple layers progressively extract higher-level features from the raw input [29]. In the DL area, the DNN is considered one of the most popular models. As a multilayer processor, the DNN is capable of dealing with many nonconvex and nonlinear problems. The feedforward neural network forms a chain and can thus be expressed as

f(x; θ) = f^(L+1)(f^(L)(... f^(1)(x) ...)),

where x is the input, the parameter θ consists of the weights W and the biases b, f^(l) is called the lth layer of the network, and L is the number of hidden layers. Each layer of the network consists of multiple neurons, each of which outputs a nonlinear function of a weighted sum of the neurons of its preceding layer. The output of the jth neuron at the lth layer can be expressed as

a_j^(l) = f_NL( Σ_i w_ij^(l) a_i^(l-1) + b_j^(l) ),

where f_NL is the nonlinear activation function, w_ij^(l) is the weight corresponding to the output of the ith neuron at the preceding layer, and b_j^(l) is the bias. We apply a DNN to find the key of lightweight block ciphers. The multilayer perceptron mechanism and special training policy promote the DNN to be a commendable tool to find affine approximations to the action of a cipher algorithm. We train the DNN by using N_r pairs of (p, c) randomly generated with different keys in order that the system f finds affine approximations to the action of a cipher, as shown in Figure 1. In Figure 1, the loss function can be the mean square error (MSE) between the encryption key, k, and the output of the DNN, k̂. The performance of the trained DNN is evaluated by using N_s pairs randomly generated with different keys. Finally, given M known plaintexts, we find the key by using the trained DNN and a majority decision.
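As a minimal illustration of the two expressions above, the sketch below implements one fully connected ReLU layer and the layer-by-layer chain in plain Python; the weights, biases, and layer sizes are caller-supplied placeholders, not the trained values used in the paper:

```python
def relu(x):
    # nonlinear activation used in the cryptanalysis DNN
    return max(0.0, x)

def layer(inputs, weights, biases):
    # output of each neuron: a nonlinear function of a weighted sum of
    # the preceding layer's outputs plus a bias
    return [relu(sum(w * a for w, a in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x, params):
    # feedforward chain: f = f(L) o ... o f(1) applied to the input x
    for weights, biases in params:
        x = layer(x, weights, biases)
    return x
```

A layer is simply a list of per-neuron weight vectors plus a bias vector, and the chain applies the layers in order.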

DNN Structure for the Cryptanalysis.
The structure of the DNN model for the cryptanalysis is shown in Figure 2. We consider the ReLU function, f_ReLU(x) = max(0, x), as the nonlinear function. The DNN has η_l neurons at the lth hidden layer, where l = 1, ..., L. Each neuron at the input layer is associated with one bit of the plaintext and ciphertext; that is, the ith neuron represents p_i and the (n + j)th neuron represents c_j, where i, j = 0, 1, ..., n - 1. Hence, the number of neurons at the input layer is 2n. Each neuron at the output layer is associated with one bit of the key; that is, the output of the ith neuron corresponds to k̂_i, where i = 0, 1, ..., m - 1. Hence, the number of neurons at the output layer is m. The output of the DNN, k̂, is a cascade of nonlinear transformations of the input data, [p, c], mathematically expressed as

k̂ = f^(L+1)(f^(L)(... f^(1)([p, c]; θ) ...)),

where L is the number of hidden layers and θ denotes the weights and biases of the DNN.
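For the 32-bit block, 64-bit key setting considered later (n = 16, m = 64) with L = 5 hidden layers of η_l = 512 neurons each, the layer sizes and the resulting parameter count follow directly; the parameter count below is our own illustration, not a figure from the paper:

```python
def layer_sizes(n, m, L, eta):
    # 2n input neurons (n plaintext bits + n ciphertext bits),
    # L hidden layers of eta neurons, m output neurons (key bits)
    return [2 * n] + [eta] * L + [m]

def num_parameters(sizes):
    # each neuron has one weight per neuron of the preceding layer plus a bias
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))

sizes = layer_sizes(n=16, m=64, L=5, eta=512)
```

This gives the layer sizes [32, 512, 512, 512, 512, 512, 64], so the network that has to be trained holds on the order of a million weights.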

Data Generation.
The ML algorithm learns from data. Hence, we need to generate a data set for training and testing the DNN. Because the algorithms of modern block ciphers are publicly released, we can generate N plaintext-ciphertext pairs with different keys, where N = N_r + N_s, N_r is used for training the DNN, and N_s is used for testing the DNN. Let the data set be D = {([p(j), c(j)], k(j)) : j = 1, 2, ..., N}, where k(i) ≠ k(j) for i ≠ j.

Training Phase.
The goal of our model is to minimize the difference between the output of the DNN and the key. Let X represent the training plaintext-ciphertext pairs [p(j), c(j)], and let K represent the training keys k(j). The DNN learns the value of the parameter θ that minimizes the loss function over the training samples, as follows:

θ* = argmin_θ Loss(f(X; θ), K).

Because the samples are i.i.d., the MSE loss function can be expressed as follows:

Loss = (1 / (N_r · m)) Σ_{j=1}^{N_r} Σ_{i=0}^{m-1} (k_i^(j) - k̂_i^(j))²,

where N_r denotes the number of training samples, k_i^(j) is the ith bit of the key corresponding to the jth sample, and k̂_i^(j) is the ith output of the DNN corresponding to the jth sample.
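The MSE loss in this form can be written directly in plain Python; this is an illustrative sketch, since in practice the DL framework's built-in MSE would be used:

```python
def mse_loss(keys, outputs):
    # average squared error over all m key bits of all N_r training samples:
    # (1 / (N_r * m)) * sum_j sum_i (k_i - khat_i)^2
    n_r, m = len(keys), len(keys[0])
    return sum((k - kh) ** 2
               for key, out in zip(keys, outputs)
               for k, kh in zip(key, out)) / (n_r * m)
```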

Test Phase.
After training, the performance of the DNN is evaluated in terms of the bit accuracy probability (BAP) of each key bit. Here, the BAP of the ith key bit is the number of test samples for which the DNN finds the correct ith key bit, divided by the total number of test samples.
Because the output of the DNN is a real number, k̂_i ∈ R, we quantize the output of the DNN into {0, 1}. The quantized output of the DNN can then be expressed as

k̄_i = 1 if k̂_i ≥ 0.5, and k̄_i = 0 otherwise.

Then, the BAP of the ith key bit is given as

ρ_i = (1 / N_s) Σ_{j=1}^{N_s} XNOR(k_i^(j), k̄_i^(j)),

where N_s is the number of test samples, XNOR(a, b) is one if the two input values a and b are identical and zero otherwise, k_i^(j) is the ith key bit corresponding to the jth test sample, and k̄_i^(j) is the quantized output of the DNN with the input of the jth test sample.
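The quantization step and the BAP can be sketched as follows; `test_keys` and `test_outputs` are placeholders for the true key bits and the DNN's real-valued outputs on the test set:

```python
def quantize(outputs):
    # map each real-valued DNN output to a key-bit guess in {0, 1}
    return [[1 if v >= 0.5 else 0 for v in out] for out in outputs]

def bap(test_keys, test_outputs):
    # per-bit accuracy: fraction of test samples whose quantized output
    # matches the true key bit (an XNOR averaged over the N_s samples)
    guesses = quantize(test_outputs)
    n_s, m = len(test_keys), len(test_keys[0])
    return [sum(1 for key, g in zip(test_keys, guesses) if key[i] == g[i]) / n_s
            for i in range(m)]
```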

Majority Decision When M Plaintexts Are Known.
Assume that we have M plaintext-ciphertext pairs encrypted with the same key. If the probability of finding the ith key bit is ρ_i, then the attack success probability of finding the ith key bit, which is the probability of a correct majority decision, is given as

P_i = Σ_{j=⌈M/2⌉}^{M} (M choose j) ρ_i^j (1 - ρ_i)^{M-j}.    (9)

By the de Moivre-Laplace theorem, as M grows large, the normal distribution can be used as an approximation to the binomial distribution, as follows:

P_i ≈ Φ( √M (ρ_i - 1/2) / √(ρ_i (1 - ρ_i)) ),

where Φ(z) = (1/√(2π)) ∫_{-∞}^{z} e^{-x²/2} dx. Hence, in order to find the ith key bit with a success probability greater than or equal to τ, the number of required known plaintexts is

M ≥ ρ_i (1 - ρ_i) (Φ^{-1}(τ))² / (ρ_i - 1/2)².    (10)
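The required number of known plaintexts can be evaluated with the standard-library normal distribution; the plain ceiling used below is our own rounding convention (a majority vote is unambiguous only for odd M, which may explain small differences from the values reported in the paper):

```python
from math import ceil
from statistics import NormalDist

def required_plaintexts(rho, tau):
    # smallest M (normal approximation) such that a majority vote over M
    # per-pair key-bit estimates, each correct with probability rho,
    # succeeds with probability at least tau
    z = NormalDist().inv_cdf(tau)
    return ceil(z * z * rho * (1 - rho) / (rho - 0.5) ** 2)
```

For the S-DES result ρ_min = 0.5389 reported later, this gives M = 270 for τ = 0.9 and M = 889 for τ = 0.99, close to the paper's M = 271 and M = 891.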

Data Set and Performance Metric.
For the data set, we generate each plaintext bit as a random binary digit, that is, p_i ∈ rand{0, 1}. For the encryption key, we consider two methods. The first method is a "random key," where each key bit is a random binary digit, that is, k_i ∈ rand{0, 1}. Hence, the probability that the ith key bit is one is 0.5 for all i. The other method is a "text key," where the key is a combination of characters. For simplicity, as shown in Figure 4, each character is one of 64 ASCII characters, consisting of the lowercase and uppercase alphabet characters, 10 digits, and two special characters: T = {a, b, ..., z, A, B, ..., Z, 0, 1, ..., 9, ?, @} with |T| = 64. Hence, in the text key generation, every eight bits belongs to the set T, that is, k|_{8i}^{8i+7} ∈ rand(T), where i = 0, 1, ..., ⌈m/8⌉ - 1. For example, a 64-bit key consists of 8 characters. In the text key, the probability that the ith key bit is one depends on the bit position within each character. Let the occurrence probability be denoted by μ_i = max(Pr(k_i = 1), Pr(k_i = 0)), where Pr(k_i = x) is the probability that the ith key bit is x. Figure 5 shows the occurrence probability μ_i of the ith key bit. Taking the occurrence probability of each key bit into consideration, the performance of finding the ith key bit can be expressed as the deviation

ε_i = ρ_i - μ_i,

where ρ_i is the BAP and μ_i is the occurrence probability of the ith key bit. If M known plaintexts are given, the performance of finding the ith key bit, which is the probability of a correct majority decision, is obtained from equation (9).
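The occurrence probabilities μ_i of the text-key bits can be computed directly from the character set, assuming characters are drawn uniformly from T with standard ASCII encoding (a sketch for one 8-bit character, most significant bit first):

```python
from string import ascii_lowercase, ascii_uppercase, digits

# the 64-character text-key alphabet T (26 + 26 + 10 + 2 characters)
T = ascii_lowercase + ascii_uppercase + digits + "?@"

# Pr(k_i = 1) for each of the 8 bit positions of one character
prob_one = [sum((ord(ch) >> (7 - b)) & 1 for ch in T) / len(T)
            for b in range(8)]

# occurrence probability mu_i = max(Pr(k_i = 1), Pr(k_i = 0))
mu = [max(p, 1 - p) for p in prob_one]
```

For example, the most significant bit of every character in T is zero, so μ = 1.0 there, and the next bit is one for 53 of the 64 characters (the letters and '@'), giving μ = 53/64 ≈ 0.83; these are the biases that the deviation ε_i removes.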

Simulation Environment.
The performance of the DL-based cryptanalysis is evaluated for the lightweight block ciphers S-DES, Simon32/64, and Speck32/64, as shown in Table 1.
In order to train the DNN with an acceptable loss, it is necessary to expand the network size. Hyperparameters, such as the number of hidden layers, the number of neurons per hidden layer, and the number of epochs, should be tuned in order to minimize a predefined loss function. The traditional ways of performing hyperparameter optimization are grid search and random search. Other hyperparameter optimizations are Bayesian optimization, gradient-based optimization, evolutionary optimization, and population-based training [30,31]. Moreover, automated ML (AutoML) has been proposed to design and train neural networks automatically [30]. In our simulation, using the data sets of the Simon32/64 and Speck32/64 ciphers, we simply perform an exhaustive search to set the number of hidden layers, L, and the number of neurons per hidden layer, η_l, over a manually specified subset of the hyperparameter space, L ∈ {3, 5, 7} and η_l ∈ {128, 256, 512}. Additionally, to reduce the complexity, we choose a smaller number of hidden layers if the performance difference is not greater than 10^-5. If the number of epochs is greater than 3000, the error becomes small, and when it reaches 5000, the error is sufficiently minimized, so the number of epochs is fixed to 5000. Consequently, the parameters used for training the DNN models are as follows: the number of hidden layers is 5, the number of neurons at each hidden layer is 512, and the number of epochs is 5000. We use the adaptive moment (Adam) algorithm for the learning rate optimization of the DNN. TensorFlow is used to design and process the DNN. We deploy a GPU-based server equipped with an Nvidia GeForce RTX 2080 Ti GPU and an Intel Core i9-9900K CPU. The implemented DL-based cryptanalysis tool is shown in Figure 6. The GUI was implemented using PyQt over Python 3.7. The implemented tool provides various combinations of ML architectures, hyperparameters, and training/test samples.
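The exhaustive search over the hyperparameter subset can be sketched as below; `train_and_evaluate` is a hypothetical placeholder that returns a synthetic loss purely to make the sketch runnable (in the paper it would run the full TensorFlow training):

```python
from itertools import product

def train_and_evaluate(num_layers, neurons):
    # placeholder: substitute the actual DNN training and validation loss;
    # the synthetic value below exists only so the sketch executes
    return 1.0 / (num_layers * neurons)

def grid_search(layer_choices=(3, 5, 7), neuron_choices=(128, 256, 512),
                tolerance=1e-5):
    # evaluate every (L, eta) combination, then prefer the smallest network
    # among configurations within `tolerance` of the best loss
    results = [(train_and_evaluate(L, eta), L, eta)
               for L, eta in product(layer_choices, neuron_choices)]
    best_loss = min(loss for loss, _, _ in results)
    near_best = [(L, eta) for loss, L, eta in results
                 if loss - best_loss <= tolerance]
    return min(near_best)
```

With the real training loss in place of the placeholder, this is the selection rule that yielded L = 5 and η_l = 512 in the paper's setting.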

Overview of S-DES.
S-DES, designed for educational purposes in 1996, has properties and structure similar to DES but has been simplified to make encryption and decryption easier to perform [32]. S-DES has an 8-bit block size and a 10-bit key size. The encryption algorithm involves five functions: an initial permutation (IP); a complex function labelled f_K, which involves both permutation and substitution operations and depends on a key input; a simple permutation function that switches the two halves of the data; the function f_K again; and finally a permutation function that is the inverse of the initial permutation (IP^-1). S-DES may be said to have two rounds of the function f_K.
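For reference, S-DES is small enough to implement in a few dozen lines. The sketch below follows the common textbook specification of [32]; the permutation tables and S-boxes are the standard educational ones, assumed here, and bits are lists of 0/1 with the most significant bit first:

```python
# standard textbook S-DES tables (1-indexed permutation positions)
P10 = (3, 5, 2, 7, 4, 10, 1, 9, 8, 6)
P8 = (6, 3, 7, 4, 8, 5, 10, 9)
IP = (2, 6, 3, 1, 4, 8, 5, 7)
IP_INV = (4, 1, 3, 5, 7, 2, 8, 6)
EP = (4, 1, 2, 3, 2, 3, 4, 1)
P4 = (2, 4, 3, 1)
S0 = ((1, 0, 3, 2), (3, 2, 1, 0), (0, 2, 1, 3), (3, 1, 3, 2))
S1 = ((0, 1, 2, 3), (2, 0, 1, 3), (3, 0, 1, 0), (2, 1, 0, 3))

def permute(bits, table):
    return [bits[i - 1] for i in table]

def key_schedule(key):
    # P10, circular left shifts, then P8 twice to derive K1 and K2
    k = permute(key, P10)
    left, right = k[:5], k[5:]
    left, right = left[1:] + left[:1], right[1:] + right[:1]   # LS-1
    k1 = permute(left + right, P8)
    left, right = left[2:] + left[:2], right[2:] + right[:2]   # LS-2
    k2 = permute(left + right, P8)
    return k1, k2

def sbox(bits, box):
    # row from outer bits, column from inner bits, 2-bit output
    v = box[bits[0] * 2 + bits[3]][bits[1] * 2 + bits[2]]
    return [(v >> 1) & 1, v & 1]

def f_k(bits, subkey):
    # Feistel step: left half XORed with F(right half, subkey)
    left, right = bits[:4], bits[4:]
    t = [a ^ b for a, b in zip(permute(right, EP), subkey)]
    out = permute(sbox(t[:4], S0) + sbox(t[4:], S1), P4)
    return [a ^ b for a, b in zip(left, out)] + right

def encrypt(plain, key):
    k1, k2 = key_schedule(key)
    bits = f_k(permute(plain, IP), k1)
    bits = f_k(bits[4:] + bits[:4], k2)   # switch halves between rounds
    return permute(bits, IP_INV)

def decrypt(cipher, key):
    k1, k2 = key_schedule(key)
    bits = f_k(permute(cipher, IP), k2)   # subkeys in reverse order
    bits = f_k(bits[4:] + bits[:4], k1)
    return permute(bits, IP_INV)
```

Decryption applies the same structure with the subkeys in reverse order, so decrypt(encrypt(p, key), key) recovers p for every 8-bit plaintext.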
Because the length of the key is limited, a brute-force attack, also known as an exhaustive key search, is feasible. Some previous work presented approaches for breaking the key using a genetic algorithm and particle swarm optimization [33,34], concluding that the genetic algorithm is a better approach than brute force for analysing S-DES.

Test Results.
For training and testing the DNN, we generate N plaintext-ciphertext pairs with different keys, as follows:

D = {([p(j), c(j)], k(j)) : j = 1, 2, ..., N},

where k(i) ≠ k(j) for i ≠ j and N = N_r + N_s. Here, N_r is the number of samples for training and N_s is the number of samples for testing. In the simulation, we use N_r = 50000 and N_s = 10000. Each plaintext bit is a random binary digit, that is, p_i ∈ rand{0, 1}. We generate the encryption key by using two methods: a random key and a text key. In the S-DES with a 10-bit key, the text key is a combination of one character and two random binary bits. Figure 7 shows the BAP of the DNN when we apply a random key and a text key. The results show that the DL-based cryptanalysis can break the S-DES cipher. When we apply a random key, the key bits k_1, k_5, and k_8 are quite vulnerable to the attack, and the key bit k_6 is the safest. Because the minimum value of the BAP is ρ_min = 0.5389 at the 6th key bit, from equation (10), we need M = 271 known plaintexts to find all the key bits with a probability of 0.9 and M = 891 known plaintexts to find all the key bits with a probability of 0.99. When we apply a text key, the BAP becomes higher, thanks to the bias of the occurrence probability of each key bit, μ_i, as shown in Figure 5. Because the minimum value of the BAP is ρ_min = 0.6484 at the 6th key bit, from equation (10), we need M = 19 known plaintexts to find all the key bits with a probability of 0.9 and M = 59 known plaintexts to find all the key bits with a probability of 0.99. Figure 8 shows the deviation between the BAP and the occurrence probability of each key bit. Because of the bias of the occurrence probability of each key bit in the text key, we need to eliminate the bias characteristics of each key bit. The DNN shows that the key bits that are quite vulnerable to the attack are (k_2, k_5, k_8) in the text key and (k_1, k_5, k_8) in the random key.
The key bit k_6 is the safest in both the text key and the random key.

Overview of Simon and Speck.
Lightweight cryptography is a rapidly evolving and active area, driven by the need to provide security and cryptographic measures to resource-constrained devices such as mobile phones, smart cards, RFID tags, and sensor networks. Simon and Speck are families of lightweight block ciphers publicly released in 2013 [35,36]. Simon has been optimized for performance in hardware implementations, while Speck has been optimized for software implementations. The Simon block cipher is a balanced Feistel cipher with a u-bit word, and therefore, the block length is n = 2u. The key length, m, is 2, 3, or 4 times the word size u. Simon supports various combinations of block sizes, key sizes, and numbers of rounds [35]. In this paper, we consider Simon32/64, which refers to the cipher operating on a 32-bit plaintext block with a 64-bit key. Speck is an add-rotate-xor (ARX) cipher. The block of Speck is always two words, but the words may be 16, 24, 32, 48, or 64 bits in size. The corresponding key is 2, 3, or 4 words. Speck also supports various combinations of block sizes, key sizes, and numbers of rounds [35].
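For concreteness, Speck32/64 can be sketched directly from its ARX description (rotation amounts α = 7 and β = 2, 22 rounds); the key-word ordering and the test vector in the note below follow the designers' published specification, assumed here:

```python
MASK = 0xFFFF  # 16-bit words

def ror(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK

def rol(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK

def speck32_64_key_schedule(l2, l1, l0, k0):
    # l_{i+3} = (k_i + ror(l_i, 7)) xor i;  k_{i+1} = rol(k_i, 2) xor l_{i+3}
    l, k = [l0, l1, l2], [k0]
    for i in range(21):
        l.append(((k[i] + ror(l[i], 7)) & MASK) ^ i)
        k.append(rol(k[i], 2) ^ l[i + 3])
    return k  # 22 round keys

def speck32_64_encrypt(x, y, round_keys):
    # one ARX round: x = (ror(x, 7) + y) xor k;  y = rol(y, 2) xor x
    for rk in round_keys:
        x = ((ror(x, 7) + y) & MASK) ^ rk
        y = rol(y, 2) ^ x
    return x, y
```

With the published test-vector key 1918 1110 0908 0100 and plaintext 6574 694c, this sketch should yield the ciphertext a868 42f2.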
As of 2018, no successful attack on full-round Simon or full-round Speck of any variant is known. The authors in [37] showed differential attacks on up to slightly more than half of the rounds of the Simon and Speck families of block ciphers. The authors in [38] showed an integral attack on 24-round Simon32/64 with a time complexity of 2^63 and a data complexity of 2^32. The work in [39] showed an improved differential attack on 14-round Speck32/64 with a time complexity of 2^63 and a data complexity of 2^31.

Data Generation.
For training and testing the DNN, we generate N plaintext-ciphertext pairs with different keys, as follows:

D = {([p(j), c(j)], k(j)) : j = 1, 2, ..., N},

where k(i) ≠ k(j) for i ≠ j and N = N_r + N_s. Here, N_r is the number of samples for training and N_s is the number of samples for testing. Each plaintext bit is a random binary digit, that is, p_i ∈ rand{0, 1}. We generate the encryption key by using two methods: a random key and a text key. In the text key, the 64-bit key consists of 8 characters, where each character is drawn from the 64-character set T. Hence, although the total keyspace is 2^64, the actual keyspace is reduced to 2^48. For training, we use N_r = 5 × 10^5 samples, and for the test, we use N_s = 10^6 samples.

Test Results.
Figure 9 shows the BAP of the Simon32/64 with a random key in units of characters. The BAP of each key bit varies randomly around an average of almost 0.5. Moreover, the results vary between simulations with different hyperparameters. That is, the DNN failed to attack the Simon32/64 with a random key. Figure 10 shows the BAP and the deviation of the Simon32/64 with a text key in units of characters. The BAP of each key bit is almost identical to the occurrence probability of the text key because the DNN learns the characteristics of the training data. However, when we eliminate the bias characteristics of the text key, the DNN shows positive deviations, which means the DNN can break the Simon32/64 with a text key. For example, from equation (10), we need just M = 215 known plaintexts in order to find the key bit k_2 with a probability of 0.99. The minimum value of the BAPs is 0.51603 at k_3, which is greater than μ_3 by about ε_3 = 0.00040, except for the last bits of each character. Hence, we can find the encryption key with a probability of 0.9 given M ≈ 2^10.58 known plaintexts, and with a probability of 0.99 given M ≈ 2^12.34 known plaintexts. Figure 11 shows the BAP of the Speck32/64 with a random key in units of characters. The BAP of each key bit varies randomly around an average of almost 0.5, similar to the results of the Simon32/64, and the results vary between simulations with different hyperparameters. That is, the DL-based attacks against the Speck32/64 with a random key failed. Figure 12 shows the BAP and the deviation of the Speck32/64 with a text key in units of characters. The DNN shows positive deviations; that is, the DNN shows the possibility of breaking the Speck32/64 with a text key. The minimum value of the BAPs is 0.51607 at k_3, which is greater than μ_3 by about ε_3 = 0.00044, except for the last bits of each character. Hence, we can find the encryption key with a probability of 0.9 given M ≈ 2^10.57 known plaintexts, and with a probability of 0.99 given M ≈ 2^12.33 known plaintexts.

Conclusions
We developed a DL-based cryptanalysis model and evaluated the performance of the DL-based attack on the S-DES, Simon32/64, and Speck32/64 ciphers.
The DL-based cryptanalysis can successfully find the text-based encryption key of the block ciphers. When a text key is applied, the DL-based attack broke the S-DES cipher with a success probability of 0.9 given 2^8.08 known plaintexts. That is, the DL-based cryptanalysis reduces the search space nearly by a factor of 8. Moreover, when a text key is applied to the block ciphers, the DL-based cryptanalysis finds the linear approximations between the plaintext-ciphertext pairs and the key, and therefore, it successfully broke the full rounds of Simon32/64 and Speck32/64. When a text key is applied, with a success probability of 0.99, the DL-based cryptanalysis finds 56 bits of Simon32/64 with 2^12.34 known plaintexts and 56 bits of Speck32/64 with 2^12.33 known plaintexts. Because the developed DL-based cryptanalysis framework is generic, it can be applied to attacks on other block ciphers without change.
The drawback of our proposed DL-based cryptanalysis is that the keyspace is restricted to text-based keys. However, although uncommon, a text-based key can be used for encryption. For example, a login password entered with the keyboard can be text based if the input data are not hashed. Modern cryptographic functions are designed to look very random and to be very complex, and therefore, it can be difficult for ML to find meaningful relationships between the inputs and the outputs if the keyspace is not restricted. Hence, our approach limited the keyspace to text-based keys only, and the proposed DL-based cryptanalysis could successfully break the 32-bit block variants of the Simon and Speck ciphers. If the keyspace is not limited, the DL-based cryptanalysis fails to attack the block ciphers. In the future, the accuracy of ML will improve thanks to the development of algorithms and hardware. Moreover, advanced data transformations that efficiently map cryptographic data onto ML data will help DL-based cryptanalysis to be performed without the keyspace restriction.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.