Using Hadamard transform for cryptanalysis of pseudo-random generators in stream ciphers

In this work we discuss results obtained by applying the Hadamard transform to cryptanalysis. In particular, we determine the probability of deciphering different pseudo-random number generators used as components of stream ciphers. We also find a relationship between entropy and Hadamard values.

Received on 02 March 2020; accepted on 09 April 2020; published on 15 April 2020


Introduction
Cryptography, or cryptology, deals with the practice and study of securing communications between parties. Since almost all of our activities and assets now have a digital presence, it is of the greatest importance to be able to rely on the methods employed to secure such information. Applications abound, for example in electronic commerce [1,2] and blockchain [3][4][5]. In particular, stream ciphers, which are the object of the present study, are useful for protecting data in real time, as in the encryption and decryption of a DVD [6]. Cryptanalysis is the study of methods to reveal the meaning of encrypted information without access to the secret key. Typically, this translates into obtaining the key used to encrypt the information. In non-technical terms, this practice is known as breaking or forcing the cryptosystem. Although the aim is always the same, the methods and techniques of cryptanalysis have changed drastically throughout the history of cryptography, adapting to an increasing cryptographic complexity. They have also changed because it is no longer possible to have unlimited success in breaking a cryptosystem, and there is a hierarchical classification of what constitutes an attack in practice; see [7].
* Corresponding author. Email: gsosag@up.edu.mx
The term cryptanalysis is also used to refer to any attempt to circumvent the security of different types of algorithms and cryptographic protocols in general, and not just encryption [8]. The cryptographic complexity to which cryptanalysis has had to adapt ranges from the pen and paper methods of the past, through machines like Enigma, to systems based on modern computers and other electronic devices. For a general introduction to the subject, see [10,11].
The purpose of any pseudorandom generator (PRNG) [12] is to obtain sequences that behave statistically as if they were random. Although some design criteria are known, how can one decide whether a sequence is sufficiently random? How can unpredictability be measured? Some random sequences are generated by intrinsically random physical processes, i.e., they rest on the assumption that random processes exist in nature. In many applications, however, the same (apparently random) sequence is needed in two different experiments, so reproducible deterministic algorithms must be used. Sequences produced by such algorithms are called pseudo-random, and they are used in various environments related to telecommunications.
To overcome the complexities of modern day cryptography one must resort to advanced mathematics and algorithms. A very useful mathematical technique to reduce the complexity of a problem is to change its domain by means of a transformation. A transform represents the change from one domain to another and, with the right properties, reduces the complexity of the given task. In the context of our problem, the cryptanalysis of stream ciphers, the Hadamard transform (also known as the Walsh-Hadamard transform [13]) will prove to be useful.
One of the most important steps in cryptanalysis is to establish correspondences between the elements of the output and probable clear text. The main task is to find ways of facilitating the computation of a probability that relates several terms of an output, and to determine the likelihood of deciphering the message or parts of it. In this work, we develop theoretical statistical attacks by searching for autocorrelations in the output bits of stream ciphers [9].
The structure of the paper is as follows. Section 2 presents the stream ciphers used in the current work. Section 3 presents the results related to the entropy of the stream ciphers under study. Section 4 introduces the Hadamard transform and its use in cryptanalysis. Section 5 gives an example of how to apply our proposed method. Finally, Section 8 presents the conclusions of the work and some desirable future work.

Stream ciphers
A stream cipher algorithm amounts to specifying a pseudorandom generator, which allows messages of arbitrary length to be encrypted by combining the generated bit sequence with the message through the exclusive-OR operation, symbol by symbol [14]. When designing stream ciphers, a number of properties must be taken into account besides generating random-looking bit sequences [15]; the generated sequence must behave as unpredictably as possible, i.e., given a fraction of the sequence, it should not be possible to predict the rest, either before or after the given subsequence. More formally, a stream cipher can be viewed as a function $f : \mathbb{F}_2^n \to \mathbb{F}_2^m$ which transforms a binary input vector $X = (x_1, \ldots, x_n)$ of n bits into a binary output vector $Y = f(X) = (y_1, \ldots, y_m)$ of m bits, where $n, m \in \mathbb{N}$. We denote by $e_i$, with $1 \le i \le n$, the unit vectors $e_1 = (1, 0, \ldots, 0), \ldots, e_n = (0, \ldots, 0, 1)$. To detect regularities in binary sequences, it is necessary to resort to probabilistic studies that allow a quantitative evaluation of the randomness of a sequence. Among the 15 statistical tests proposed by NIST in its battery of tests to evaluate PRNGs there is the so-called approximate entropy test [16]. We will apply this technique as a measurement for our method. In what follows we briefly describe commonly used stream ciphers, which we also use to illustrate our proposed method and the results obtained.
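The exclusive-OR combination described at the start of this section can be sketched in a few lines of Python. This is only an illustration: the functions `keystream` and `xor_cipher` are hypothetical names, and Python's `random.Random` is used as a stand-in generator that is not cryptographically secure and is not one of the generators studied in this paper.

```python
import random


def keystream(seed: int, n: int) -> bytes:
    """Return n pseudo-random bytes from a seeded stand-in generator.

    Assumption: random.Random is used only for illustration; it is NOT
    suitable for real encryption.
    """
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


def xor_cipher(message: bytes, seed: int) -> bytes:
    """Combine the message with the keystream, symbol by symbol, using XOR."""
    ks = keystream(seed, len(message))
    return bytes(m ^ k for m, k in zip(message, ks))


ciphertext = xor_cipher(b"attack at dawn", 42)
plaintext = xor_cipher(ciphertext, 42)  # the same operation decrypts
```

Because XOR is its own inverse, applying the same keystream twice recovers the message, which is why both parties only need to share the generator and its seed (the key).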

Shrinking Generator
The so-called Shrinking Generator (SG) is a nonlinear key stream generator composed of two LFSRs [17], in which a control register SRS decimates the sequence produced by the other register SRA. S and A denote their respective lengths and satisfy $\gcd(S, A) = 1$ and $S < A$. $P_S(x), P_A(x) \in \mathbb{F}_2[x]$ denote their corresponding primitive characteristic polynomials. The sequence $\{s_i\}$ produced by SRS controls which bits of the sequence $\{a_i\}$ produced by SRA are included in the output shrunken sequence Z, according to the following rule: if $s_i = 1$ then $z_j = a_i$, and if $s_i = 0$ then $a_i$ is discarded. As different pairs of SRA/SRS initial states can generate the same shrunken sequence, in the sequel we assume, without loss of generality, that the first term of the sequence $\{s_i\}$ equals 1, that is, $s_0 = 1$. According to [17], the period of the shrunken sequence is $T = (2^A - 1)\,2^{S-1}$, and its linear complexity, denoted LC, satisfies the inequality $A \cdot 2^{S-2} < LC \le A \cdot 2^{S-1}$. It can be proven [18] that the shrunken sequence also has good distributional statistics. Owing to these characteristics, this scheme has traditionally been used as a key stream sequence generator in secret-key cryptography. A potential problem is that the shrinking generator outputs bits at an irregular rate, and a timing attack might reveal something about ...
EAI Endorsed Transactions on Energy Web Online First
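The decimation rule of the Shrinking Generator can be sketched as follows. This is a toy illustration, not the configuration used in the experiments: the register lengths S = 3 and A = 4, the tap positions and the initial states are all assumptions, chosen so that the registers have maximal periods 7 and 15 and so that $s_0 = 1$.

```python
from itertools import islice


def lfsr(taps, state):
    """Endless bit stream of a Fibonacci LFSR (toy sizes only)."""
    state = list(state)
    while True:
        yield state[-1]                    # output the oldest bit
        feedback = 0
        for t in taps:
            feedback ^= state[t]           # XOR of the tapped cells
        state = [feedback] + state[:-1]    # shift the feedback bit in


def shrinking_generator(s_stream, a_stream):
    """Keep a_i whenever the control bit s_i is 1, discard it otherwise."""
    for s, a in zip(s_stream, a_stream):
        if s == 1:
            yield a


# Hypothetical toy parameters (S = 3, A = 4, gcd(S, A) = 1, S < A).
S_TAPS, S_INIT = (1, 2), [0, 0, 1]
A_TAPS, A_INIT = (2, 3), [1, 0, 0, 0]


def shrunken(n):
    """First n bits of the shrunken sequence Z for the toy registers."""
    stream = shrinking_generator(lfsr(S_TAPS, S_INIT), lfsr(A_TAPS, A_INIT))
    return list(islice(stream, n))


z = shrunken(120)
```

For these toy lengths the period formula gives $T = (2^4 - 1)\,2^{3-1} = 60$, so the second block of 60 output bits repeats the first.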

Entropy of stream ciphers
In information theory, Shannon entropy, or simply entropy (named in honor of Claude E. Shannon; see [24]), measures the uncertainty of a source of information; that is, entropy is a measure of randomness in information. Cryptographic algorithms are required to introduce high randomness in encrypted messages, with minimal or no dependence between the key and the ciphertext. With high randomness, the relationship between the key and the encrypted text becomes complex. This property is also called confusion. A high degree of confusion is desired to make it difficult for an attacker to guess the plain text. Entropy thus reflects the effectiveness of a cryptographic algorithm. The maximum entropy of a message is reached when all symbols are equally probable. Entropy is computed by means of Shannon's formula $H(X) = -\sum_{i=1}^{N} p_i \log_2 p_i$, where $p_i$ is the probability of a given symbol and N is the length of message X. The amount of information associated with the simplest event, consisting of only two equiprobable possibilities, is the unit of measure for entropy and is called a bit. This is why the base 2 logarithm is used in the definition of entropy, so that the amount of information from the simplest event is equal to one. It can be said that the entropy of a random variable is the average number of bits needed to encode each of the states of the variable, assuming that each event is expressed using a message written in a binary alphabet. The entropy of a truly random sequence of length k must be equal to k. In practice, however, information sources rarely generate random messages and, in general, the entropy of the information is less than ideal (see for example [25]). After messages are encrypted, however, their entropy should ideally be k. If the output of such encryption emits symbols with entropy less than k, there is a certain degree of predictability, which threatens its security.
We conducted entropy tests for the A5/1, Blowfish and Shrinking Generator cryptosystems; the results are listed in Table ??. As expected, entropy drops below its ideal value as k increases, but this does not imply that these generators have no practical use.

Hadamard transform
The Hadamard transform is perhaps the best known of the non-sinusoidal orthogonal transformations. It has gained prominence in digital signal processing applications [26] because it can be computed using only additions and subtractions, which makes its hardware implementation very efficient. The Hadamard transform also proves useful as a tool in the construction of q-bent functions [27] and vectorial bent functions [28]. Let $V_n$ be the vector space of dimension n over the binary Galois field $\mathbb{F}_2$. For two vectors $a, b \in \mathbb{F}_2^n$, we define the scalar product $a \cdot b = a_1 b_1 \oplus \ldots \oplus a_n b_n$ and the sum $a \oplus b = (a_1 \oplus b_1, \ldots, a_n \oplus b_n)$, where the product and the sum $\oplus$ (called XOR) are over $\mathbb{F}_2$. The Hadamard transform of a real-valued function f on $V_n$ is defined as $\hat{f}(h) = \sum_{x \in V_n} (-1)^{h \cdot x} f(x)$, for $h \in V_n$ (1). We recommend the articles of Bernasconi et al. [29] and Pommerening [30] for more on this topic. It can be verified that a direct calculation of the complete Hadamard spectrum using the previous definition has a complexity of $N^2$ steps, with $N = 2^n$. However, there is a fast procedure that computes the Hadamard transform in only $N \log(N)$ steps, using the concept of a butterfly diagram (see [31]).
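The difference between the direct $N^2$ computation and the butterfly procedure can be sketched in Python as follows. Vectors of $V_n$ are encoded as integers whose bits are the coordinates, which is an assumption of this sketch, and the function names are illustrative.

```python
def hadamard_naive(f):
    """Direct evaluation of sum_x (-1)^(h.x) f(x) for every h: N^2 steps."""
    n = len(f)
    return [
        sum((-1) ** bin(h & x).count("1") * f[x] for x in range(n))
        for h in range(n)
    ]


def fwht(f):
    """Fast Walsh-Hadamard transform via butterflies: N log N steps.

    len(f) must be a power of two; only additions and subtractions are used.
    """
    a = list(f)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y   # one butterfly
        h *= 2
    return a


f = [3, 1, 4, 1, 5, 9, 2, 6]
spectrum = fwht(f)
```

Both functions return the same spectrum, and applying the transform twice returns the input scaled by N, which is a convenient sanity check on an implementation.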

Hadamard transform in cryptanalysis
As noted in the Introduction, one of the most important steps in cryptanalysis is to establish correspondences between the elements of the output and probable clear text. The main task is to find ways of facilitating the computation of a probability that relates several terms of an output, and to determine the likelihood of deciphering the message or parts of it. Let $Z = \{z_1, z_2, \ldots, z_p\}$ be the output of a binary generator, let $z_w^j = (z_{j+wi-w})_{i=1}^{k} \in V_k$ be a rolling sequence of Z with a window of size w, for all $j \in J = \{1, \ldots, p - wk + w\}$, and let $\xi = (\xi_1, \ldots, \xi_k) \in V_k$ for a fixed k. In our analysis, we want to find the probability $P\{z_w^j \cdot \xi = 0\}$ for $\xi \neq 0$.
Our main goal is the computation of this probability. We start by defining $n_0 = |\{j \in J : z_w^j \cdot \xi = 0\}|$ and $n_1 = |\{j \in J : z_w^j \cdot \xi = 1\}|$, so that $n_0 + n_1 = N := p - wk + w$. Then, by a classical result in probability, it follows that $P\{z_w^j \cdot \xi = 0\} = n_0 / N$ for all $j \in J$. This probability depends on $\xi$. Now, before we proceed to apply certain transformations, we let $\Delta_\xi = \frac{1}{N} \sum_{j \in J} (-1)^{z_w^j \cdot \xi}$ (3). Then, $N \Delta_\xi = n_0 - n_1$. The problem is, given $\Delta_\xi$, to find the values of $n_0$ and $n_1$. This leads to a Diophantine equation which, owing to the conditions on the previous variables, requires resources such as Kronecker's Theorem and makes the solution of the problem very complicated. The idea then is to use a transform which, as is well known, represents the change from one domain to another and, owing to its properties, reduces the complexity of mathematical problems. This type of tool has been fundamental in solving problems of diverse nature in different fields. The Hadamard transform allows us to solve the problem posed, so we transform the problem using it. The relationship between $\Delta_\xi$, $n_0$ and $n_1$ is $\Delta_\xi = \frac{n_0 - n_1}{N} = \frac{2 n_0 - N}{N}$. Then, $\frac{1 + \Delta_\xi}{2} = \frac{n_0}{N}$. The probability that we want to find is therefore $P\{z_w^j \cdot \xi = 0\} = \frac{1 + \Delta_\xi}{2}$, which reduces the problem to that of finding $\Delta_\xi$.
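The quantities $n_0$, $n_1$ and $\Delta_\xi$ can be checked directly on a short sequence. The sketch below is an illustration under stated assumptions: indices are 0-based rather than 1-based, the bit sequence `z` and the vector `xi` are made-up inputs, and the function names are hypothetical.

```python
def windows(z, k, w):
    """All rolling vectors (z_j, z_{j+w}, ..., z_{j+(k-1)w}), with 0-based j."""
    n = len(z) - w * (k - 1)            # N = p - wk + w
    return [tuple(z[j + w * i] for i in range(k)) for j in range(n)]


def dot(v, xi):
    """Scalar product over F_2."""
    return sum(a & b for a, b in zip(v, xi)) % 2


def count_and_delta(z, xi, w):
    """Return (n0, n1, Delta_xi) for the vector xi and window size w."""
    wins = windows(z, len(xi), w)
    n0 = sum(1 for v in wins if dot(v, xi) == 0)
    n1 = len(wins) - n0
    delta = sum((-1) ** dot(v, xi) for v in wins) / len(wins)
    return n0, n1, delta


z = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]      # made-up toy output, p = 10
n0, n1, delta = count_and_delta(z, (1, 0, 1), w=1)
probability = (1 + delta) / 2           # estimate of P{z . xi = 0}
```

Here N = 8, and the value of `probability` coincides with $n_0 / N$, as the relation $\frac{1 + \Delta_\xi}{2} = \frac{n_0}{N}$ requires.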


Hadamard transform of a sequence
Suppose now that we obtain an output sequence Z and want to count the repetitions of each $z_w^j$. For this we build the set $\alpha = \{\alpha_l \in V_k\}$, for $l = 1, \ldots, 2^k$, and we use Alg. 1. Then we compute the following for all $\xi \in V_k$. Each $z_w^j$ is mapped to $y_j \in \mathbb{N}$ such that $y_j = \sum_{i=1}^{k} z_{j+wi-w}\, 2^{i-1}$ for all $j \in J$; then we let $\Pi[y_j] = \Pi[y_j] + 1$, which is a counter for the number of occurrences of each $y_j$, so that $n_l = \Pi[l]$. Now let $\hat{n}_\xi = \sum_{\alpha_l \in \alpha} (-1)^{\alpha_l \cdot \xi} n_l$. To compute $\{\hat{n}_\xi, \xi \in V_k\}$ and solve (3), we use the Discrete Hadamard Transform. If we write $n_l = f(x)$ and $(h, x) = (\xi, \alpha_l)$, then the transform can be written as $\hat{f}(h) = \sum_{x \in V_k} (-1)^{h \cdot x} f(x)$, arriving at $\Delta_\xi = \hat{n}_\xi / N$, which is a less cumbersome computation of $\Delta_\xi$. Once $\Delta_\xi$ is obtained, we can determine whether there are distinguished elements within $V_k$ that help us characterize certain parts of the output sequence Z, in this case by finding a relationship between the elements of the sequence.
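The counting step and the transform can be combined as in the following sketch, which is not a reproduction of Alg. 1. It assumes 0-based indices, encodes $\xi$ as an integer whose bit i is the coordinate $\xi_{i+1}$, and uses a made-up bit sequence; the function names are hypothetical.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform (length must be a power of two)."""
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


def hadamard_deltas(z, k, w):
    """Delta_xi for every xi in V_k, from the frequency vector Pi."""
    n = len(z) - w * (k - 1)                 # number of windows, N
    counts = [0] * (1 << k)                  # the counter Pi
    for j in range(n):
        y = sum(z[j + w * i] << i for i in range(k))   # integer y_j
        counts[y] += 1
    return [v / n for v in fwht(counts)]     # Delta_xi = n_hat_xi / N


def direct_delta(z, k, w, xi):
    """Delta_xi computed straight from definition (3), for comparison."""
    n = len(z) - w * (k - 1)
    total = 0
    for j in range(n):
        parity = sum(z[j + w * i] & (xi >> i) & 1 for i in range(k)) % 2
        total += -1 if parity else 1
    return total / n


z = [(i * i + i // 3) % 2 for i in range(50)]   # made-up toy sequence
deltas = hadamard_deltas(z, k=3, w=2)
```

A single transform of the frequency vector yields $\Delta_\xi$ for all $2^k$ values of $\xi$ at once. The entry for $\xi = 0$ is always 1, which is why it carries no information about the sequence.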

Example
In this example, we illustrate our method by analyzing the A5/1 generator described in Section 2.2. We use the key 1223456789ABCDEF. The output Z has length p = 16380. We organize Z sequentially in blocks of length k = 12, with w = 12, and interpret these blocks as the sequences $z_w^j$, so that there are N = p/k = 1365 blocks. Each $z_w^j$ is the binary expression of an integer in the interval $[0, 2^{12} - 1]$. We store their frequencies in a vector $\Pi$ of length $2^{12}$; i.e., $\Pi_i = |\{r \mid i = \sum_{j=0}^{11} z_{12r+j}\, 2^j,\ r = 0, \ldots, N - 1\}|$. We now resort to Hadamard matrices to transform our problem. Recall that Hadamard matrices may be constructed using the Paley construction (see [32]). In our example, we define $\delta := H_{12} \cdot \Pi / N$ and $\delta_0 := 0$. In this experiment we find that $\max(\delta) = 0.115$, corresponding to the 2195-th entry. Now $2195_{10} = 100010010011_2$, and the associated probability is $\frac{1 + \delta_{2195}}{2} = 0.5575$, which implies that, with high probability, 2195 is a distinguished element of the output. Indeed, we may verify this by computing $\frac{1}{N} \sum_{i=0}^{N-1} (-1)^{z_{12i} + z_{12i+4} + z_{12i+8} + z_{12i+10} + z_{12i+11}} = 0.115 = \delta_{2195}$, which confirms that our computation is correct.

Hadamard values for A5/1, Blowfish and SG
Using the example given in the previous section as a prototype, we tested our method with other well-known generators such as A5/1, Blowfish and SG. Table 2 summarizes the Hadamard values computed using 1000 rounds for each generator. Our method is able to identify distinguished elements in each generator for different values of k.

Correlation analysis between entropy and Hadamard values
We applied Pearson's correlation test to determine correlation coefficients between entropy and Hadamard values. Table ?? summarizes our findings. In all cases a strong correlation was obtained: as the step size increases, the entropy value decreases and the Hadamard values increase. No further exploration of this data will be conducted in the present work.
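For reference, Pearson's correlation coefficient between two series can be computed as in the sketch below. The function name `pearson_r` is illustrative, and the input lists are placeholders chosen only to exercise the formula; they are not the entropy or Hadamard values measured in this work.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder series: one decreases exactly as the other increases.
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
r_positive = pearson_r([1, 2, 3], [2, 4, 6])
```

A coefficient close to -1 or +1 indicates a strong linear relationship. When one quantity decreases while the other increases, as described above for entropy and Hadamard values, the coefficient is negative.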

Conclusions
We presented a method for the cryptanalysis of pseudorandom generators in a stream cipher. The method relies on the discrete Hadamard transform to map the problem from a statistical setting to a probabilistic domain, in which it is possible, with high probability, to single out elements of the output and pair them with their corresponding clear texts. We included a complete example as a prototype to illustrate the simplicity of our method and its accuracy. We summarized statistical results for the computation of Hadamard values for the A5/1 and Blowfish generators. We also included the results of a correlation analysis between entropy and Hadamard values, which indicate a strong correlation that we will explore in future work.