Linearity in decimation-based generators: an improved cryptanalysis on the shrinking generator

Abstract: Decimation-based sequence generators are a class of non-linear cryptographic generators designed to be used in hardware implementations. An inherent characteristic of such generators is that their output sequences are interleaved sequences. This useful characteristic can be exploited in the cryptanalysis of those generators. In this work, the emphasis is on the most representative decimation-based generator, the shrinking generator, which is cryptanalyzed just by solving systems of linear equations. Compared with previous cryptanalyses, the computational complexity and the intercepted sequence requirements are dramatically reduced. Although irregularly decimated generators have been conceived and designed as non-linear sequence generators, in practice they can be easily analyzed in terms of simple linear structures.


Introduction
Nowadays stream ciphers are the fastest among the encryption procedures. They are designed to generate, from a short key, a long sequence (keystream sequence) of seemingly random bits. Some well-known designs in stream ciphers can be found in [1,2]. Typically, a stream cipher consists of a keystream generator whose output sequence is bit-wise XORed with the plaintext (in emission) to obtain the ciphertext, or with the ciphertext (in reception) to recover the original plaintext. References [3][4][5] provide a solid introduction to the study of stream ciphers.
There are many proposals of keystream generators that are based on maximal-length Linear Feedback Shift Registers (LFSRs) [6]. Such registers are linear structures characterized by their length $L$, their characteristic polynomial $p(x)$ and their initial state $is$ (usually the key of the cryptosystem). Their output sequences, the so-called PN-sequences, are usually combined in a non-linear way in order to break their linearity and to produce new pseudorandom sequences of cryptographic application. LFSRs with dynamic feedback, clock-controlled generators, nonlinear filters or irregularly decimated generators are just some of the most popular keystream generators; see the above references.
Irregularly decimated generators produce sequences with good cryptographic properties: long periods, good correlation properties, excellent run distribution, balancedness, simplicity of implementation, etc. The underlying idea of this kind of generator is the irregular decimation of a PN-sequence according to the bits of another one. The result of this decimation is a binary sequence that is used as the keystream sequence of the stream cipher.
Inside the family of irregularly decimated generators, we can enumerate:
1. The shrinking generator proposed by Coppersmith, Krawczyk and Mansour [7], which involves two LFSRs.
2. The self-shrinking generator designed by Meier and Staffelbach [8], which involves only one LFSR.
3. The generalized self-shrinking generator proposed by Hu and Xiao [9], which generates a family of binary sequences.
4. The modified self-shrinking generator, a decimation-based keystream sequence generator introduced by Kanso in [10] as an improved version of the self-shrinking generator.
In addition, different linear structures based on Cellular Automata that model such generators can also be found in the literature [11][12][13]. This work focuses on the most representative element in the class of decimation-based sequence generators: the shrinking generator. Taking advantage of the fact that its output sequence is an interleaved sequence, a simple cryptanalytic attack has been developed. The basic ideas of this attack can be generalized to other elements in the same class of generators.
The paper is organized as follows: in Section 2 fundamentals and basic concepts are provided. In Section 3, we introduce some important properties of the shrinking generator that will be used in Section 4 to design an algorithm recovering the initial states that produce the intercepted sequence. Section 5 compares the attack presented here with others found in the literature. Finally, conclusions in Section 6 end the paper.

Preliminaries
Notation and basic concepts are now introduced. First of all, we introduce the concept of decimation, which will be used repeatedly throughout this paper. Let $\{u_i\}$ ($i = 0, 1, 2, \ldots$) be a linear recursive sequence over a finite field. The decimation of the sequence $\{u_i\}$ by $d$ is a new sequence obtained by taking every $d$-th term of $\{u_i\}$ [14].
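A minimal Python sketch of decimation, assuming the sequence is periodic and stored as one full period in a list (the helper `decimate` and the sample sequence are our own illustrative choices, not taken from the paper):

```python
def decimate(u, d, start=0):
    """Decimation of the periodic sequence u (one period given) by d.

    Returns u[start], u[start + d], u[start + 2d], ..., with indices reduced
    modulo the period T = len(u). When gcd(d, T) = 1 the T returned terms are
    exactly one period of the decimated sequence.
    """
    T = len(u)
    return [u[(start + i * d) % T] for i in range(T)]


# One period of a PN-sequence of period 7 (b_{t+3} = b_{t+1} + b_t, initial state 1, 0, 0)
b = [1, 0, 0, 1, 0, 1, 1]
print(decimate(b, 3))  # [1, 1, 1, 0, 1, 0, 0]
```

Since $\gcd(3, 7) = 1$, the decimated sequence is again a PN-sequence of period 7.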
Next, the definition of interleaved sequence is provided [15].
Definition 2.1. Let $g(x)$ be a polynomial of degree $r$ over $GF(q)$ (the Galois field of $q$ elements) and let $n$ be a positive integer. For any sequence $w = \{w_k\}$ over $GF(q)$, write $k = i \cdot n + j$ ($i = 0, 1, 2, \ldots$; $j = 0, 1, \ldots, n-1$). If all the subsequences $w_j = \{w_{i \cdot n + j}\}_{i \ge 0}$ ($j = 0, 1, \ldots, n-1$) are generated by $g(x)$, then $w$ is called an interleaved sequence over $GF(q)$ of size $n$ associated with $g(x)$.
We can write $w = (w_0, w_1, \ldots, w_{n-1})$, where each $w_j$ ($j = 0, 1, \ldots, n-1$) is a subsequence of $w$. In fact, each $w_j$ is an $n$-decimation of the sequence $w$, obtained by taking one out of every $n$ terms starting at the term of index $j$. In the sequel, $GF(q)$ will be the binary field $GF(2)$.
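The following Python sketch checks Definition 2.1 over $GF(2)$ for one period of a sequence. The helper names `satisfies` and `is_interleaved`, and the encoding of $g(x) = x^r + g_{r-1}x^{r-1} + \cdots + g_0$ as the list `[g_0, ..., g_{r-1}]`, are our own assumptions. The sample $w$ is the shrunken sequence of period 14 produced by a hypothetical toy pair of registers with $p_1(x) = x^2 + x + 1$ and $p_2(x) = x^3 + x + 1$.

```python
def satisfies(u, g):
    """True if one period u of a binary sequence obeys u[t+r] = sum_j g[j]*u[t+j] (mod 2)
    for every t (indices taken cyclically), where g(x) = x^r + g[r-1] x^(r-1) + ... + g[0]."""
    r, T = len(g), len(u)
    return all(
        u[(t + r) % T] == sum(gj & u[(t + j) % T] for j, gj in enumerate(g)) % 2
        for t in range(T)
    )


def is_interleaved(w, n, g):
    """Definition 2.1 over GF(2): every n-decimation w_j = w[j::n] is generated by g(x).
    w must contain a whole number of periods, with len(w) a multiple of n."""
    return len(w) % n == 0 and all(satisfies(w[j::n], g) for j in range(n))


# Toy shrunken sequence of period 14
w = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1]
print(is_interleaved(w, 2, [1, 0, 1]))  # True: both halves follow g(x) = x^3 + x^2 + 1
print(is_interleaved(w, 1, [1, 0, 1]))  # False: w itself does not
```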
The shrinking generator (SG) was first introduced in [7]. It is made up of two maximal-length LFSRs denoted by $R_1$ and $R_2$. Let $L_1$ and $L_2$ ($L_1 < L_2$) be the LFSR lengths, $p_1(x)$, $p_2(x)$ their primitive characteristic polynomials, and $is_1$, $is_2$ their initial states, respectively. Moreover, let $\{a_i\}$ and $\{b_i\}$ be the PN-sequences generated by $R_1$ and $R_2$, respectively. In this case, the sequence $\{a_i\}$ decimates the other sequence $\{b_i\}$. The decimation rule is very simple: given two bits $a_i$ and $b_i$, the output sequence of the generator $\{s_k\}$ is computed as
$$\begin{cases} a_i = 1 & \Longrightarrow\ s_k = b_i, \\ a_i = 0 & \Longrightarrow\ b_i \text{ is discarded.} \end{cases}$$
We call the sequence $\{s_k\}$ the shrunken sequence (SS). Assume that $\gcd(L_1, L_2) = 1$; then the period of the SS is $T = (2^{L_2} - 1) \cdot 2^{L_1 - 1}$. The linear complexity of a sequence, denoted by $LC$, is defined as the length of the shortest LFSR that generates such a sequence. As $\gcd(L_1, L_2) = 1$, the linear complexity of the shrunken sequence satisfies $L_2 \cdot 2^{L_1 - 2} < LC \le L_2 \cdot 2^{L_1 - 1}$. Moreover, its characteristic polynomial is of the form $p(x)^m$, where $p(x)$ is a primitive polynomial of degree $L_2$ and $m$ an integer satisfying $2^{L_1 - 2} < m \le 2^{L_1 - 1}$. As usual, the key of this generator is the pair of initial states of both registers $R_1$ and $R_2$. Next, a simple illustrative example is introduced.
For registers of lengths $L_1 = 2$ and $L_2 = 3$, the shrunken sequence $\{s_k\}$ has period $T = (2^3 - 1) \cdot 2 = 14$, and it is easy to check that its characteristic polynomial is of the form $p(x)^2$, with $p(x)$ a primitive trinomial of degree 3; consequently, its linear complexity equals 6.
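To reproduce this kind of example, here is a minimal Python sketch of the shrinking generator. It assumes the LFSR convention $a_{t+L} = \sum_j c_j a_{t+j}$ for a characteristic polynomial $x^L + c_{L-1}x^{L-1} + \cdots + c_0$, and uses the hypothetical choices $p_1(x) = x^2 + x + 1$, $is_1 = (1, 0)$, $p_2(x) = x^3 + x + 1$, $is_2 = (1, 0, 0)$; the function names are ours.

```python
def lfsr(coeffs, state, n):
    """First n terms of a_{t+L} = sum_j coeffs[j] * a_{t+j} (mod 2), i.e. the
    characteristic polynomial x^L + coeffs[L-1] x^(L-1) + ... + coeffs[0]."""
    a = list(state)
    L = len(coeffs)
    while len(a) < n:
        t = len(a) - L
        a.append(sum(c & a[t + j] for j, c in enumerate(coeffs)) % 2)
    return a[:n]


def shrinking_generator(c1, is1, c2, is2, steps):
    """Shrunken sequence: keep b_i when a_i = 1, discard it when a_i = 0."""
    a = lfsr(c1, is1, steps)
    b = lfsr(c2, is2, steps)
    return [bi for ai, bi in zip(a, b) if ai == 1]


def minimal_period(seq):
    """Smallest p dividing len(seq) such that seq is p-periodic (seq holds whole periods)."""
    n = len(seq)
    return next(p for p in range(1, n + 1)
                if n % p == 0 and all(seq[i] == seq[i % p] for i in range(n)))


# Hypothetical toy registers: L1 = 2 and L2 = 3, run over T1 * T2 = 3 * 7 clock pulses
s = shrinking_generator([1, 1], [1, 0], [1, 1, 0], [1, 0, 0], 3 * 7)
print(s)                  # [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1]
print(minimal_period(s))  # 14 = (2^3 - 1) * 2^(2 - 1)
```

Running the registers for $T_1 \cdot T_2$ clock pulses yields exactly $T_2 \cdot 2^{L_1 - 1}$ output bits, i.e. one period of the shrunken sequence when $\gcd(L_1, L_2) = 1$.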

Linear properties of the shrunken sequence
In this section, we highlight some properties of the shrunken sequence that will be used in the algorithm proposed in Section 4. As before, we consider two LFSRs $R_1$ and $R_2$ with lengths $L_1$ and $L_2$, characteristic polynomials $p_1(x)$, $p_2(x)$ and initial states $is_1$ and $is_2$. In addition, $T_1 = 2^{L_1} - 1$ and $T_2 = 2^{L_2} - 1$ are the periods of their corresponding PN-sequences $\{a_i\}$ and $\{b_i\}$, respectively.
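According to Definition 2.1, the shrunken sequence $s = \{s_k\}$ can be written as $s = (s_0, s_1, \ldots, s_{n-1})$, where $n = 2^{L_1 - 1}$. In fact, every subsequence $s_j$ ($j = 0, 1, \ldots, n-1$) is a PN-sequence generated by the $L_2$-degree primitive polynomial $p(x)$ defined as
$$p(x) = (x + \alpha^{e_0})(x + \alpha^{e_1}) \cdots (x + \alpha^{e_{L_2 - 1}}),$$
where $e_i = 2^i \cdot T_1 \bmod T_2$ and $\alpha$ is a root of the polynomial $p_2(x)$. Recall that every subsequence $s_j$ is just a decimation of $\{b_i\}$ by $d = 2^{L_1} - 1$: consecutive terms of $s_j$ come from consecutive periods of $\{a_i\}$, each of which contains exactly $2^{L_1 - 1}$ ones. Since $\gcd(2^{L_1} - 1, 2^{L_2} - 1) = 2^{\gcd(L_1, L_2)} - 1 = 1$, the resulting sequence is a PN-sequence too. In brief, the $e_i$ ($i = 0, 1, \ldots, L_2 - 1$) are the elements of the cyclotomic coset of $2^{L_1} - 1$ modulo $2^{L_2} - 1$, and $p(x)$ is the polynomial associated with such a coset [6]. The subsequences $s_j$ ($j = 0, 1, \ldots, n-1$) are called the interleaved PN-sequences of the shrunken sequence.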
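The cyclotomic coset and the polynomial $p(x)$ can be computed with a few lines of Python, sketched below under our own assumptions: $p_2(x)$ is given as an integer bitmask (bit $i$ is the coefficient of $x^i$), and $\alpha$ is represented by the class of $z$ in $GF(2)[z]/(p_2(z))$, which is a root of $p_2$ because $p_2$ is primitive. All function names are hypothetical.

```python
def gf_mul(x, y, mod, L):
    """x * y in GF(2^L) = GF(2)[z]/(mod(z)); elements are int bitmasks of degree < L."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> L) & 1:          # reduce as soon as a degree-L term appears
            x ^= mod
    return r


def gf_pow(x, e, mod, L):
    """x^e in GF(2^L) by square-and-multiply."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, x, mod, L)
        x = gf_mul(x, x, mod, L)
        e >>= 1
    return r


def cyclotomic_coset(L1, L2):
    """e_i = 2^i * T1 mod T2 for i = 0, ..., L2 - 1."""
    T1, T2 = 2**L1 - 1, 2**L2 - 1
    return [(2**i * T1) % T2 for i in range(L2)]


def interleaved_poly(p2, L1, L2):
    """Coefficients (lowest degree first) of p(x) = prod_i (x + alpha^(e_i))."""
    poly = [1]
    for e in cyclotomic_coset(L1, L2):
        root = gf_pow(0b10, e, p2, L2)           # alpha^(e_i), alpha = class of z
        new = [0] * (len(poly) + 1)
        for j, c in enumerate(poly):
            new[j + 1] ^= c                      # contribution of c * x
            new[j] ^= gf_mul(root, c, p2, L2)    # contribution of c * alpha^(e_i)
        poly = new
    return poly   # every entry is 0 or 1: p(x) has binary coefficients


print(cyclotomic_coset(2, 3))            # [3, 6, 5]
print(interleaved_poly(0b1011, 2, 3))    # [1, 0, 1, 1], i.e. x^3 + x^2 + 1
```

For these small toy parameters ($p_2(x) = x^3 + x + 1$, and likewise $p_2(x) = x^4 + x + 1$ with $L_1 = 3$), the result is the reciprocal polynomial of $p_2(x)$, because the coset of $2^{L_1} - 1$ coincides with that of $-1$.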
Example 3.1. Consider two LFSRs $R_1$ and $R_2$ with lengths $L_1 = 3$ and $L_2 = 4$, characteristic polynomials $p_1(x)$ and $p_2(x)$ (primitive trinomials of degrees 3 and 4) and initial states $is_1$ and $is_2$, respectively. The shrunken sequence has period $T = (2^4 - 1) \cdot 2^2 = 60$ and its characteristic polynomial is a power $p(x)^m$ ($2 < m \le 4$) of a primitive trinomial $p(x)$ of degree 4. Since the shrunken sequence is an interleaved sequence, it is composed of $n = 2^{L_1 - 1} = 4$ PN-sequences $s_0$, $s_1$, $s_2$ and $s_3$. All of them have the same characteristic polynomial $p(x)$; thus there is a unique PN-sequence, but shifted. This shift depends on the positions of the 1s in the PN-sequence $\{a_i\}$.
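Let $\{i_0, i_1, \ldots, i_{2^{L_1 - 1} - 1}\}$ denote the positions of the $2^{L_1 - 1}$ ones in one period of the PN-sequence $\{a_i\}$, and let $\delta$ be an integer such that $(2^{L_1} - 1)\,\delta \equiv 1 \pmod{2^{L_2} - 1}$. Let also $d_j$ ($j = 1, 2, \ldots, 2^{L_1 - 1} - 1$) be the position over $s_0$ of the first element of each subsequence $s_j$, so that $s_j$ is $s_0$ shifted $d_j$ positions. Since the $t$-th term of $s_j$ is $b_{i_j + t \cdot T_1}$, if we know such positions $d_j$ over $s_0$, then we can compute the indices $i_j$ by means of the following expressions:
$$i_j \equiv i_0 + d_j \cdot T_1 \pmod{T_2}, \qquad \text{equivalently} \qquad d_j \equiv \delta\,(i_j - i_0) \pmod{T_2}. \tag{1}$$
In Example 3.1, we had four interleaved subsequences $s_0$, $s_1$, $s_2$ and $s_3$, and the shifts $d_1$, $d_2$ and $d_3$ over $s_0$ are easily checked. In this case, $T_1 = 7$ and $T_2 = 15$, so $\delta = 13$. With this information, we can determine the positions of the ones in $\{a_i\}$ (taking $i_0 = 0$ without loss of generality), and therefore the set of indices $\{i_0, i_1, i_2, i_3\}$ and one full period (of length $T_1 = 7$) of the PN-sequence $\{a_i\}$.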
In the algorithm proposed in Section 4, the opposite situation occurs: we know the positions of the ones in the PN-sequence $\{a_i\}$, and we compute the position $d_j$ of the first element of each subsequence $s_j$ over $s_0$ by means of the expressions given in (1).
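Both directions of expressions (1) fit in a short Python sketch. The function names are hypothetical, `pow(x, -1, m)` (modular inverse) requires Python 3.8 or later, and the positions `[0, 3, 5, 6]` are an assumed example: the ones of $p_1(x) = x^3 + x + 1$ started from $(1, 0, 0)$.

```python
def delta(L1, L2):
    """Integer delta with (2^L1 - 1) * delta ≡ 1 (mod 2^L2 - 1); needs gcd(L1, L2) = 1."""
    return pow(2**L1 - 1, -1, 2**L2 - 1)   # modular inverse (Python 3.8+)


def shifts_from_ones(ones, L1, L2):
    """d_j = delta * (i_j - i_0) mod T2: shift of s_j with respect to s_0."""
    T2, dl = 2**L2 - 1, delta(L1, L2)
    return [((i - ones[0]) * dl) % T2 for i in ones]


def ones_from_shifts(shifts, i0, L1, L2):
    """i_j = (i_0 + d_j * T1) mod T2; exact because 0 <= i_j < T1 < T2."""
    T1, T2 = 2**L1 - 1, 2**L2 - 1
    return [(i0 + d * T1) % T2 for d in shifts]


print(delta(3, 4))                           # 13, as in Example 3.1
print(shifts_from_ones([0, 3, 5, 6], 3, 4))  # [0, 9, 5, 3]
```

On the toy shrunken sequence of period 14 ($L_1 = 2$, $L_2 = 3$, ones of $\{a_i\}$ at positions 0 and 2), the sketch gives $d_1 = 3$, and indeed $s_1$ equals $s_0$ rotated by three positions.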
The presence of PN-sequences inside the shrunken sequence reveals strong dependencies among its bits. These linear relationships will be exploited in the proposed attack: given $N$ intercepted bits of this sequence, the goal is to determine the pair of initial states $(is_1, is_2)$ of both registers.

Cryptanalytic attack
Prior to the attack's description, the following notation is introduced:
- $is_1 = (a_0, a_1, \ldots, a_{L_1 - 1})$ and $is_2 = (b_0, b_1, \ldots, b_{L_2 - 1})$ are the initial states of $R_1$ and $R_2$.
- $S = \{s_0, s_1, \ldots, s_{N-1}\}$ are the $N$ intercepted bits of the shrunken sequence. In practice, $N$ can be written as $N = N_1 + N_2$, where $N_1$ bits are used to compute the pair $(is_1, is_2)$ while $N_2$ bits are used to check the correctness of that pair.
Each of the $N$ intercepted bits belongs to one of the interleaved PN-sequences $s_j$. Nevertheless, in this attack we only focus on the first interleaved PN-sequence $s_0$. For simplicity it will be denoted by $\{u_i\}$ ($i = 0, 1, \ldots, 2^{L_2} - 2$). According to the properties of PN-sequences, any term $u_k$ of $\{u_i\}$ can be expressed as a function of the first $L_2$ bits $(u_0, u_1, \ldots, u_{L_2 - 1})$ by means of the modular expression
$$x^k \equiv q(x) \pmod{p(x)},$$
where $q(x) = c_{L_2 - 1}x^{L_2 - 1} + \cdots + c_1 x + c_0$ with $c_i \in GF(2)$. Thus,
$$u_k = c_{L_2 - 1}u_{L_2 - 1} + \cdots + c_1 u_1 + c_0 u_0.$$
This cryptanalytic attack is based on solving systems of linear equations of the form
$$A \cdot x = b, \tag{2}$$
where $A$ is an $(N_1 \times L_2)$ binary coefficient matrix, $x$ is the $(L_2 \times 1)$ vector of unknowns and $b$ is the $(N_1 \times 1)$ right-hand side vector of intercepted bits. Each initial state $is_1$ parametrizes the coefficient matrix $A$; then the Linear Consistency Test (LCT) [16] checks the consistency of the corresponding equation system (2). If the $is_1$ considered is the right initial state, then the equation system will certainly be consistent. On the other hand, if $is_1$ is not the initial state used to generate the intercepted bits, then by [16, Theorem 1] the probability that the system is consistent will be very small when the intercepted segment is long enough. In order to make the number of false consistency alarms as small as possible, the number of equations in (2) should significantly exceed $L_1 + L_2$; see [16] and [17].
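The modular expression for $u_k$ can be sketched in Python as follows, assuming polynomials over $GF(2)$ encoded as integer bitmasks (bit $i$ is the coefficient of $x^i$) and $L_2 \ge 2$; the function names are hypothetical. Square-and-multiply keeps the cost logarithmic in $k$, which matters because the indices $d_k$ can be as large as $T_2$.

```python
def polymulmod(x, y, p, L):
    """x * y mod p(x) over GF(2); polynomials are int bitmasks, deg p = L >= 2."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> L) & 1:
            x ^= p
    return r


def term_coefficients(k, p, L):
    """[c_0, ..., c_{L-1}] such that x^k ≡ c_{L-1} x^(L-1) + ... + c_0 (mod p(x))."""
    q, base = 1, 0b10                  # q = 1, base = x
    while k:
        if k & 1:
            q = polymulmod(q, base, p, L)
        base = polymulmod(base, base, p, L)
        k >>= 1
    return [(q >> i) & 1 for i in range(L)]


def term(k, first_terms, p):
    """u_k = c_{L-1} u_{L-1} + ... + c_0 u_0, computed from the first L terms only."""
    c = term_coefficients(k, p, len(first_terms))
    return sum(ci & ui for ci, ui in zip(c, first_terms)) % 2


# Toy s_0 of period 7 generated by p(x) = x^3 + x^2 + 1 (bitmask 0b1101)
u = [1, 1, 1, 0, 1, 0, 0]
print([term(k, u[:3], 0b1101) for k in range(7)])  # [1, 1, 1, 0, 1, 0, 0]
```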
The attack is divided into two phases. In Phase 1, we check the $2^{L_1 - 1}$ initial states $is_1$ starting with 1 (only the 1s of $\{a_i\}$ generate bits in the shrunken sequence, so the state of $R_1$ that produces the first intercepted bit starts with 1) to determine a set $Q$ of possible candidates for the initial state of $R_1$. In Phase 2, for every $is_1$ in $Q$ its corresponding $is_2$ will be computed. The pair $(is_1, is_2)$ able to generate the whole intercepted shrunken sequence will be the key of the cryptosystem. In brief, the algorithm can be described as follows:
INPUT: The lengths $L_1$ and $L_2$ of both registers, the characteristic polynomials $p_1(x)$, $p_2(x)$ and the $N$ intercepted bits $S = \{s_0, s_1, \ldots, s_{N-1}\}$ of the shrunken sequence.
1. Computation of PHASE 1.
2. Computation of PHASE 2.
OUTPUT: The initial states $is_1$ and $is_2$ (key of the cryptosystem) that generate the shrunken sequence.
In the sequel, the whole attack is described in detail.

PHASE 1:
For each $is_1$ considered do:
1. Starting in $is_1$, generate a portion of the sequence $\{a_i\}$ until $N_1$ ones are obtained. Such ones will be located at positions $i_k$ ($k = 0, 1, \ldots, N_1 - 1$) over $\{a_i\}$, with $i_0 = 0$.
2. Determine $N_1$ positions in the sequence $\{u_i\}$ by means of (1) as
$$d_k = \delta \cdot (i_k - i_0) \bmod T_2, \qquad k = 0, 1, \ldots, N_1 - 1.$$
3. Express each $u_{d_k}$ as a function of the first $L_2$ terms of $\{u_i\}$ by computing $x^{d_k} \bmod p(x)$.
4. Equate $s_k = u_{d_k}$ ($k = 0, 1, \ldots, N_1 - 1$). It turns out to be a system of linear equations with $N_1$ equations in the $L_2$ unknowns $(u_0, u_1, \ldots, u_{L_2 - 1})$.
5. Apply the Linear Consistency Test (LCT) [16] to check the consistency of the previous system: if the system is consistent, then include $is_1$ in $Q$; else $is_1$ is rejected.

end do
The result of this phase is the set $Q$ of possible candidates for the initial state of LFSR $R_1$. Once the set $Q$ has been computed, the second phase of the attack is performed.
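Steps 1 to 5 of Phase 1 can be sketched in Python as follows. This is a self-contained toy version under our own assumptions, not the authors' SageMath program: polynomials over $GF(2)$ are integer bitmasks, the LFSR recurrence convention is $a_{t+L} = \sum_j c_j a_{t+j}$, $p(x)$ is obtained from the cyclotomic coset of Section 3, and the LCT is a plain Gaussian elimination. All function names are hypothetical, and `pow(T1, -1, T2)` needs Python 3.8 or later.

```python
from itertools import product


def lfsr(coeffs, state, n):
    """First n terms of a_{t+L} = sum_j coeffs[j] * a_{t+j} (mod 2)."""
    a, L = list(state), len(coeffs)
    while len(a) < n:
        t = len(a) - L
        a.append(sum(c & a[t + j] for j, c in enumerate(coeffs)) % 2)
    return a[:n]


def mulmod(x, y, mod, L):
    """x * y modulo the degree-L polynomial mod (GF(2) polynomials as bitmasks)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> L) & 1:
            x ^= mod
    return r


def powmod(x, e, mod, L):
    """x^e modulo mod, by square-and-multiply."""
    r = 1
    while e:
        if e & 1:
            r = mulmod(r, x, mod, L)
        x = mulmod(x, x, mod, L)
        e >>= 1
    return r


def interleaved_poly(p2, L1, L2):
    """Bitmask of p(x) = prod_i (x + alpha^(2^i * T1 mod T2)), alpha a root of p2."""
    T1, T2 = 2**L1 - 1, 2**L2 - 1
    poly = [1]                                   # coefficients in GF(2^L2), low degree first
    for i in range(L2):
        root = powmod(0b10, (2**i * T1) % T2, p2, L2)
        new = [0] * (len(poly) + 1)
        for j, c in enumerate(poly):
            new[j + 1] ^= c
            new[j] ^= mulmod(root, c, p2, L2)
        poly = new
    return sum(c << j for j, c in enumerate(poly))   # coefficients are 0 or 1


def consistent(rows, L):
    """LCT: bits 0..L-1 of a row are coefficients, bit L is the right-hand side."""
    pivots = []
    for r in rows:
        for pr, bit in pivots:
            if (r >> bit) & 1:
                r ^= pr
        low = r & ((1 << L) - 1)
        if low:
            pivots.append((r, low.bit_length() - 1))
        elif r:                                  # row reduced to (0, ..., 0 | 1)
            return False
    return True


def phase1(s, c1, L1, p2, L2):
    """Candidates is1 (first bit 1) whose system s_k = u_{d_k} passes the LCT."""
    N1, T1, T2 = len(s), 2**L1 - 1, 2**L2 - 1
    delta = pow(T1, -1, T2)
    p = interleaved_poly(p2, L1, L2)
    half = 2**(L1 - 1)                           # ones per period of {a_i}
    periods = (N1 + half - 1) // half
    Q = []
    for tail in product([0, 1], repeat=L1 - 1):
        is1 = [1, *tail]
        a = lfsr(c1, is1, periods * T1)
        ones = [i for i, bit in enumerate(a) if bit][:N1]   # step 1, i_0 = 0
        rows = []
        for k, ik in enumerate(ones):
            dk = (ik * delta) % T2                          # step 2
            q = powmod(0b10, dk, p, L2)                     # step 3: x^dk mod p(x)
            rows.append(q | (s[k] << L2))                   # step 4: s_k = u_dk
        if consistent(rows, L2):                            # step 5: LCT
            Q.append(is1)
    return Q


# Hypothetical toy instance: p1(x) = x^3 + x + 1, p2(x) = x^4 + x + 1 (bitmask 0b10011)
a = lfsr([1, 1, 0], [1, 0, 0], 105)
b = lfsr([1, 1, 0, 0], [1, 0, 1, 1], 105)
s = [bi for ai, bi in zip(a, b) if ai][:12]      # N1 = 12 intercepted bits
print(phase1(s, [1, 1, 0], 3, 0b10011, 4))       # always contains the true is1 = [1, 0, 0]
```

Only the presence of the true $is_1$ in $Q$ is guaranteed; the number of surviving false candidates depends on how much $N_1$ exceeds $L_2$.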

PHASE 2:
For each $is_1$ in $Q$ do:
1. Build the system of linear equations associated with $is_1$, whose solution determines the corresponding initial state $is_2$.
2. Apply the Linear Consistency Test (LCT) to check the consistency of this system. If the system is not consistent, then reject $(is_1, is_2)$.
3. Else, if the pair $(is_1, is_2)$ can generate the shrunken sequence, as checked with the $N_2$ verification bits, then the cryptosystem is broken; else $is_1$ is rejected.

end do
The result of this phase is the pair $(is_1, is_2)$ generating the shrunken sequence, that is, the key of the cryptosystem.
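Phase 2 is only outlined above. Assuming the system has been solved for $(u_0, \ldots, u_{L_2 - 1})$ with $i_0 = 0$, one way to obtain $is_2$ follows from $u_t = b_{t \cdot T_1 \bmod T_2}$, which gives $b_j = u_{\delta j \bmod T_2}$. The Python sketch below implements this relation; it is our own reconstruction, not necessarily the authors' procedure, and the names are hypothetical.

```python
def mulmod(x, y, mod, L):
    """x * y modulo the degree-L polynomial mod (GF(2) polynomials as bitmasks)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> L) & 1:
            x ^= mod
    return r


def powmod(x, e, mod, L):
    """x^e modulo mod, by square-and-multiply."""
    r = 1
    while e:
        if e & 1:
            r = mulmod(r, x, mod, L)
        x = mulmod(x, x, mod, L)
        e >>= 1
    return r


def recover_is2(u_state, p, L1, L2):
    """is2 = (b_0, ..., b_{L2-1}) from the first L2 terms of s_0 = {u_t}.

    Uses b_j = u_{delta * j mod T2} and expresses each such term through
    x^(delta * j) mod p(x), where p(x) (bitmask) generates s_0.
    """
    T1, T2 = 2**L1 - 1, 2**L2 - 1
    delta = pow(T1, -1, T2)                     # Python 3.8+
    is2 = []
    for j in range(L2):
        q = powmod(0b10, (delta * j) % T2, p, L2)
        is2.append(sum(((q >> i) & 1) & u_state[i] for i in range(L2)) % 2)
    return is2


# Toy case L1 = 2, L2 = 3: s_0 starts 1, 1, 1 and is generated by p(x) = x^3 + x^2 + 1
print(recover_is2([1, 1, 1], 0b1101, 2, 3))    # [1, 0, 0]
```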
A software implementation of the previous attack has been run on a laptop with the following specifications:
- Operating system: Arch Linux
- CPU: dual-core Intel Core i7-4510U, cache 4096 KB, frequency 3100 MHz
- RAM: 8 GB, type DDR3
- Hard disk: SSD, 256.1 GB

Some numerical results are shown in Table 1, where $L_1$, $L_2$ are the lengths of registers $R_1$ and $R_2$, respectively, $T$ is the period of the corresponding shrunken sequence, $N_1$ is the number of intercepted bits used for computation, $c(Q)$ is the cardinality of $Q$, that is, the number of candidates for the initial state of $R_1$, and $t$ is the running time in seconds. Notice that the period of the shrunken sequence is much greater than the number of intercepted bits needed to run the algorithm successfully within a reasonable time. For our computations, $N_1 = n \cdot L_2$ for a small integer $n$, while $N_2$ is chosen as $N_2 = N_1$. In brief, the requirements of intercepted sequence are extremely low. Table 2 shows the same results when the number of intercepted bits is reduced to $N_1 = L_2$. In this case, since $N_1$ has been reduced, the execution time has been reduced too. Nevertheless, the number of candidates has grown considerably. Table 3 shows the numerical results corresponding to the verification of a single initial state $is_1$ in Phase 1 of the algorithm. Note that even for large values of $L_1$ and $L_2$ the execution time of this routine is very low. The most remarkable features of the proposed attack are:
1. The low amount of intercepted bits needed for its execution. Indeed, $N_1 = n \cdot L_2$, $n$ being a small integer, and $N_2 \le N_1$. Thus the amount of sequence required is linear in the length of the register $R_2$.
2. The running time of the attack is dominated by Phase 1, which has a time complexity of $O\big(2^{L_1 - 1} \cdot \max(N_1, L_2)^3\big)$, that is, exponential in $L_1$ due to the number of $is_1$ considered and polynomial in $L_2$. In fact, the work factor needed for each test is that of the Gaussian elimination algorithm applied to the augmented matrix $(A, b)$, which is cubic in the dimension of the matrix. In any case, the cubic factor is irrelevant compared with the exponential factor.
3. Both Phases 1 and 2 are fully parallelizable, and some tweaks can be made to optimize the LCT step.
The program makes use of SageMath, a computer algebra system based on Python. To handle polynomials over $GF(2)$, SageMath uses the NTL library; to compute with matrices over $GF(2)$, it uses the M4RI library. In the LCT application, the augmented matrix of the system is transformed into reduced row echelon form. This step is important for computational efficiency, as checking the consistency of the system reduces to testing for the existence of a row $(0, 0, \ldots, 0, 1)$ in the reduced augmented matrix.
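For readers without SageMath, the consistency test can be sketched in plain Python. This is an illustrative stand-in for the M4RI-based routine, under our own encoding: each equation is an integer whose bits $0, \ldots, L-1$ hold the coefficients and bit $L$ holds the right-hand side. Rows are reduced incrementally into echelon form, and the system is declared inconsistent as soon as some row reduces to $(0, \ldots, 0 \mid 1)$.

```python
def lct_consistent(rows, L):
    """Linear Consistency Test over GF(2) by incremental Gaussian elimination."""
    pivots = []                                  # (row, pivot bit) pairs
    for r in rows:
        for pr, bit in pivots:                   # eliminate existing pivot columns
            if (r >> bit) & 1:
                r ^= pr
        coeffs = r & ((1 << L) - 1)
        if coeffs:
            pivots.append((r, coeffs.bit_length() - 1))
        elif r:                                  # 0 = 1: the system is inconsistent
            return False
    return True


# x0 = 1, x1 = 0, x0 + x1 = 1  -> consistent
print(lct_consistent([0b101, 0b010, 0b111], 2))  # True
# x0 = 1, x1 = 0, x0 + x1 = 0  -> inconsistent
print(lct_consistent([0b101, 0b010, 0b011], 2))  # False
```

Each pivot row is stored already reduced against earlier pivots, so a single pass over the pivots clears every pivot column of a new row; the decision is the same as with a full reduced row echelon form.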

Other attacks over the shrinking generator
Other attacks against the shrinking generator have been designed in the literature. For example, in [18] the authors proposed two fault cryptanalyses. In that work, the attacker is supposed to have a device implementing the shrinking generator and to be able to use it freely. They also assume that the base and control generators of the shrinking generator output bits according to the uniform distribution over $GF(2)$, and that an attacker can disturb the clocking of the device, that is, he can stop the control sequence for a couple of steps and observe the output of the generator. These attacks require injecting specific faults and restarting the device with partially the same internal state. While injecting such faults is potentially possible, it may require some design faults (so that potentially vulnerable parts of the device are placed on external layers). This shows at least that a careful examination of a chip design might be necessary. Furthermore, the first cryptanalysis has a nonzero probability of false solution and algorithm failure. As a consequence, the authors have to assume that the number of 0s between two 1s does not exceed a certain parameter maxzeros. They proved that the probability of a false result grows rapidly with the assumed length of the gap between the 1s. That is why they assume that the control sequence does not contain a block of more than maxzeros 0s; of course, when this assumption is false, the algorithm fails.
Several correlation attacks against the shrinking generator have been proposed too. A correlation attack was proposed in [19] and was experimentally analyzed in [20], where an exhaustive search through all initial states and all possible feedback polynomials of R was performed. Later, in [21] the author presented a reduced complexity correlation attack based on searching for specific subsequences of the keystream sequence, whose complexity and required keystream length are both exponential in the length of R.
A few years later, in [22] Golić conducted a probabilistic correlation analysis based on a recursive computation of the posterior probabilities of individual bits of R, which revealed the possibility of implementing a certain type of fast correlation attack on the shrinking generator. A novel distinguishing attack was also proposed in [23]. In a subsequent paper [24], the author proposed an improved linear consistency attack based on an exhaustive search through all initial states of R.
In [22], the author conjectured that the shrinking generator could be vulnerable to fast correlation attacks that would not require an exhaustive search through all possible initial states. In [25], the authors tried to answer this question with the length of R equal to 61 (as suggested in [26]). They claimed that, given 140000 keystream bits, the initial state of R with an arbitrary-weight characteristic polynomial of degree 61 could be recovered with success probability higher than 99% and with a computational complexity that they regarded as a good trade-off between these parameters.
In brief, the algorithm developed here presents two main advantages over other proposals. First, compared with other cryptanalytic attacks, the original key of the cryptosystem is always obtained. As pointed out in [16], there is a trade-off between the number of equations to consider and the false positive rate. Nevertheless, in our experiments we considered a minimum number of equations and in most cases only the original key was retrieved. Furthermore, with knowledge of the LFSRs' parameters the attacker just needs to intercept a part of the keystream sequence and run the algorithm; our method does not need further assumptions. Second, the results given in Table 1 show that the required keystream length in our algorithm grows linearly with the length of $R_2$, in contrast with other proposals where the amount of required sequence is exponential in the length of one of the registers.

Conclusions
The shrinking generator obtains an implicit non-linearity from the decimation process. This process is an attempt to create strong pseudorandom sequences with cryptographically good properties out of weak components. It has been proved that the shrunken sequence has a long period, a desirably high linear complexity and good statistical properties. However, the linear properties presented in this work make this generator vulnerable to attacks. This paper presents a cryptanalysis of the shrinking generator based on solving linear systems. Besides, the number of intercepted bits needed to successfully run the algorithm is substantially lower than the period of the sequence, growing linearly with the length of the register $R_2$.

Table 1. Numerical results for the algorithm

Table 2. Numerical results for the algorithm when $N_1 = L_2$

Table 3. Numerical results for the verification of one $is_1$