Weight-2 input sequences of 1/n convolutional codes from a linear systems point of view

Convolutional codes form an important class of codes that have memory. One natural way to study these codes is by means of input-state-output representations. In this paper we study the minimum (Hamming) weight among codewords produced by input sequences of weight two. We consider rate 1/n codes and use the linear systems setting, namely the (A, B, C, D) input-state-output representation of convolutional codes, for our analysis. Previous results in this area were recently derived assuming that the matrix A in the input-state-output representation is nonsingular. This work completes this thread of research by treating the nontrivial case in which A is singular. Codewords generated by weight-2 inputs are relevant to determine the effective free distance of Turbo codes.


Introduction
In this work we are interested in investigating codewords of 1/n convolutional codes that are produced by weight-2 information sequences. These codewords play an important role in the computation of the effective free distance in the context of Turbo codes (see [14]), and therefore a better understanding of this particular set of codewords may lead to improvements in the construction of Turbo codes. In this work we focus on the mathematical analysis of this set rather than on possible direct consequences for the performance of Turbo codes. We perform this mathematical investigation within the so-called input-state-output representations.
Convolutional codes can be modelled by means of input-state-output representations in the framework of linear time-invariant systems (see [3,5,6,8,16,27,30] for an introduction to the basic theory of this approach). The main advantage of this approach is that the dynamics of the state (memory) of the system (convolutional code) are explicit in this representation. Moreover, this enables the application of the powerful machinery of systems theory in the context of coding theory.
In [14], Divsalar and McEliece studied codewords of convolutional codes that are produced by weight-2 information sequences, derived some theoretical bounds for the effective free distance and posed a conjecture. In this paper, we also make use of state-space representations, but choose the representations introduced in [30], which are slightly different from the driving-variable representations used in [14]. These representations have led to several important theoretical and practical results on convolutional codes (see [7,8,10,18,21,25,26]), and we continue the study in [19] using the (A, B, C, D) input-state-output representation of finite-weight convolutional codes. In [19], an upper bound on the effective free distance was provided for the particular case in which the matrix A in the input-state-output representation is invertible. In this paper we consider the case in which the matrix A is singular. Thus, this work can be considered as an extension of previous results. When the matrix A, which represents the update of the state of the system, is nonsingular, the last input entering the system must immediately steer the state vector to the zero vector in order to obtain a finite-weight codeword. However, when A is singular, this is not necessarily true, and the state vector might remain nonzero for some time after the last input has been introduced into the system. For this reason, the extension of the results in [19] to the general case is not straightforward, as we show in this work. Nevertheless, we present new characterizations of this set of codewords and provide an upper bound on the effective free distance. As we show in this paper, the analysis of these systems (with A singular) turned out to be highly nontrivial, and so the optimality of the upper bound could not be formally proven.
The paper is organized as follows: In Section 2 we briefly introduce finite-weight convolutional codes defined over any Galois field and a particular input-state-output representation of such codes. We also recall the relevance of codewords generated by weight-2 inputs and their relation to turbo codes. Section 3 is devoted to providing the main results of the paper. In particular, for a given convolutional code C of dimension one defined over any finite field, with an input-state-output representation given by (A, B, C, D) and A a singular matrix, we analyse the dynamics that can occur between the input and the state of the system in this case. We present a conjecture and a novel upper bound on z_min(C) based on this conjecture and, in turn, an upper bound on the effective free distance of C. Finally, we present and study a concrete construction of a class of convolutional codes for which we can compute z_min(C) up to a difference of one value, and provide an example to illustrate the results. We conclude the paper by presenting some conclusions and possible future work within this thread of research.

Basic definitions and properties of Turbo codes and linear systems
In this paper, we denote by F = GF(q) the Galois field of q elements and by F[z] the polynomial ring in the variable z with coefficients in F.
Consider the matrices A ∈ F^{δ×δ}, B ∈ F^{δ×k}, C ∈ F^{(n−k)×δ} and D ∈ F^{(n−k)×k}. Following [30] and [28], a rate k/n convolutional code C of complexity δ can be described by the linear system governed by the equations

x_{t+1} = A x_t + B u_t,
y_t = C x_t + D u_t,
v_t = (y_t, u_t)^T, x_0 = 0, (2.2)

where for each time instant t, x_t ∈ F^δ is the state vector, u_t ∈ F^k is the input (also called the information vector) and y_t ∈ F^{n−k} is the parity vector. In linear systems theory, this representation is known as the input-state-output representation. This representation was introduced by Rosenthal, York and Schumacher (see [28]) and it has been widely used in recent years to analyze and construct convolutional codes [8,9,29,30]. In terms of linear systems, the complexity δ is the McMillan degree of the linear system (2.2). In the following, we adopt the notation used by McEliece [24] and call a convolutional code of rate k/n and complexity δ an (n, k, δ)-code.
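The recursion x_{t+1} = A x_t + B u_t, y_t = C x_t + D u_t can be simulated directly. The sketch below encodes an input sequence with a hypothetical binary (2, 1, 2)-code; the matrices A, B, C, D are illustrative assumptions, not taken from the paper.

```python
def encode(A, B, C, D, u, q=2):
    """Run x_{t+1} = A x_t + B u_t, y_t = C x_t + D u_t over GF(q), with x_0 = 0.
    Returns the codeword symbols v_t = (y_t, u_t) and the final state."""
    delta = len(A)
    x, codeword = [0] * delta, []
    for u_t in u:
        y_t = [(sum(C[i][j] * x[j] for j in range(delta)) + D[i] * u_t) % q
               for i in range(len(C))]
        x = [(sum(A[i][j] * x[j] for j in range(delta)) + B[i] * u_t) % q
             for i in range(delta)]
        codeword.append((tuple(y_t), u_t))
    return codeword, x

# Assumed binary (2, 1, 2)-code: delta = 2, k = 1, n - k = 1 parity symbol per step
A = [[0, 1], [1, 1]]
B = [0, 1]
C = [[1, 0]]
D = [1]
codeword, x_final = encode(A, B, C, D, [1, 0, 0, 1])
# The weight-2 input (1, 0, 0, 1) drives the state back to zero: x_final == [0, 0]
```

Note that the weight-2 input here already produces a finite-weight codeword, anticipating the objects studied in Section 3.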
Note that the description given by expression (2.2) is in general not unique. But if C has complexity δ, then it is possible to choose the matrices A, B, C, and D of sizes δ × δ, δ × k, (n − k) × δ and (n − k) × k, respectively. In convolutional coding theory, an input-state-output representation (A, B, C, D) having the above sizes is called a minimal representation, and it is characterized through the condition that the pair (A, B) is controllable, that is (see [29]), the controllability matrix

Φ_δ(A, B) = [B AB A^2 B · · · A^{δ−1} B]

has full rank, rank Φ_δ(A, B) = δ. The controllability matrix is a well-known matrix in the area of systems theory, as it allows one to characterize the controllability of the linear system. If (A, B) is a controllable pair, then we call the smallest integer κ having the property that rank Φ_κ(A, B) = δ the controllability index of (A, B). On the other hand, we say that (A, C) is an observable pair if (A^T, C^T) is a controllable pair (see [29]). If the pair (A, B) is controllable, it means that, by an appropriate choice of input vectors, it is possible to drive a given state vector to any other state vector in finite time. Analogously, the observability of the pair (A, C) means that it is possible to determine the state vector at a given time t_0 by observing the output vectors for a finite number of time steps beginning with t_0 (see, for example, [28,30]).
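The controllability condition can be checked directly by building Φ(A, B) column by column and tracking its rank over F. The sketch below (a toy GF(2) example with assumed matrices) returns the controllability index κ:

```python
def rank_gf(vectors, q):
    """Rank of a list of vectors over GF(q), q prime, by elimination."""
    basis = []
    for v in vectors:
        v = [e % q for e in v]
        for b in basis:
            lead = next(i for i, e in enumerate(b) if e)    # pivot position of b
            if v[lead]:
                f = (v[lead] * pow(b[lead], q - 2, q)) % q  # v[lead] / b[lead]
                v = [(x - f * y) % q for x, y in zip(v, b)]
        if any(v):
            basis.append(v)
    return len(basis)

def controllability_index(A, B, q):
    """Smallest kappa with rank [B, AB, ..., A^(kappa-1)B] = delta
    (None if (A, B) is not controllable)."""
    delta, cols, v = len(A), [], B[:]
    for kappa in range(1, delta + 1):
        cols.append(v[:])
        if rank_gf(cols, q) == delta:
            return kappa
        v = [sum(A[i][j] * v[j] for j in range(delta)) % q for i in range(delta)]
    return None

# Toy (A, B) over GF(2) (assumed example)
A = [[0, 1], [1, 1]]
B = [0, 1]
kappa = controllability_index(A, B, 2)
```

For a rate 1/n code the text below notes that κ = δ, which is what this toy run returns (kappa == 2 with delta == 2).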
Following the approach adopted in [29], we only consider {v_t}_{t≥0} in Eq. (2.2) to be a finite-weight codeword (see [29] for more details on the algebraic reasons to do so); that is, Equation (2.2) holds for all t = 0, 1, 2, . . . and there is an integer γ such that x_{γ+1} = 0 and u_t = 0 for t ≥ γ + 1, and therefore y_t = 0 for t ≥ γ + 1, so the code sequence has finite weight. In this work we denote such a finite-weight codeword by V_γ.
Hence, it follows that both the input and the state sequence (and hence the output) must have finite support in a finite-weight codeword. The set of finite-weight codewords has a module structure over the polynomial ring F[z] (see [29]). By abuse of notation, we will denote this module by C(A, B, C, D) and we refer to it as the finite-weight convolutional code generated by the matrices A, B, C, D. Proposition 2.4 of [29] gives us a characterization of finite-weight codewords. Let us denote by V_γ a finite-weight codeword sequence constituted by v_0 = (y_0, u_0), v_1 = (y_1, u_1), . . . , v_γ = (y_γ, u_γ) ∈ F^n, with v_0 ≠ 0 and v_γ ≠ 0. Hence, the equations of (2.2) are satisfied for all t ≥ 0 and x_{γ+1} = 0. The representation considered here, i.e., (2.2), is indeed the description of the dynamics of a rational and systematic encoder, since by Lemma 2.14 of [29], if C(A, B, C, D) is an (n, k, δ)-code, then the matrices A, B, C and D describe a proper rational transfer function of C(A, B, C, D), given by G(z) = C(zI − A)^{−1}B + D.

Remark 2.2. We note that the state-space realizations considered in this work are different from the driving-variable realizations often found in the coding literature [15,23], given by

x_{t+1} = A x_t + B u_t,
v_t = C x_t + D u_t,

where u_t ∈ F^k is the information vector, v_t ∈ F^n is the codeword symbol that is, in this case, the output of the linear system, and x_t ∈ F^δ is as above. Although driving-variable representations have been considered the standard way in which convolutional codes are presented in terms of linear systems, many authors have considered linear systems as described in (2.1) and (2.2) in the last decades, as they have many advantages when analyzing convolutional codes [28,29,33]. One of these advantages is that in the driving-variable representations the matrix A has to be nilpotent, whereas in the one described in (2.1) and (2.2) the matrix A does not have such a restriction. This fact facilitates the construction of optimal input-state-output representations of convolutional codes (see [29,32,33]).
Another advantage of the setting considered in this paper is that these representations are particularly suitable not only for constructing convolutional codes but also for dealing with finite-weight codewords, see [29,30] for more details. These properties allow us to derive new results regarding lowest weight of the parity vectors of the convolutional code C generated by information sequences of weight two.
Block codes having optimal error-correcting capabilities, i.e., with maximum possible minimum distance, are quite well understood, e.g. the class of Reed-Solomon codes [20,34]. However, in order to derive codes with efficient performance, i.e., codes coming closest to the Shannon limit, having a large minimum distance is sometimes not enough. To achieve optimal performance, the parallel concatenation of convolutional codes, known as Turbo codes, was presented by Berrou, Glavieux and Thitimajshima, see [2]. In a turbo code TC, two convolutional codes, C_1 and C_2, of rates k/n_1 and k/n_2, respectively, are connected via an interleaver in such a way that the first encoder, C_1, operates directly on the input information u_t (t = 0, 1, 2, . . .) and the second one, C_2, encodes the interleaved input information, denoted by P u_t (t = 0, 1, 2, . . .), where P is a permutation matrix of order k. Therefore, a codeword of this code is divided into the parity vectors of both encoders followed by the information vector. In [4] the input-state-output representation of the turbo code TC was introduced from the state representations of the constituent encoders. For more results on these concatenated (convolutional) codes within a linear systems approach, the reader is referred to [8], [9], [11], [15] and [17].
The most important parameter through which the constituent convolutional codes influence the turbo code performance is z_min(C) (see [1], [12], [13] and [14]), which is defined below.

Definition 2.1. Let C be a convolutional code. We define z_min(C) as the lowest weight of the parity vectors of the convolutional code C generated by information sequences of weight two.
In [1] and [14] it was shown that the performance of turbo codes is primarily driven by the weight-2 input minimum distance, which is directly related to the minimum weight among the set of codeword sequences generated by input sequences of weight two. Hence, if one considers a TC with C_1 = C_2 = C, its weight-2 input minimum distance, also referred to as the effective free distance of TC [1], d_free,eff(TC), is given by

d_free,eff(TC) = 2 + 2 z_min(C). (2.5)
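For toy parameters, Definition 2.1 and Eq. (2.5) can be evaluated by exhaustive search: enumerate weight-2 inputs, keep those whose state returns to zero (finite-weight codewords), and minimize the parity weight. The matrices below are an assumed binary (2, 1, 2)-code used only for illustration.

```python
def z_min_bruteforce(A, B, C, D, q=2, max_s=12):
    """Brute-force z_min (Definition 2.1) for a rate 1/n code over GF(q):
    minimise the parity weight over finite-weight codewords generated by
    weight-2 inputs (first nonzero input at t = 0, by time invariance)."""
    delta, best = len(A), None
    for s in range(1, max_s + 1):
        for u0 in range(1, q):
            for us in range(1, q):
                u = [u0] + [0] * (s - 1) + [us] + [0] * delta  # zero-input run-on
                x, w = [0] * delta, 0
                for u_t in u:
                    y = [(sum(C[i][j] * x[j] for j in range(delta)) + D[i] * u_t) % q
                         for i in range(len(C))]
                    w += sum(1 for e in y if e)
                    x = [(sum(A[i][j] * x[j] for j in range(delta)) + B[i] * u_t) % q
                         for i in range(delta)]
                if all(e == 0 for e in x):                     # finite-weight codeword
                    best = w if best is None else min(best, w)
    return best

# Assumed binary (2, 1, 2)-code, for illustration only
A, B, C, D = [[0, 1], [1, 1]], [0, 1], [[1, 0]], [1]
z = z_min_bruteforce(A, B, C, D)
d_free_eff = 2 + 2 * z        # Eq. (2.5)
```

For this toy code the search returns z = 2, hence d_free_eff = 6; the cost is exponential in the search window, which is precisely why the structural results of Section 3 matter.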

Upper bounds on the effective free distance of 1/n turbo codes
On an AWGN channel, code performance is determined largely by the effective free distance. In this section, we derive bounds on this distance. Moreover, the design objective for the constituent recursive convolutional encoders of a turbo code is to obtain z_min as large as possible. In [1] it was shown that in the binary case there exists a rate 1/n recursive systematic convolutional code C with complexity δ that achieves the maximum value of z_min(C). Consequently, for a turbo code TC with two equal systematic convolutional codes, they obtain an upper bound on the effective free distance of TC. Recently, in [19], turbo codes were again studied from a linear systems point of view, over finite fields. In particular, the authors consider a turbo code obtained by the concatenation of two identical 1/n recursive systematic convolutional codes C given by an input-state-output representation (A, B, C, D) where the matrix A is invertible. They studied how to obtain the value of z_min(C) and derived an upper bound that we present next. First, we need to introduce the following definition, which refers to the minimum time instant at which the last nonzero input is introduced into the system. As at each time instant the input belongs to the field, in the case where the rate is 1/n we write the input as the scalar u_t rather than the vector u_t, to distinguish between scalars and vectors.
Definition 3.1. We define ŝ to be the least s for which there exists a finite-weight codeword V_γ of a convolutional code C generated by a vector (u_0, u_1, . . . , u_s, u_{s+1}, . . . , u_γ) of weight equal to two with u_0, u_s ≠ 0. Such an ŝ is called the minimum effective index of C.
In [19], an upper bound on the value of z_min(C) among all convolutional codes with the same set of parameters (n, 1, δ) was introduced, as we show in the following theorem.
The authors of [19] give conditions for an (n, 1, δ)-code to achieve such a bound and, moreover, present a concrete construction of a 1/n recursive systematic convolutional code C whose z_min(C) is as large as possible for these parameters.
Remark 3.1. If we consider the case in which the matrix A is nonsingular, we obtain z_min(C) over the parity vectors of finite-weight codewords V_γ generated by input vectors (u_0, u_1, . . . , u_s) of weight two with u_0, u_s ≠ 0 and γ = s, since at time instant s the state of the system must go to zero. Thus, the last input u_s entering into the system has to yield x_{s+1} = 0. More concretely, let V_γ be a finite-weight codeword generated by an information vector (u_0, u_1, . . . , u_s, u_{s+1}, . . . , u_γ) of weight two with u_0, u_s ≠ 0. Then 0 = x_{γ+1} = A^{γ−s} x_{s+1} implies x_{s+1} = 0, since A is nonsingular. In other words, if A is nonsingular, it follows that the minimum effective index ŝ of C is obtained as the minimum of the integers s that satisfy the conditions indicated at the beginning of the remark. Moreover, Theorem 1 of [19] indicates that z_min(C) is derived only from the weight of the parity vectors of any finite-weight codeword V_ŝ of the convolutional code produced by sequences of length ŝ + 1 ≥ δ + 1 where the two nonzero inputs are the first and the last ones. When the matrix A is singular, we may have that γ > s holds. This intuitively means that the state of the system (A, B, C, D) does not necessarily vanish at instant s and could remain nonzero for some time after the second (that is, the last) input u_s ≠ 0 enters into the system. Moreover, let V_γ and Ṽ_γ̃ be two finite-weight codewords with input vectors (u_0, u_1, . . . , u_s, u_{s+1}, . . . , u_γ) and (ũ_0, ũ_1, . . . , ũ_s̃, ũ_{s̃+1}, . . . , ũ_γ̃) of weight two with u_0, u_s, ũ_0, ũ_s̃ ≠ 0 and such that s̃ > s. As opposed to the case in which A is nonsingular, in this case we cannot ensure that wt(y_0, . . . , y_s, y_{s+1}, . . . , y_γ) ≤ wt(ỹ_0, . . . , ỹ_s, . . . , ỹ_s̃, ỹ_{s̃+1}, . . . , ỹ_γ̃); that is, the minimum effective index ŝ given in Definition 3.1 is not directly related to z_min(C) as it is in the case where the matrix A is nonsingular.
Therefore, the ideas used to prove Theorem 3.1 for A nonsingular cannot be straightforwardly applied in this case and we need to use a different approach.
3.1. z_min of a rate 1/n recursive systematic convolutional code C(A, B, C, D) with A singular.
Next, we investigate the set of finite-weight codewords generated by input vectors of weight two which gives us the value of z_min when an (n, 1, δ)-code C is given by an input-state-output representation (A, B, C, D) such that the matrix A that updates the state vector of the system is singular. As noted in Remark 3.1, the length γ + 1 of the finite-weight codeword V_γ can be much larger than s + 1, where s is the time instant of the last nonzero input. Note that in these γ − s instants the corresponding input is zero, but the state is nonzero and continues to generate output vectors y_i = C x_i, i = s + 1, . . . , γ. This makes it difficult to obtain an upper bound on z_min in terms of the minimum effective index. Nevertheless, we can delimit the inputs that will generate the finite-weight codewords where z_min will be reached, as we will see at the end of this section.
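The phenomenon just described is easy to observe numerically. In the sketch below (an assumed singular A over GF(2) with δ = 3, chosen for illustration), a weight-2 input produces a finite-weight codeword whose state stays nonzero for two steps after the last nonzero input:

```python
def next_state(A, B, x, u_t, q):
    """x_{t+1} = A x_t + B u_t over GF(q)."""
    return [(sum(A[i][j] * x[j] for j in range(len(A))) + B[i] * u_t) % q
            for i in range(len(A))]

q = 2
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 1]]          # singular: the first column is zero
B = [0, 0, 1]
u = [1, 0, 1, 0, 0]      # weight-2 input, last nonzero input at s = 2

x, states = [0, 0, 0], []
for u_t in u:
    x = next_state(A, B, x, u_t, q)
    states.append(x)
# states = [x_1, ..., x_5]; x_3 and x_4 are nonzero although u_3 = u_4 = 0,
# and the state only vanishes at x_5 -- impossible when A is nonsingular.
```

Here γ = 4 while s = 2, so the code keeps emitting nonzero parity symbols y_i = C x_i after the last input, exactly the effect that decouples ŝ from z_min(C).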
Now suppose that C(A, B, C, D) is a rate 1/n convolutional code with complexity δ. Then the matrices (A, B) form a controllable pair, so rank Φ_κ(A, B) = δ, where κ is the so-called controllability index of (A, B). Also, in the case that C(A, B, C, D) is an (n, 1, δ)-code with (A, B) controllable, it follows that the controllability index κ is equal to the complexity δ, i.e., κ = δ. Now, let V_γ be a finite-weight codeword with u_0 ≠ 0. Then relations (2.3) and (3.2) necessarily imply γ > κ − 1, and therefore we get the following result. Among all the parity vectors of finite-weight codewords generated by input vectors of weight two, we can restrict ourselves to a smaller set in order to compute z_min(C), as stated in the following lemma.
Assume now that C(A, B, C, D) is a rate 1/n convolutional code with A singular. It is well known that if (A, B) is a controllable pair, we can assume without loss of generality (see Remark 2.1) that (A, B) is in the so-called controllable canonical realization [22]. If A is a singular matrix, then there exists an integer τ ≥ 1 such that p_{δ−j} = 0 for j = 1, 2, . . . , τ and p_{δ−τ−1} ≠ 0. In the remainder of this section, we work with the controllable canonical form of (A, B) with A singular. Let V_γ be a finite-weight codeword generated by an information vector (u_0, u_1, . . . , u_s, u_{s+1}, . . . , u_γ) of weight two, with only u_0, u_s ≠ 0. From (2.3) we have that A^{γ−s}(A^s B u_0 + B u_s) = x_{γ+1} = 0, so we focus our attention on the kernel of the matrix A^η, for η ≥ 1.
So we have proved by induction that the η-th power of A is given by the statement of the lemma.

Remark 3.2.
Observe that if η ≥ τ, the dimension, and in particular a basis, of the subspace ker(A^η) is independent of η.
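This stabilization can be checked numerically: dim ker(A^η) = δ − rank(A^η) grows with η and becomes constant once η reaches τ. The example below uses an assumed singular 3 × 3 matrix over GF(2) for which τ = 2:

```python
def mat_mul(X, Y, q):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % q for j in range(n)]
            for i in range(n)]

def rank_gf(M, q):
    """Row-echelon rank over GF(q), q prime."""
    M, rank = [row[:] for row in M], 0
    for col in range(len(M[0])):
        piv = next((r for r in range(rank, len(M)) if M[r][col]), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][col], q - 2, q)          # inverse mod prime q
        M[rank] = [(e * inv) % q for e in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * b) % q for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

q = 2
A = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]   # assumed singular example, delta = 3
P, dims = A, []
for eta in range(1, 4):
    dims.append(3 - rank_gf(P, q))      # dim ker(A^eta)
    P = mat_mul(P, A, q)
# dims == [1, 2, 2]: the dimension stabilises once eta reaches tau (= 2 here)
```

For this matrix, ker(A) = span{e_1} and ker(A^η) = span{e_1, e_2} for all η ≥ 2, matching the canonical-basis description used in the proof below.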
Proof. Let V_γ be a finite-weight codeword of C generated by an information vector (u_0, u_1, . . . , u_s, . . . , u_γ) of weight two with u_0, u_s ≠ 0. Then x_{γ+1} = A^{γ−s}(A^s B u_0 + B u_s) = 0, so A^s B u_0 + B u_s ∈ ker A^{γ−s}. It follows from Remark 3.2 that ker A^{γ−s} is the same subspace for any γ and s, provided τ ≤ γ − s ≤ δ − 1. As we consider the case in which s + τ ≤ γ ≤ s + δ − 1, it follows from statement b) of Lemma 3.4 that dim(ker A^{γ−s}) = τ and a basis for ker A^{γ−s} ⊆ F^{δ×1} is given by the column vectors e_1, e_2, . . . , e_τ, where e_i denotes the i-th vector of the canonical basis of F^{δ×1}, for i = 1, 2, . . . , τ. Therefore, one has that A^s B u_0 + B u_s must be of the form (d_1, d_2, . . . , d_τ, 0, . . . , 0)^T or, equivalently (observe the structure of the matrix B given by (3.3)), A^s B u_0 must be of the form (d_1, d_2, . . . , d_τ, 0, . . . , 0, d_δ)^T, where d_δ ≠ 0 as we require u_s ≠ 0. Hence, s is in fact the smallest integer such that the last column of A^s is a vector of the form (∗, . . . , ∗, 0, . . . , 0, ∗)^T, with τ arbitrary leading entries, followed by zeros, and a nonzero last entry.
Remark 3.3 provides the structure of the elements of the last column of A^s, which can be seen as the feedback polynomial of a Linear Feedback Shift Register (LFSR). The maximum cycle length of an LFSR of length δ is q^δ − 1, attained when the associated polynomial is primitive. The number of states of the form (3.6) is q^{dim(ker A^{γ−s})} = q^τ times the q − 1 possible nonzero elements for the last row of A^s B. This leads to the following upper bound on s: s ≤ q^δ − q^τ(q − 1), which concludes the proof.
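The characterization of s used in the proof can be turned into a direct search: repeatedly multiply the last column of A by A and stop at the first power whose last column matches the pattern (τ arbitrary entries, then zeros, then a nonzero last entry). The data below are an assumed toy instance over GF(2) with δ = 3 and τ = 1:

```python
q, delta, tau = 2, 3, 1
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 1, 1]]          # assumed singular matrix in controllable canonical form

col, s = [0, 0, 1], 0    # B = e_delta, so A^s B is the last column of A^s
while True:
    s += 1
    col = [sum(A[i][j] * col[j] for j in range(delta)) % q for i in range(delta)]
    # pattern: tau arbitrary entries, then zeros, then a nonzero last entry
    if all(col[i] == 0 for i in range(tau, delta - 1)) and col[delta - 1] != 0:
        break
# here s == 3 and col == [1, 0, 1]: the first power of A whose last column
# has the shape (*, 0, nonzero) required for a finite-weight codeword
```

The loop terminates well within the bound of the proof, since the last column of A^s traverses (part of) the state cycle of the associated LFSR.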
In the following result we study the set of codewords of length γ + 1 such that γ < s + τ. In this case, the lowest weight of the parity vectors of V_γ with γ < s + τ is achieved for s ≤ q^δ − q^{γ−s}(q − 1).
Proof. The proof follows the same idea used in the proof of Theorem 3.2. Note that by Lemma 3.1 we have that γ ≥ δ. Also, it holds from Lemma 3.4 that dim(ker A^{γ−s}) = γ − s < τ. Hence, for each value of γ and s such that γ − s < τ, we have that s is the smallest integer such that the last column of A^s is a vector of the corresponding form.

4. Optimal upper bound on z_min for a class of (n, 1, δ) recursive systematic convolutional codes

In this section we present a concrete construction of a class of convolutional codes with δ ≥ 2 for which we can compute the minimum effective index and the exact value of z_min up to a difference of one value. Furthermore, we show that such an upper bound is optimal by presenting a particular example that reaches the provided upper bound. To this end we need a class of matrices that have been very useful for the construction of convolutional codes with large Hamming distance, namely, the so-called superregular matrices.

Definition 4.1 (see [31]). Let A be an n × ℓ matrix over a finite field F. We say that A is a superregular matrix if every square submatrix of A is nonsingular.
The following lemma is an immediate consequence of Definition 4.1; it gives a lower bound on the weight of a linear combination of columns of a superregular matrix.
Lemma 4.1 (Lemma 3 of [10]). Let A be a superregular matrix over a finite field F of size n × ℓ, with n ≥ ℓ. It follows that any nontrivial linear combination of m different columns of A cannot have more than m − 1 entries equal to zero.
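Superregularity is a finite check: every square submatrix must have a nonzero determinant. The sketch below verifies it for a small Cauchy matrix over GF(7); Cauchy matrices are a standard source of superregular matrices, and this particular matrix is an illustrative assumption, not one taken from the paper.

```python
from itertools import combinations

def det_gf(M, q):
    """Determinant over GF(q) by Laplace expansion (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0] % q
    return sum((-1) ** j * M[0][j] * det_gf([r[:j] + r[j + 1:] for r in M[1:]], q)
               for j in range(len(M))) % q

def is_superregular(M, q):
    """Check Definition 4.1: every square submatrix of M is nonsingular."""
    n, m = len(M), len(M[0])
    for k in range(1, min(n, m) + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(m), k):
                if det_gf([[M[i][j] for j in cols] for i in rows], q) == 0:
                    return False
    return True

# Cauchy matrix over GF(7): entries 1/(x_i + y_j) with x = (1, 2, 3), y = (0, 3)
C_sr = [[1, 2], [4, 3], [5, 6]]
ok = is_superregular(C_sr, 7)
```

The check is exponential in min(n, ℓ), which is acceptable for the small matrices used in the constructions of this section.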
In the following result we present a particular construction based on an input-state-output representation where the pair (A, B) is in canonical controllable form with A singular, C is a superregular matrix and D is a column of C. We establish that the lowest weight of the parity vectors of V_γ is achieved, in fact, by the ones generated by weight-2 input sequences (u_0, u_1, . . . , u_γ) with u_0 ≠ 0 and u_1 ≠ 0. Furthermore, we establish a lower and an upper bound on z_min(C) for this case.
Theorem 4.1. Let δ and n be any positive integers with δ ≥ 2, n ≥ δ + 1 and q ≥ n + δ. Let C(A, B, C, D) be an (n, 1, δ)-code described by matrices of sizes δ × δ, δ × 1, (n − 1) × δ and (n − 1) × 1, respectively, where C is a superregular matrix. Then we have that

(n − 1)(δ + 1) − 1 ≤ z_min(C) ≤ (n − 1)(δ + 1). (4.2)

Moreover, the minimum effective index ŝ achieves its minimum possible value, i.e., ŝ = 1, and so the value of z_min(C) is reached in finite-weight codewords of minimum length γ = δ and is calculated as z_min(C) = wt(D) + wt(CB − D) + wt(CAB − CB) + · · · + wt(CA^{δ−1}B − CA^{δ−2}B).

Proof. Taking into account the structure of the matrices A and B of (4.1) and Lemma 3.3, the finite-weight codeword of minimal length generated by an input of weight two is V_δ. Furthermore, in this case we have that u_1 = −u_0 and u_2 = u_3 = · · · = u_δ = 0. Then, the parity vectors of V_δ are of the form y_0 = D u_0, y_1 = (CB − D) u_0 and y_j = (CA^{j−1}B − CA^{j−2}B) u_0 for j = 2, 3, . . . , δ. Furthermore, n − 2 ≤ wt(y_1) ≤ n − 1, since C is a superregular matrix and, by Lemma 4.1, any nontrivial linear combination of two different columns of C cannot have more than 1 entry equal to zero. Similarly, we can ensure that wt(y_j) = n − 1, since C is superregular. So we obtain the following bounds on the weight of the parity vectors y_j of V_δ:

(n − 1)(δ + 1) − 1 ≤ wt(y_0) + wt(y_1) + · · · + wt(y_δ) ≤ (n − 1)(δ + 1). (4.5)
Our aim now is to prove that, in fact, z_min(C) is obtained from the minimum of the parity vectors of all finite-weight codewords of length δ + 1. In order to do this, consider now a finite-weight codeword V̄_γ generated by an input vector (ū_0, ū_1, . . . , ū_γ) of weight two with γ > δ. Then there exists a time instant r ≥ 1 such that ū_0 ≠ 0, ū_r ≠ 0 and ū_j = 0 for j ≠ 0, r. From Lemma 3.2, we know that if r = 1, then we can ensure that the parity vectors (ȳ_0, ȳ_1, . . . , ȳ_γ) of V̄_γ have weight greater than or equal to the parity vectors (y_0, y_1, . . . , y_δ) of any finite-weight codeword V_δ of length δ + 1. Furthermore, if r > 1, then from the structure of the matrices A, B, C, D, Lemma 3.3 and the fact that C is superregular, we obtain bounds (4.6) on the weight of the parity vectors (ȳ_0, ȳ_1, . . . , ȳ_γ). Taking into account relations (4.5) and (4.6) and the fact that δ < n, we obtain wt(y_0, y_1, . . . , y_δ) ≤ wt(ȳ_0, ȳ_1, . . . , ȳ_γ), where (y_0, y_1, . . . , y_δ) and (ȳ_0, ȳ_1, . . . , ȳ_γ) are the parity vectors of any codewords V_δ and V̄_γ with γ > δ, respectively. So we can conclude that z_min(C) is obtained from the minimum of the weights of the parity vectors of all the finite-weight codewords V_δ generated by input vectors of length δ + 1. In the following example, we show a convolutional code whose z_min(C) reaches the upper bound of relation (4.2).
Example 4.1. Let F be the Galois field of 7 elements and let C(A, B, C, D) be a (4, 1, 2)-code described by suitable matrices A, B, C and D. It is easy to see that the (4, 1, 2)-code described by these matrices satisfies the hypotheses of Theorem 4.1. Then we know that z_min(C) attains the upper bound of relation (4.2); that is, in this case the code attains the maximal value.
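Since the matrices of Example 4.1 are not reproduced above, the following sketch builds an analogous (4, 1, 2)-code over GF(7), with the singular canonical-form A, B = (0, 1)^T, C a 3 × 2 Cauchy (hence superregular) matrix and D its first column (all illustrative assumptions), and computes z_min(C) by brute force. The result meets the upper bound (n − 1)(δ + 1) = 9, giving d_free,eff = 2 + 2 · 9 = 20.

```python
def parity_weight(A, B, C, D, u, q):
    """Total parity weight and final state after running u plus 2*delta zero inputs."""
    delta, x, w = len(A), [0] * len(A), 0
    for u_t in list(u) + [0] * (2 * len(A)):
        y = [(sum(C[i][j] * x[j] for j in range(delta)) + D[i] * u_t) % q
             for i in range(len(C))]
        w += sum(1 for e in y if e)
        x = [(sum(A[i][j] * x[j] for j in range(delta)) + B[i] * u_t) % q
             for i in range(delta)]
    return w, x

q = 7
A = [[0, 1], [0, 1]]            # singular, controllable canonical form (assumed)
B = [0, 1]
C = [[1, 2], [4, 3], [5, 6]]    # Cauchy matrix over GF(7), hence superregular
D = [1, 4, 5]                   # first column of C

z_min = None
for s in range(1, 6):           # time instant of the second nonzero input
    for u0 in range(1, q):
        for us in range(1, q):
            w, x = parity_weight(A, B, C, D, [u0] + [0] * (s - 1) + [us], q)
            if x == [0, 0]:     # finite-weight codeword
                z_min = w if z_min is None else min(z_min, w)

d_free_eff = 2 + 2 * z_min      # Eq. (2.5)
```

For this code the minimum is attained at s = ŝ = 1 with u_1 = −u_0, and the three parity vectors each have full weight n − 1 = 3, as predicted by the proof of Theorem 4.1.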
In the previous example the superregular matrix C is a Cauchy matrix and the example is small. If we need to construct a turbo code with a prescribed z_min(C), we must consider bigger parameters and, consequently, a bigger field. Working with a large finite field increases computational costs. In order to minimize the size of the field, we introduce a construction for a singular A, similar to Theorem 4.1, in which we make use of the so-called extended Cauchy matrices (see [31]).
Theorem 4.2. Let F be the Galois field of q elements. Let δ and n be any positive integers with δ ≥ 2, n ≥ δ + 1 and q ≥ n + δ − 1. Let C(A, B, C, D) be an (n, 1, δ)-code described by matrices of sizes δ × δ, δ × 1, (n − 1) × δ and (n − 1) × 1, respectively, where C is an extended Cauchy matrix whose first column c_1 = (c_11, c_21, . . . , c_(n−1)1)^T consists of ones. Then we have that

(n − 1)(δ + 1) − 1 ≤ z_min(C) ≤ (n − 1)(δ + 1).

Moreover, the value of z_min(C) is reached in a finite-weight codeword of minimum length γ = δ. It is easy to see that a (4, 1, 2)-code described by suitable matrices A, B, C and D satisfies the hypotheses of Theorem 4.2; in that case z_min(C) = wt(D) + wt(CB − D) + wt(CAB − CB), so here also the code attains the maximal value.

Conclusions and future work
In this work we study the lowest Hamming weight of the parity vectors generated by information sequences of weight two, that is, z_min, of a 1/n convolutional code C(A, B, C, D) represented in terms of the input-state-output representation. We analyze how one can reduce the computations needed to derive this value, which is, in general, difficult to compute as it is the minimum over the large set of codewords with inputs of weight two. In this work we reduce this set by studying the structure of the codewords produced by the input-state-output system. This reduces the computational search needed to obtain the exact value of z_min(C). We also present a class of convolutional codes for which we know the form of the codewords that lead to the computation of z_min, and which therefore allows us to determine its exact value up to a difference of one unit.
It is left as an open problem to provide specific lower and upper bounds on z_min(C) and, consequently, lower and upper bounds on the effective free distance over general finite fields. It would also be interesting to show that this hypothetical upper bound is tight by presenting a concrete construction of a Turbo code whose effective free distance reaches this bound. Another interesting direction would be to derive constructions different from the one given in Section 4 having better bounds.