Skew Convolutional Codes

A new class of convolutional codes, called skew convolutional codes, that extends the class of classical fixed convolutional codes, is proposed. Skew convolutional codes can be represented as periodic time-varying convolutional codes but have a description as compact as fixed convolutional codes. Designs of generator and parity check matrices, encoders, and code trellises for skew convolutional codes and their duals are shown. For memoryless channels, one can apply Viterbi or BCJR decoding algorithms, or a dualized BCJR algorithm, to decode skew convolutional codes.


Introduction
Convolutional codes were introduced by Elias in 1955 [1]. With the discovery that convolutional codes can be decoded with Fano sequential decoding [2], Massey threshold decoding [3], and, above all, Viterbi decoding [4], they became quite widespread in practice. Convolutional codes are still widely used in telecommunications, e.g., in Turbo codes [5] and in the WiFi IEEE 802.11 standard [6], in cryptography [7], etc.
The most common are binary convolutional codes; however, communication with higher orders of modulation [8] or streaming of data [9] require non-binary convolutional codes. It is known that periodic time-varying convolutional codes improve the free distance and weight distribution over fixed convolutional codes; see, e.g., Mooser [10] and Lee [11]. This is the motivation to introduce a new class of periodic time-varying non-binary convolutional codes, i.e., skew convolutional codes. These codes are based on the non-commutative ring of skew polynomials over finite fields and on the skew field of their fractions.
Convolutional codes are nonblock linear codes over a finite field, but it can be advantageous to treat them as block codes over certain infinite fields. We will use both approaches. Classical convolutional codes are described by usual polynomials. The product of the polynomials corresponds to convolution of vectors of their coefficients and this gives fixed-in-time classical convolutional codes. We replace usual polynomials by skew polynomials to define the new codes. The product of skew polynomials corresponds to skew convolution of their coefficients, which can be obtained by varying elements in the usual convolution. In this way, we obtain time varying convolutional codes.
Our goal is to define and to give a first encounter with skew convolutional codes. In Section 2, we define skew convolutional codes. In Section 3, we obtain generator matrices and encoders for skew codes and show that skew codes are equivalent to time-varying convolutional codes. Some useful properties of skew codes are considered in Section 4. Section 5 introduces dual skew convolutional codes. Trellis decoding of skew codes is considered in Section 6. Section 7 concludes the paper.

Skew Polynomials and Fractions
Consider a field $F$ and an automorphism $\theta$ of the field. Later on, we will use the finite field $F = \mathbb{F}_{q^m}$, where $q$ is a prime power, with the Frobenius automorphism
$$\theta(a) = a^q, \quad \forall a \in F. \tag{1}$$
The composition of automorphisms is denoted as $\theta(\theta(a)) = \theta^2(a)$, and, for any integer $i$, we have $\theta^i(a) = \theta(\theta^{i-1}(a))$. The identity automorphism $\theta(a) = a$ is denoted as $\theta = \mathrm{id}$. For the automorphism (1), for all $a \in F$, we have $\theta^i(a) = a^{q^i}$ and $\theta^m = \mathrm{id}$ since $a^{q^m} = a$.
Denote by $R = F[D; \theta]$ the non-commutative ring [18] of skew polynomials in the variable $D$ over $F$ (with zero derivation) such that
$$R = F[D; \theta] = \{a(D) = a_0 + a_1 D + \dots + a_n D^n \mid a_i \in F \text{ and } n \in \mathbb{N}\}.$$
The skew polynomials look like usual polynomials from $F[D]$, where the coefficients $a_i$ are placed to the left of the variable $D$. The addition in $R$ is as for usual polynomials from $F[D]$. The multiplication is defined by the basic rule
$$Da = \theta(a)D \tag{2}$$
and is extended to all elements of $R$ by associativity and distributivity; see Example 1 below. The ring $R$ has a unique left skew field of fractions $Q$, from which it inherits its linear algebra properties; see, e.g., [19] for more details.
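Example 1. To demonstrate our results, we use the field $\mathbb{F}_Q = \mathbb{F}_{q^m} = \mathbb{F}_{2^2}$, $q = 2$, $m = 2$, with automorphism $\theta(a) = a^q = a^2$ for all $a \in \mathbb{F}_{2^2}$. The field $\mathbb{F}_{2^2}$ consists of the elements $\{0, 1, \alpha, \alpha^2\}$, where a primitive element $\alpha$ satisfies $\alpha^2 + \alpha + 1 = 0$ and the following relations hold:
$$\alpha^2 = \alpha + 1, \quad \alpha^3 = 1, \quad \theta(\alpha) = \alpha^2, \quad \theta(\alpha^2) = \alpha.$$
Let $a(D) = 1 + \alpha D$ and $b(D) = \alpha^2 + D$, $a(D), b(D) \in R$. Using (2), we compute the product $ab$ as
$$a(D)b(D) = (1 + \alpha D)(\alpha^2 + D) = \alpha^2 + D + \alpha\theta(\alpha^2)D + \alpha D^2 = \alpha^2 + \alpha D + \alpha D^2,$$
whereas $b(D)a(D) = (\alpha^2 + D)(1 + \alpha D) = \alpha^2 + \alpha^2 D^2$. In this example, we see that $a(D)b(D) \neq b(D)a(D)$.
The left skew field $Q$ consists of fractions $\frac{b(D)}{a(D)} = a^{-1}(D)b(D) \in Q$ for all $a(D), b(D) \in R$, $a(D) \neq 0$. Every fraction can be expanded as a left skew Laurent series in increasing powers of $D$. In our example, the inverse element $\frac{1}{a(D)} = a(D)^{-1}$ is expanded using long division as follows:
$$\frac{1}{1 + \alpha D} = 1 + \alpha D + D^2 + \alpha D^3 + D^4 + \cdots.$$
Notice that the right fraction $b(D)a^{-1}(D) = \alpha^2$, since $b(D) = \alpha^2 + D = \alpha^2(1 + \alpha D) = \alpha^2 a(D)$.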
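The arithmetic of Example 1 can be checked mechanically. The following is a minimal sketch, not part of the original paper, assuming the commutation rule D·a = θ(a)·D and an integer encoding of the field elements (0, 1, 2 = α, 3 = α²) that is chosen here only for illustration; the function names are hypothetical.

```python
# F_4 = {0, 1, α, α²} encoded as 0, 1, 2, 3 (2 = α, 3 = α² = α + 1); encoding is an assumption.
EXP = [1, 2, 3]            # α^0, α^1, α^2
LOG = {1: 0, 2: 1, 3: 2}   # discrete logarithms to base α


def add(a, b):
    return a ^ b           # characteristic 2: addition is bitwise XOR


def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]


def theta(a, i=1):
    """Frobenius a -> a^2 applied i times; on F_4 only the parity of i matters."""
    return mul(a, a) if i % 2 else a


def skew_mul(a, b):
    """Product in F_4[D; θ]; polynomials are coefficient lists [a_0, a_1, ...]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # a_i D^i * b_j D^j = a_i θ^i(b_j) D^(i+j)
            out[i + j] = add(out[i + j], mul(ai, theta(bj, i)))
    return out


def inverse_series(a, terms):
    """First `terms` coefficients of the series c(D) with a(D) c(D) = 1 (requires a_0 != 0)."""
    inv_a0 = EXP[(-LOG[a[0]]) % 3]
    c = []
    for t in range(terms):
        s = 1 if t == 0 else 0
        for i in range(1, min(t, len(a) - 1) + 1):
            s = add(s, mul(a[i], theta(c[t - i], i)))
        c.append(mul(inv_a0, s))   # minus equals plus in characteristic 2
    return c


a = [1, 2]   # a(D) = 1 + αD
b = [3, 1]   # b(D) = α² + D
ab = skew_mul(a, b)              # α² + αD + αD²
ba = skew_mul(b, a)              # α² + α²D²
inv_a = inverse_series(a, 6)     # 1 + αD + D² + αD³ + ...
```

Here `ab` and `ba` differ, which illustrates the non-commutativity, and `skew_mul([3], a)` reproduces `b`, in line with b(D) = α²a(D).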

Definition of Skew Convolutional Codes
Much of linear algebra can be generalized from vector spaces over a field to (either left or right) modules over the skew field Q. Indeed, it is shown in [19] that any left Q-module C is free, i.e., it has a basis, and any two bases of C have the same cardinality, the dimension of C.
Definition 1 (Skew convolutional code). Given an automorphism $\theta$ of the field $F$, a skew convolutional $[n, k]$ code $C$ over the field $F$ is a left sub-module of dimension $k$ of the free module $Q^n$.
The elements of the code $C$ are called its codewords. A codeword
$$v(D) = \left(v^{(1)}(D), \dots, v^{(n)}(D)\right) \tag{3}$$
is an $n$-tuple over $Q$, where every component $v^{(i)}(D)$ is a fraction of skew polynomials from $R$. The code $C$ is $F$-linear, where $F = \mathbb{F}_{q^m}$. Let the weight of every codeword be defined by some selected metric.
The free distance $d_f$ of a skew convolutional code is defined to be the minimum nonzero weight of any codeword.
For the Hamming metric, which is the most interesting for applications, the weight of a fraction $v^{(i)}(D)$ is the number of nonzero coefficients in its expansion as a left skew Laurent series from $F((D))$ in increasing powers of $D$. The weight of a codeword is the sum of the weights of its components. Another interesting metric is the sum-rank metric, which will be defined later.

Relations to Fixed Convolutional Codes
Lemma 1. The class of skew convolutional codes includes the class of fixed (time-invariant) convolutional codes.

Proof.
A time-invariant convolutional $[n, k]$ code $C'$ over the field $F$ is defined as a $k$-dimensional subspace of $F(D)^n$. Take the identity automorphism $\theta = \mathrm{id}$. Then, the ring $R = F[D; \theta]$ becomes the ring $F[D]$ of usual commutative polynomials. The skew field of fractions $Q$ becomes the field of rational functions $F(D)$. In this case, by Definition 1, the skew convolutional code $C$ coincides with the classical fixed code $C'$.

Polynomial Form of Encoding
A generator matrix of a skew convolutional [n, k] code C is a k × n matrix G(D) over the skew field Q whose rows form a basis for the code C. If the matrix G(D) is over the ring R of skew polynomials, then G(D) is called a polynomial generator matrix for C. Every skew code C has a polynomial generator matrix. Indeed, given a generator matrix G(D) over the skew field of fractions Q, a polynomial generator matrix can be obtained by left multiplying each row of G(D) by the least common left multiple of the denominators in that row. In the paper, we focus on polynomial generator matrices and corresponding encoders.
By Definition 1, every codeword $v(D)$ of a skew code $C$, which is an $n$-tuple over the skew field $Q$, can be written as
$$v(D) = u(D)G(D), \tag{4}$$
where
$$u(D) = \left(u^{(1)}(D), \dots, u^{(k)}(D)\right) \tag{5}$$
is a $k$-tuple ($k$-word) over $Q$ and is called an information word, and $G(D)$ is a generator matrix of $C$, $G(D) \in Q^{k \times n}$ or $G(D) \in R^{k \times n}$. Relation (4) already provides an encoder. This encoder (4) is just an encoder of a block code over $Q$, and the skew code $C$ can be considered as the set of $n$-tuples $v(D)$ over $Q$ that satisfy (4), i.e., $C = \{v(D)\}$. However, to use the code in practice, we need to write the components of $u(D)$ and $v(D)$ as skew Laurent series, i.e., we have
$$u^{(i)}(D) = \sum_{t} u^{(i)}_t D^t, \quad i = 1, \dots, k, \tag{6}$$
and
$$v^{(j)}(D) = \sum_{t} v^{(j)}_t D^t, \quad j = 1, \dots, n. \tag{7}$$
Actually, in a Laurent series, the lower (time) index of coefficients can be a negative integer, but, in practice, the information sequence $u^{(i)}(D)$ should be causal for every component $i$, that is, the coefficients $u^{(i)}_t$ are zeros for time $t < 0$. Causal information sequences should be encoded into causal code sequences; otherwise, an encoder cannot be implemented, since it would have to output code symbols before it receives an information symbol.
Denote the block of information symbols that enters an encoder at time $t = 0, 1, \dots$ by
$$u_t = \left(u^{(1)}_t, \dots, u^{(k)}_t\right) \in F^k. \tag{8}$$
The block of code symbols that leaves the encoder at time $t = 0, 1, \dots$ is denoted by
$$v_t = \left(v^{(1)}_t, \dots, v^{(n)}_t\right) \in F^n. \tag{9}$$
Combining (5), (6), and (8), we obtain the following information series with vector coefficients:
$$u(D) = \sum_{t \ge 0} u_t D^t. \tag{10}$$
Using (3), (7), and (9), we write a codeword as a series
$$v(D) = \sum_{t \ge 0} v_t D^t. \tag{11}$$
One can write a skew polynomial generator matrix $G(D) = (g_{ij}(D)) \in R^{k \times n}$ as a skew polynomial with matrix coefficients:
$$G(D) = G_0 + G_1 D + \dots + G_\mu D^\mu, \tag{12}$$
where $\mu$ is the maximum degree of the polynomials $g_{ij}(D)$. The matrices $G_i$ are $k \times n$ matrices over the field $F$, and $\mu$ is called the generator matrix memory. From (4), (10), and (11), we obtain that $v_t$ is a coefficient in the product of the skew series $u(D)$ and the skew polynomial $G(D)$, which is the following skew convolution (see Figure 1):
$$v_t = \sum_{i=0}^{\mu} u_{t-i}\,\theta^{t-i}(G_i) = u_t\theta^{t}(G_0) + u_{t-1}\theta^{t-1}(G_1) + \dots + u_{t-\mu}\theta^{t-\mu}(G_\mu), \tag{13}$$
where $u_t = 0$ for $t < 0$. This encoding rule explains the title skew convolutional code, which can also be seen as the set $C = \{v(D)\}$ of series $v(D)$ defined in (11). At time $t$, the encoder receives an information block $u_t$ of $k$ symbols from $F$ and outputs the code block $v_t$ of $n$ code symbols from $F$ using (13); hence, the code rate is $R = k/n$. The encoder (13) uses $u_t$ and also the $\mu$ previous information blocks $u_{t-1}, u_{t-2}, \dots, u_{t-\mu}$, which should be stored in the encoder's memory; this is why $\mu$ is also called the encoder memory.
The coefficients $\theta^{t-i}(G_i)$, $i = 0, 1, \dots, \mu$, in the encoder (13) depend on the time $t$. Hence, the skew convolutional code is a time-varying classical convolutional code. Denote
$$\tau = \min\{j > 0 : \theta^{j}(G_i) = G_i \text{ for all } i = 0, \dots, \mu\}. \tag{14}$$
For the field $F = \mathbb{F}_{q^m}$, we have $\theta^m = \theta^0$; hence, the coefficients in (13) are periodic with period $\tau \le m$, and the skew convolutional code is periodic with period $\tau \le m$. The period $\tau$ can be less than $m$ if all the matrices $G_0, \dots, G_\mu$ are over a subfield of $\mathbb{F}_{q^m}$.
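The skew convolution of the encoding rule, in which the block at time t is the sum over i of u_{t−i} θ^{t−i}(G_i), can be sketched directly in code. The snippet below is only an illustration under stated assumptions: the field is F_4 with θ(a) = a², elements are encoded as integers (0, 1, 2 = α, 3 = α²), the matrices G_0 = (1, α) and G_1 = (α, α²) are an illustrative choice, and `skew_encode` is a hypothetical helper name.

```python
# F_4 = {0, 1, α, α²} encoded as 0, 1, 2, 3 (2 = α, 3 = α² = α + 1); encoding is an assumption.
EXP = [1, 2, 3]
LOG = {1: 0, 2: 1, 3: 2}


def add(a, b):
    return a ^ b


def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]


def theta(a, i=1):
    """Frobenius a -> a^2 applied i times (period 2 on F_4)."""
    return mul(a, a) if i % 2 else a


def skew_encode(u, G):
    """u: list of k-blocks; G: list [G_0, ..., G_mu] of k x n matrices over F_4.

    Returns the blocks v_t = sum_i u_{t-i} θ^{t-i}(G_i) for t = 0 .. len(u) + mu - 1,
    i.e. the encoder is flushed with mu zero blocks at the end.
    """
    mu, k, n = len(G) - 1, len(G[0]), len(G[0][0])
    v = []
    for t in range(len(u) + mu):
        block = [0] * n
        for i in range(mu + 1):
            if t - i not in range(len(u)):
                continue                      # u_{t-i} = 0 outside the input
            for r in range(k):
                for c in range(n):
                    coeff = theta(G[i][r][c], t - i)   # time-varying coefficient
                    block[c] = add(block[c], mul(u[t - i][r], coeff))
        v.append(block)
    return v


G = [[[1, 2]], [[2, 3]]]          # G_0 = (1, α), G_1 = (α, α²): illustrative choice
u = [[1], [0], [0], [1]]          # information sequence 1, 0, 0, 1
v = skew_encode(u, G)
```

Because θ² = id on F_4, the coefficients seen by even and odd time instants alternate, which is the period τ ≤ m = 2 described above; with matrices over F_2 the same function reduces to an ordinary fixed encoder.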

Scalar Form of Encoding
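The input of the encoder can also be written as an information sequence of $k$-blocks over $F$,
$$u = u_0, u_1, u_2, \dots, u_t, \dots, \tag{15}$$
and the output as a code sequence of $n$-blocks over $F$,
$$v = v_0, v_1, v_2, \dots, v_t, \dots. \tag{16}$$
Then, the encoding rule (13) can be written in a scalar form
$$v = uG \tag{17}$$
with the semi-infinite scalar generator block matrix
$$G = \begin{pmatrix} G_0 & G_1 & \cdots & G_\mu & & & \\ & \theta(G_0) & \theta(G_1) & \cdots & \theta(G_\mu) & & \\ & & \theta^2(G_0) & \theta^2(G_1) & \cdots & \theta^2(G_\mu) & \\ & & & \ddots & \ddots & & \ddots \end{pmatrix}. \tag{18}$$
Thus, a skew convolutional code can be equivalently represented in a scalar form as the set $C = \{v\}$ of sequences $v$ defined in (16) that satisfy (17).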
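The scalar form v = uG can be illustrated by building a finite upper-left corner of the semi-infinite matrix. This is a sketch under assumptions: block row t is taken to carry θ^t(G_i) in block column t + i (which is what the skew convolution implies), the field is F_4 with the integer encoding 0, 1, 2 = α, 3 = α², and the helper names are hypothetical.

```python
# F_4 = {0, 1, α, α²} encoded as 0, 1, 2, 3 (2 = α, 3 = α² = α + 1); encoding is an assumption.
EXP = [1, 2, 3]
LOG = {1: 0, 2: 1, 3: 2}


def add(a, b):
    return a ^ b


def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]


def theta(a, i=1):
    return mul(a, a) if i % 2 else a


def scalar_generator(G, T):
    """First T block rows of the scalar generator matrix: block (t, t+i) = θ^t(G_i)."""
    mu, k, n = len(G) - 1, len(G[0]), len(G[0][0])
    M = [[0] * ((T + mu) * n) for _ in range(T * k)]
    for t in range(T):
        for i in range(mu + 1):
            for r in range(k):
                for c in range(n):
                    M[t * k + r][(t + i) * n + c] = theta(G[i][r][c], t)
    return M


def vec_mat(u, M):
    """Row vector times matrix over F_4."""
    out = [0] * len(M[0])
    for ur, row in zip(u, M):
        for c, entry in enumerate(row):
            out[c] = add(out[c], mul(ur, entry))
    return out


G = [[[1, 2]], [[2, 3]]]            # G_0 = (1, α), G_1 = (α, α²): illustrative choice
M = scalar_generator(G, 4)          # 4 x 10 corner of the semi-infinite matrix
v = vec_mat([1, 0, 0, 1], M)        # flattened code sequence v_0, v_1, ..., v_4
```

Rows 0 and 2 of `M` carry the same coefficients shifted by two blocks, while row 1 carries their images under θ, so the periodic structure is visible directly in the matrix.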

Relations between Skew and Classical Convolutional Codes
In the case of the identity automorphism, $\theta(a) = a$, the scalar generator matrix of the skew code becomes
$$G = \begin{pmatrix} G_0 & G_1 & \cdots & G_\mu & & \\ & G_0 & G_1 & \cdots & G_\mu & \\ & & \ddots & \ddots & & \ddots \end{pmatrix}, \tag{19}$$
which is a generator matrix of a fixed convolutional code [20]. For fixed convolutional codes, polynomial generator matrices with $G_0$ of full rank $k$ are of particular interest [20] (Chapter 3). Skew convolutional codes retain the following useful property: if $G_0$ has full rank, then $\theta^i(G_0)$ has full rank as well for all $i$. Time-varying classical convolutional codes are defined in [20] by generator matrices $G_{\mathrm{var}}$ (20) composed of submatrices $G_{t,i}$, where the first index $t$ in $G_{t,i}$ is the time index. The code defined by the generator matrix $G_{\mathrm{var}}$ in (20) is

Lemma 2.
A scalar generator matrix (18) of a skew code can be written in the following equivalent form:
Proof. The statement follows from the change of variables
From (14), (20) and (21), we see again that a skew code defined by a generator matrix (21) is a $\tau$-periodic classical convolutional code. Thus, above we proved the following theorem.
Theorem 1. Given a field $F = \mathbb{F}_{q^m}$ with automorphism $\theta$ in (1), any skew convolutional $[n, k]$ code $C$ over $F$ is equivalent to a periodic time-varying (classical) convolutional $[n, k]$ code over $F$, with period $\tau \le m$ (14). If $G(D)$ is a skew polynomial generator matrix (12) of the code $C$, then the scalar generator matrix $G$ of the time-varying code is given by (18) or (21).
Not every periodic classical convolutional code can be represented as a skew code. Indeed, e.g., the submatrix G 1,0 in (20) can be selected independently of G 0,0 while corresponding submatrix (21) is completely determined by G 0 . Hence, a class of skew convolutional codes is a subclass of periodic classical convolutional codes.
Given the field $\mathbb{F}_{q^m}$, the automorphism $\theta$ in (1), and the code memory $\mu$, an $[n, k]$ skew convolutional code is defined by a generator matrix $G$ in (21). To specify the matrix, we should fix the elements of the $\mu + 1$ matrices $G_0, \dots, G_\mu$ of size $k \times n$. Hence, we should define $(\mu + 1)kn$ field elements. Since a classical convolutional code corresponds to the identity automorphism $\theta = \mathrm{id}$, the descriptions of skew and classical codes require the same number of field elements.
The number of skew $[n, k]$ convolutional codes over $\mathbb{F}_Q = \mathbb{F}_{q^m}$ of memory $\mu$ with fixed automorphism $\theta(a) = a^q$ has order $q^{(\mu+1)mkn}$. The number of $\tau$-periodic classical convolutional codes has order $q^{(\mu+1)mkn\tau}$, which is much larger. As a result, the search for good periodic time-varying convolutional codes is much simpler in the class of skew codes than in the class of periodic classical codes. The search among skew convolutional codes has the same complexity as the search among fixed classical codes.
How many more skew codes can we obtain by considering all possible automorphisms? Denote $q = p^s$, where $p$ is the field characteristic; then $\mathbb{F}_{q^m} = \mathbb{F}_{p^{sm}} = \mathbb{F}_{p^M}$, i.e., our field $\mathbb{F}_Q$ is an $M$-extension of the prime field $\mathbb{F}_p$. The parameter $q$ should be selected such that $\mathbb{F}_q$ is a subfield of $\mathbb{F}_{p^M}$, i.e., $q = p^i$, where $i$ is a divisor of $M$. Hence, the number of admissible values of $q$ equals the number $\delta(M)$ of divisors of $M$. In Example 2, we have $\mathbb{F}_Q = \mathbb{F}_{2^2}$, i.e., $p = q = 2$, $s = 1$, $m = 2$, and $M = sm = 2$ with divisors 1 and 2. For $i = 1$, we have $q = p^i = 2$ and $\theta(a) = a^2$, considered in Example 2. For $i = 2$, we have $q = p^2 = 4$, which corresponds to $\theta = \mathrm{id}$ and gives a constant classical convolutional code. For the field $\mathbb{F}_{2^6}$, we have $\delta(6) = 4$, and there are four sub-classes of skew codes with $q = 2^1, 2^2, 2^3$, and $q = 2^6$ (for the fixed code).
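The counting argument is plain arithmetic and can be made concrete. The sketch below evaluates the orders q^((µ+1)mkn) and q^((µ+1)mknτ) quoted above and the number of divisors δ(M); the function names are hypothetical and the parameter values in the usage lines are those of a unit memory [2, 1] code over F_4.

```python
def num_skew_codes(q, m, mu, k, n):
    """Order of the number of skew [n, k] codes of memory mu over F_{q^m}."""
    return q ** ((mu + 1) * m * k * n)


def num_periodic_codes(q, m, mu, k, n, tau):
    """Order of the number of tau-periodic classical [n, k] codes of memory mu over F_{q^m}."""
    return q ** ((mu + 1) * m * k * n * tau)


def num_divisors(M):
    """delta(M): the number of positive divisors of M."""
    return sum(1 for d in range(1, M + 1) if M % d == 0)


skew_count = num_skew_codes(q=2, m=2, mu=1, k=1, n=2)
periodic_count = num_periodic_codes(q=2, m=2, mu=1, k=1, n=2, tau=2)
subclasses_F64 = num_divisors(6)
```

Even for this tiny parameter set the periodic search space is the square of the skew one, and the gap grows exponentially with the period.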

Extension of Fixed Convolutional Codes
To show properties of skew convolutional codes, we will use the following example.
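Example 2. Consider a $[2, 1]$ skew convolutional code $C$ over the field $\mathbb{F}_Q = \mathbb{F}_{q^m} = \mathbb{F}_{2^2}$ with automorphism $\theta(a) = a^q = a^2$ (see Example 1). Let the generator matrix of the code $C$ in polynomial form be
$$G(D) = (1 + \alpha D,\ \alpha + \alpha^2 D) = G_0 + G_1 D, \quad G_0 = (1, \alpha), \quad G_1 = (\alpha, \alpha^2). \tag{22}$$
The generator matrix in scalar form (18) is
$$G = \begin{pmatrix} 1 & \alpha & \alpha & \alpha^2 & & & & \\ & & 1 & \alpha^2 & \alpha^2 & \alpha & & \\ & & & & 1 & \alpha & \alpha & \alpha^2 \\ & & & & & & \ddots & \ddots \end{pmatrix}. \tag{24}$$
Here, $\mu = 1$; hence, it is a unit memory code. The encoding rule is $v = uG$, or, from (13),
$$v_t = \begin{cases} u_t(1, \alpha) + u_{t-1}(\alpha^2, \alpha) & \text{for even } t, \\ u_t(1, \alpha^2) + u_{t-1}(\alpha, \alpha^2) & \text{for odd } t. \end{cases} \tag{25}$$
In some applications, it is preferable to have a generator matrix in systematic form. For our example, a systematic fractional matrix can be obtained using the left division of its components by the first component:
$$G_{\mathrm{syst}}(D) = (1 + \alpha D)^{-1} G(D) = \left(1,\ (1 + \alpha D)^{-1}(\alpha + \alpha^2 D)\right).$$
Let us show that the matrices $G_{\mathrm{syst}}(D)$ and $G(D)$ in (22)

Proof. By Lemma 1, the class of fixed codes is included in the class of skew convolutional codes. Hence, it is sufficient to show that there exists a codeword in a skew convolutional $[n, k]$ code that cannot belong to any fixed $[n, k]$ code with the same memory. Indeed, consider the unit memory, $\mu = 1$, skew $[2, 1]$ code $C$ defined by the generator matrix (24). By encoding the information sequence $u = 1, 0, 0, 1$, we obtain the codeword
$$v = (1, \alpha),\ (\alpha, \alpha^2),\ (0, 0),\ (1, \alpha^2),\ (\alpha^2, \alpha).$$
Suppose for the sake of contradiction that the codeword $v$ belongs to a fixed unit memory $[2, 1]$ convolutional code $C'$. A general form of the generator matrix of a unit memory fixed $[2, 1]$ code $C'$ is (19):
$$G' = \begin{pmatrix} a & b & c & d & & & \\ & & a & b & c & d & \\ & & & & \ddots & & \ddots \end{pmatrix},$$
with input symbols $u'_0, u'_1, \dots$, so that $v_t = u'_t(a, b) + u'_{t-1}(c, d)$. Since $v_2 = u'_2(a, b) + u'_1(c, d) = (0, 0)$, either (i) $u'_1 = u'_2 = 0$, or (ii) the vectors $(a, b)$ and $(c, d)$ are linearly dependent. In case (i), $v_0 = e(a, b) = (1, \alpha)$ and $v_3 = h(a, b) = (1, \alpha^2)$ for some nonzero $e, h \in \mathbb{F}_Q$, so that $(a, b) = e^{-1}(1, \alpha) = h^{-1}(1, \alpha^2)$, which is impossible since the vectors $(1, \alpha)$ and $(1, \alpha^2)$ are linearly independent. In case (ii), linear combinations of the linearly dependent vectors $(c, d)$ and $(a, b)$ should give two linearly independent vectors $v_0 = (1, \alpha)$ and $v_3 = (1, \alpha^2)$, which is impossible as well.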
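The argument of the proof can also be confirmed by exhaustive search. The sketch below assumes the matrices G_0 = (1, α) and G_1 = (α, α²) (consistent with the blocks v_0 = (1, α) and v_3 = (1, α²) quoted in the proof), the integer encoding 0, 1, 2 = α, 3 = α² of F_4, and hypothetical helper names. It checks that the first four blocks of the skew codeword for u = 1, 0, 0, 1 cannot be produced by any fixed unit memory [2, 1] encoder over F_4, for any choice of (a, b), (c, d) and any input symbols.

```python
from itertools import product

# F_4 = {0, 1, α, α²} encoded as 0, 1, 2, 3 (2 = α, 3 = α² = α + 1); encoding is an assumption.
EXP = [1, 2, 3]
LOG = {1: 0, 2: 1, 3: 2}


def add(a, b):
    return a ^ b


def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]


def theta(a, i=1):
    return mul(a, a) if i % 2 else a


def encode_prefix(u, g0, g1, skew):
    """Unit memory encoding v_t = u_t * A_t + u_{t-1} * B_t for t = 0 .. len(u) - 1.

    With skew=True, A_t = θ^t(g0) and B_t = θ^(t-1)(g1); otherwise A_t = g0, B_t = g1.
    """
    out, prev = [], 0
    for t, ut in enumerate(u):
        a_t = [theta(x, t) for x in g0] if skew else g0
        b_t = [theta(y, t - 1) for y in g1] if skew else g1
        out.append(tuple(add(mul(ut, x), mul(prev, y)) for x, y in zip(a_t, b_t)))
        prev = ut
    return out


def fixed_code_reaches(target):
    """True if some fixed unit memory [2, 1] encoder over F_4 outputs `target` as a prefix."""
    for a, b, c, d in product(range(4), repeat=4):
        for u in product(range(4), repeat=len(target)):
            if encode_prefix(u, (a, b), (c, d), skew=False) == target:
                return True
    return False


target = encode_prefix((1, 0, 0, 1), (1, 2), (2, 3), skew=True)
reachable = fixed_code_reaches(target)
```

The search visits 4⁴ generator choices times 4⁴ input prefixes, which is small enough to run in well under a second on ordinary hardware.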

Canonical Encoders and Generator Matrices
The encoder in a controller canonical form [20] for the code $C$ in Example 2 with generator matrix (22) is shown in Figure 2a for even $t$ and in Figure 2b for odd $t$. The encoder has one shift register, since $k = 1$. There is one $Q$-ary memory element in the shift register, shown as a rectangle, where $Q = q^m = 4$ is the order of the field. We need only one memory element since the maximum degree of the entries of $G(D)$, which consists of a single row in our example, is 1. A large circle means multiplication by the coefficient shown inside.
Figure 2. Encoder of the skew code $C$ from Example 2: (a) for even $t$, (b) for odd $t$.
In the general case of a k × n matrix G(D), we define the degree ν i of its ith row as the maximum degree of its components, and external degree ν of G(D) is the sum of its row degrees [21]. The controller-canonical-form encoder of G(D) over F Q has k shift registers, the ith register has ν i memory elements, and the total number of Q-ary memory elements in the encoder is ν. Different generator matrices for a code C may have different external degrees.

Definition 2.
Among all skew polynomial generator matrices (PGM) for a given skew convolutional code C, those for which the external degree is as small as possible are called canonical PGMs. This minimal external degree is called the degree or overall constraint length of the code C, and denoted as ν = deg C.

Code Trellises
For the code $C$ in Example 2, the code trellis is shown in Figure 3. The trellis consists of sections periodically repeated with period $\tau = m = 2$. Every section has $Q^\nu = 4^1 = 4$ states labeled by elements of the field $\mathbb{F}_Q$. In the $t$-th section, $t = 0, 1, \dots$, an edge connects the states $u_{t-1}$ and $u_t$ and is labeled by the code block $v_t$. Every codeword is represented by a path in the trellis that starts from the zero state 0 and goes to the right. The edge label $v_t$ is computed according to the encoding rule (25) as follows:
$$v_t = \begin{cases} u_t(1, \alpha) + u_{t-1}(\alpha^2, \alpha) & \text{for even } t, \\ u_t(1, \alpha^2) + u_{t-1}(\alpha, \alpha^2) & \text{for odd } t, \end{cases} \tag{27}$$
where we assume that $u_{-1} = 0$, i.e., the initial state of the shift register is 0.

Code Distances
There are two important characteristics of a convolutional code: the free distance $d_f$ and the slope $\sigma$ of the increase of the active burst distance, defined as follows [20]. The weight of a branch labeled by a vector $v_t$ is defined to be the weight $w(v_t)$ of $v_t$. The weight of a path is the sum of its branch weights. A path in the trellis that diverges from the zero state, does not use edges of weight 0 from the zero state to the zero state, and returns to the zero state after $\ell$ edges is called a loop of length $\ell$ or an $\ell$-loop. The $\ell$-th order active burst distance $d^{\mathrm{burst}}_\ell$ is defined [20] to be the minimum weight of $\ell$-loops in the code trellis. The slope is defined as $\sigma = \lim_{\ell \to \infty} d^{\mathrm{burst}}_\ell / \ell$. The free distance is $d_f = \min_\ell d^{\mathrm{burst}}_\ell$.
Theorem 3 (Singleton bound). The free Hamming distance of an $[n, k]$ skew convolutional code $C$ of degree $\nu = \deg C$ is upper bounded as follows:
$$d_f \le (n - k)\left(\left\lfloor \frac{\nu}{k} \right\rfloor + 1\right) + \nu + 1. \tag{28}$$
Proof. We adapt the proof given in [22] for time-invariant finite state codes. The trellis of the code $C$ is time-varying with $Q^\nu$ states at every level. Consider the $Q^{k\ell}$ information sequences $u_0, \dots, u_{\ell-1}$ of length $\ell$ blocks. For each of them, the code path in the trellis starts at the state 0 and terminates in one of $Q^\nu$ states. From the pigeon-hole principle, it follows that there must be at least $Q^{k\ell - \nu} = Q^K$ of these paths that have the same final state. The code sequences corresponding to these paths can be thought of as a block code of length $N = n\ell$ with at least $Q^K$ codewords. We should select $\ell$ such that $K = k\ell - \nu > 0$. The Hamming distance $d$ of the block code is upper bounded by the Singleton bound $d \le N - K + 1 = \ell(n - k) + \nu + 1$. On the other hand, $d$ is an upper bound on the free Hamming distance $d_f$ of the code $C$. Since this is true for all $\ell > \nu/k$, we have
$$d_f \le \min_{\ell > \nu/k} \left\{\ell(n - k) + \nu + 1\right\},$$
where the minimum is attained at $\ell = \lfloor \nu/k \rfloor + 1$; this gives the upper bound (28).
To obtain (28), we use the Singleton bound for block codes; therefore, the bound (28) is Singleton-type bound for skew convolutional codes. In fact, this bound and the proof are valid for arbitrary (also for nonlinear) time-varying trellis codes. In [23], codes that reach the Singleton-type bound are called maximum distance separable (MDS) codes. Any other upper bound for the Hamming distance of block codes can be used to obtain another upper bound for d f of skew convolutional codes (also for time-varying trellis codes). Using the Plotkin bound for block codes, we obtain the following bound.
Corollary 1 (Heller bound). The free Hamming distance of an $[n, k]$ skew convolutional code $C$ over $\mathbb{F}_Q$ of degree $\nu = \deg C$ and memory $\mu$ is upper bounded as follows: The bound is named after Heller since it was obtained for fixed binary convolutional codes in 1968; see [20,24]. The bound (29) is valid for time-varying (nonlinear or linear) trellis codes.
In the Hamming metric, the upper bound for the slope σ was obtained in [25] for fixed binary convolutional codes. We conjecture that this bound is true also for non-binary time-varying convolutional codes, and, hence, for skew convolutional codes. Another interesting metric for convolutional codes is the sum-rank metric, which can be applied for multi-shot network coding [26]. The metric is defined as follows. The rank weight w R (v t ) of a vector v t over the extension field F q m is the rank of the vector over the base field F q , i.e., w R (v t ) is the maximum number of F q -linearly independent components of v t . The sum-rank weight of a sequence v in (16) is the sum of weights of its items v t . The sum-rank distance between two sequences is the weight of their difference.
The rank of a vector $v_t \in \mathbb{F}_{q^m}^n$ is upper bounded by the Hamming weight $w_H(v_t)$ of the vector, i.e.,
$$w_R(v_t) \le w_H(v_t). \tag{31}$$
Hence, any upper bound for the Hamming metric is an upper bound for the sum-rank metric, and, from Theorem 3, we have the following corollary.

Corollary 2.
The free sum-rank distance d f of [n, k] skew convolutional code C of degree ν = deg C is upper bounded by (28) or by (29).
On the other hand, from (31), we obtain the following lemma.

Lemma 4.
Let the distance of a code C in the sum-rank metric be d; then, in the Hamming metric, the code distance is at least d.
We next compute the free distance, active burst distances, and the slope for the code C from Example 2 for the Hamming and for the sum-rank metrics.

Lemma 5.
In the sum-rank metric, the skew convolutional code $C$ defined by $G(D)$ in (22) has the $\ell$-th active burst distance $d^{\mathrm{burst}}_\ell = \ell + 2$ for $\ell = 2, 3, \dots$, the slope of the active distance is $\sigma = 1$, and the free distance is $d_f = 4$.
Proof. For this code, the shortest length of a loop is $\ell = 2$; hence, we should consider loops of length $\ell = 2, 3, \dots$. It follows from (27) that the weight $w_R(v_t) = \operatorname{rank} v_t$ of a branch in the code trellis that diverges from or merges to the zero state is 2. A branch connecting two nonzero states has weight at least 1. Indeed, for odd $t$, the branch label $v_t$ in (27) is a linear combination of the vectors $(\alpha, \alpha^2)$ and $(1, \alpha^2)$, which are $\mathbb{F}_{q^m}$-linearly independent, so $v_t$ cannot be 0 for nonzero coefficients $u_{t-1}, u_t$. The same is true for even $t$. Hence, $d^{\mathrm{burst}}_\ell \ge \ell + 2$. On the other hand, the path corresponding to the information sequence $u_0 = u_1 = \dots = u_{\ell-2} \ne 0$, $u_{\ell-1} = 0$ is an $\ell$-loop of weight $\ell + 2$. Hence, $d^{\mathrm{burst}}_\ell \le \ell + 2$, and the statement of the lemma follows.
Combining Lemmas 3 and 4, we obtain the following corollary.

Corollary 3.
In the Hamming metric, the skew convolutional code $C$ defined by $G(D)$ in (22) has the $\ell$-th active burst distance $d^{\mathrm{burst}}_\ell = \ell + 2$ for $\ell = 2, 3, \dots$, the slope of the active distance is $\sigma = 1$, and the free distance is $d_f = 4$.
For both metrics, the Hamming and the sum-rank, the upper bounds for the free distance (28) and for the slope (30) for the unit memory $[2, 1]$ code $C$ become $d_f \le 2n - k + 1 = 4$ and $\sigma \le n - k = 1$.
Hence, the skew code $C$ defined by (22) achieves the Singleton-type upper bound on $d_f$ and can be called an MDS code, as in [23]. The Heller bound (29) gives $d_f \le 4$ as well. The slope of $C$ also reaches the upper bound (30).
A generator matrix $G(D)$ of a skew convolutional code (and the corresponding encoder) is called catastrophic if there exists an information sequence $u(D)$ of infinite weight such that the code sequence $v(D) = u(D)G(D)$ has finite weight. The generator matrix $G(D)$ in (22) of the skew convolutional code $C$ with $\theta = (\cdot)^q$ is non-catastrophic, since the slope $\sigma > 0$. Note that, in the case of a fixed convolutional code $C'$, i.e., for $\theta = \mathrm{id}$, the generator matrix (22) is catastrophic, and the code has $d_f = 2$ and $\sigma = 0$.
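The distance claims for this small code can be cross-checked by enumerating loops in the trellis. The following sketch makes several assumptions: G_0 = (1, α) and G_1 = (α, α²) (consistent with the branch vectors quoted in the proof of Lemma 5), the integer encoding 0, 1, 2 = α, 3 = α² of F_4, the Singleton-type bound in the form (n − k)(⌊ν/k⌋ + 1) + ν + 1 that follows from the proof of Theorem 3, and hypothetical function names. Loops may revisit the zero state but may not use a zero-to-zero edge, following the definition of an ℓ-loop.

```python
from itertools import product

# F_4 = {0, 1, α, α²} encoded as 0, 1, 2, 3 (2 = α, 3 = α² = α + 1); encoding is an assumption.
EXP = [1, 2, 3]
LOG = {1: 0, 2: 1, 3: 2}


def add(a, b):
    return a ^ b


def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]


def theta(a, i=1):
    return mul(a, a) if i % 2 else a


def branch(t, prev, cur):
    """Edge label v_t = u_t θ^t(G_0) + u_{t-1} θ^(t-1)(G_1) with state prev = u_{t-1}."""
    g0 = [theta(x, t) for x in (1, 2)]
    g1 = [theta(x, t - 1) for x in (2, 3)]
    return tuple(add(mul(cur, x), mul(prev, y)) for x, y in zip(g0, g1))


def hamming(v):
    return sum(x != 0 for x in v)


def rank_weight(v):
    """Rank over F_2 of a vector over F_4: distinct nonzero entries, capped at m = 2."""
    return min(2, len({x for x in v if x != 0}))


def burst_distance(ell, weight, start=0):
    """Minimum weight of an ell-loop leaving the zero state at time `start`."""
    best = None
    for mid in product(range(4), repeat=ell - 1):
        states = (0,) + mid + (0,)
        if any(states[j] == 0 and states[j + 1] == 0 for j in range(ell)):
            continue                       # zero-to-zero edges are not allowed in a loop
        w = sum(weight(branch(start + j, states[j], states[j + 1])) for j in range(ell))
        best = w if best is None else min(best, w)
    return best


def singleton_bound(n, k, nu):
    return (n - k) * (nu // k + 1) + nu + 1


free_distance = min(burst_distance(ell, hamming) for ell in range(2, 6))
```

The enumeration is restricted to short loops, so it illustrates rather than proves the general statement; the loop weights it finds grow by one per extra edge, matching the slope σ = 1.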

Blocking of Skew Convolutional Codes
A skew convolutional code $C$, represented as a $\tau$-periodic $[n, k]$ code, can be considered as a $[\tau n, \tau k]$ fixed code $C^{(\tau)}$ by $\tau$-blocking, described in [21]. The only difference between $C$ and $C^{(\tau)}$ is that the code symbols are grouped in blocks $v_t$ of different lengths in these codes. In this way, known methods to analyze fixed codes can be applied to skew convolutional codes.
For example, the $[2, 1]$ skew code $C$ with generator matrix (24) has period $\tau = m = 2$ and can be written as a $[4, 2]$ fixed code $C^{(\tau)} = C^{(2)}$ defined by the scalar generator matrix
$$G^{(2)} = \begin{pmatrix} 1 & \alpha & \alpha & \alpha^2 & & & & & \\ 0 & 0 & 1 & \alpha^2 & \alpha^2 & \alpha & 0 & 0 & \\ & & & & 1 & \alpha & \alpha & \alpha^2 & \\ & & & & 0 & 0 & 1 & \alpha^2 & \cdots \\ & & & & & & & & \ddots \end{pmatrix},$$
which coincides with the matrix $G$ in (24) but is written in 2-blocked form. From $G^{(2)}$, we obtain the generator polynomial matrix of the $[4, 2]$ blocked code $C^{(2)}$ as
$$G^{(2)}(D) = \begin{pmatrix} 1 & \alpha & \alpha & \alpha^2 \\ \alpha^2 D & \alpha D & 1 & \alpha^2 \end{pmatrix}.$$
In general, for any skew convolutional code $C$ and for any $i$-blocking $C^{(i)}$, $i \in N_1$, the codewords, represented by sequences $v$ of elements from $\mathbb{F}_{q^m}$ in (16), are the same for the codes $C$ and $C^{(i)}$. Hence, the codes have the same properties, e.g., we have

Definitions of Duality
The duality of skew convolutional codes can be defined in different ways. First, consider a skew convolutional code $C$ over $F$ in a scalar form as a set of sequences as in (16). For two sequences $v$ and $v'$, where at least one of them is finite, define the scalar product $(v, v')$ as the sum of products of corresponding components, where missing components are assumed to be zeros. We say that the sequences are orthogonal if $(v, v') = 0$.

Definition 3.
The dual code C ⊥ to a skew convolutional [n, k] code C is an [n, n − k] skew convolutional code C ⊥ such that (v, v ⊥ ) = 0 for all finite length words v ∈ C and for all words v ⊥ ∈ C ⊥ .
Another way to define orthogonality is, for example, as follows. Consider two n-words v(D) and v ⊥ (D) over Q n . We say that v ⊥ (D) is left-orthogonal to v(D) if v ⊥ (D)v(D) = 0 and right-orthogonal if v(D)v ⊥ (D) = 0. A left dual code to a skew convolutional code C can be defined as The dual code C ⊥ left is a left submodule of Q n , hence it is a skew convolutional code. Later on, we consider dual codes according to Definition 3 only, since it is more interesting for practical applications.

Parity Check Matrices
Given a code C with generator matrix G, we next show how to find a parity check matrix H, such that GH T = 0.
Let a skew $[n, k]$ code $C$ of memory $\mu$ be defined by a polynomial generator matrix $G(D)$ in (12), which corresponds to the scalar generator matrix $G$ in (18). For the dual $[n, n - k]$ code $C^\perp$, we write a transposed parity check matrix $H^T$ of memory $\mu^\perp$ in scalar form, similar to classical convolutional codes, with submatrices $H_0, \dots, H_{\mu^\perp}$, where $\operatorname{rank}(H_0) = n - k$. Similar to [20], we call the matrix $H^T$ the syndrome former and write it in polynomial form as
$$H^T(D) = H_0^T + H_1^T D + \dots + H_{\mu^\perp}^T D^{\mu^\perp}.$$
This gives a parity check matrix $H$ of the causal code $C$ with the generator matrix (21), which, in the case of $\theta = \mathrm{id}$, coincides with the check matrix of a classical fixed convolutional code. From Definition 3, we have that $vH^T = 0$ for all sequences $v \in C$ over $F$. On the other hand, from (4)

Proof. We show the proof for the code of memory $\mu = 1$ (as in Example 2). For the general memory case, the proof follows similarly. Consider the code with generator matrices $G(D)$ and $G$ given by (12) and (18). Let us find a check matrix with memory $\mu^\perp = \mu = 1$. Then, we have $G(D) = G_0 + G_1 D$ and $H^T(D) = H_0^T + H_1^T D$. From the condition $G(D)H^T(D) = 0$ and the rule (2), we have the following system of equations for the unknowns $H_0$ and $H_1$:
$$G_0 H_0^T = 0, \qquad G_0 H_1^T + G_1\theta(H_0^T) = 0, \qquad G_1\theta(H_1^T) = 0. \tag{38}$$
From the condition $GH^T = 0$, we obtain the following equations: by multiplying the first row of $G$ by $H^T$, we get the system (38); by multiplying the second row of $G$ by $H^T$, we get the system obtained by applying $\theta$ to every equation of (38), which is equivalent to (38). Multiplication of the other rows of $G$ by $H^T$ does not give new equations. Hence, the conditions $G(D)H^T(D) = 0$ and $GH^T = 0$ give the same system (38).

Example 3.
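For the code $C$ from Example 2, we write $H_0 = (a, c)$ and $H_1 = (b, d)$. Using $G_0, G_1$ from (22) and solving the system (38), we obtain $H_0 = (\alpha, 1)$ and $H_1 = (1, \alpha)$. Hence, $H(D) = (\alpha + D, 1 + \alpha D)$, and a parity check matrix $H$ of the code $C$, which is a generator matrix for the dual code $C^\perp$, is as follows:
$$H = \begin{pmatrix} \alpha & 1 & & & & & \\ 1 & \alpha & \alpha^2 & 1 & & & \\ & & 1 & \alpha^2 & \alpha & 1 & \\ & & & & 1 & \alpha & \cdots \\ & & & & & & \ddots \end{pmatrix}.$$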
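The parity check polynomial of Example 3 can be verified with a few lines of skew polynomial arithmetic. This is a minimal sketch, assuming G(D) = (1 + αD, α + α²D) for the code of Example 2 (an assumption consistent with the branch vectors quoted in the proof of Lemma 5), the commutation rule D·a = θ(a)·D, and the integer encoding 0, 1, 2 = α, 3 = α² of F_4; the helper names are hypothetical. It evaluates G(D)H^T(D) once with the skew product and once with the ordinary product (θ = id).

```python
# F_4 = {0, 1, α, α²} encoded as 0, 1, 2, 3 (2 = α, 3 = α² = α + 1); encoding is an assumption.
EXP = [1, 2, 3]
LOG = {1: 0, 2: 1, 3: 2}


def add(a, b):
    return a ^ b


def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]


def theta(a, i=1):
    return mul(a, a) if i % 2 else a


def poly_mul(a, b, skew=True):
    """Product of coefficient lists; with skew=True uses a_i D^i b_j D^j = a_i θ^i(b_j) D^(i+j)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            coeff = theta(bj, i) if skew else bj
            out[i + j] = add(out[i + j], mul(ai, coeff))
    return out


def poly_add(a, b):
    size = max(len(a), len(b))
    a = a + [0] * (size - len(a))
    b = b + [0] * (size - len(b))
    return [add(x, y) for x, y in zip(a, b)]


def syndrome_poly(g, h, skew=True):
    """G(D) H^T(D) for a single-row G(D) = (g_1, ..., g_n) and H(D) = (h_1, ..., h_n)."""
    total = [0]
    for gj, hj in zip(g, h):
        total = poly_add(total, poly_mul(gj, hj, skew))
    return total


g = [[1, 2], [2, 3]]   # g_1(D) = 1 + αD, g_2(D) = α + α²D  (assumed G(D))
h = [[2, 1], [1, 2]]   # h_1(D) = α + D,  h_2(D) = 1 + αD   (H(D) from Example 3)
skew_result = syndrome_poly(g, h, skew=True)
ordinary_result = syndrome_poly(g, h, skew=False)
```

The skew product vanishes while the ordinary product does not, so the same H(D) is a valid check polynomial only when the automorphism is taken into account.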

Trellises of Dual Codes
Similar to fixed convolutional codes, we have the following theorem.
Theorem 5. For a skew convolutional code $C$ and its dual $C^\perp$, we have $\deg C = \deg C^\perp$.
Proof. Denote by $\tau$ and $\tau^\perp$ the periods of the codes $C$ and $C^\perp$, respectively. Let $\ell$ be the least common multiple of the periods $\tau$ and $\tau^\perp$; then, the $\ell$-blocked codes $C^{(\ell)}$ and $(C^\perp)^{(\ell)}$ are both fixed convolutional codes. The fixed codes $C^{(\ell)}$ and $(C^\perp)^{(\ell)}$ are dual to each other, since blocking does not change code sequences; hence, $\deg C^{(\ell)} = \deg (C^\perp)^{(\ell)}$; see, e.g., Theorem 2.69 in [20] for fixed dual convolutional codes. From (32), we have $\deg C = \deg C^{(\ell)} = \deg (C^\perp)^{(\ell)} = \deg C^\perp$.
It follows from Theorem 5 that the number of states at one level of the code trellis (trellis complexity) is the same for an original code $C$ and for its dual $C^\perp$ and equals $Q^{\deg C}$.
The trellis of the dual code $C^\perp$ obtained from the matrix $H$ in Example 3 is shown in Figure 4. The trellis has $Q^{\deg C} = 4^1 = 4$ states labeled by elements of the set $S = \{0, 1, \alpha, \alpha^2\}$. Every word of the dual code $C^\perp$ is represented by a path in the trellis that starts from a state $s_{-1} \in S$ and goes to the right. For the trellis section corresponding to time $t = 0, 1, \dots$, the edge connecting the states $s_{t-1}$ and $s_t$ is labeled by $v^\perp_t$ computed as follows:
$$v^\perp_t = \begin{cases} s_{t-1}(\alpha^2, 1) + s_t(1, \alpha^2) & \text{for odd } t, \\ s_{t-1}(\alpha, 1) + s_t(1, \alpha) & \text{for even } t. \end{cases}$$

Trellis Decoding of Skew Convolutional Codes
For a given skew convolutional code C, we showed how to obtain a code trellis using a generator matrix of the code. Another way to obtain a code trellis of C using a parity check matrix H was proposed in [27]. Having a code trellis, one can use the Viterbi decoder [4] for maximum likelihood sequence decoding or the BCJR decoder [28] for symbol-wise decoding.
For an $[n, k]$ skew convolutional code, the complexity of the Viterbi decoder has order $\kappa = nQ^kQ^{\deg C}$ operations (additions and binary selections), which increases exponentially in $k$ and might be high for high rate codes. Using a detailed code trellis [27,29], where every edge is labeled by a single field element, the decoding complexity can be reduced to the order (39). Another advantage of the method in [29] is that it can be applied to every trellis section separately, which is convenient for time-varying codes. The decoding complexity of a particular code can also be decreased using the methods in [30]. The complexity of the BCJR decoding algorithm has the same order as in (39). Symbol-wise decoding of a skew convolutional code $C$ can be implemented using a trellis of the dual code $C^\perp$; see [31][32][33]. The order of decoding complexity in this case is also given in (39).
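Trellis decoding over a time-varying trellis differs from the fixed case only in that the branch labels depend on the section index. The following is a minimal hard-decision Viterbi sketch for a terminated codeword, not the detailed-trellis method of [27,29]. It assumes the unit memory [2, 1] code with G_0 = (1, α), G_1 = (α, α²), the Hamming metric, the integer encoding 0, 1, 2 = α, 3 = α² of F_4, and hypothetical function names.

```python
# F_4 = {0, 1, α, α²} encoded as 0, 1, 2, 3 (2 = α, 3 = α² = α + 1); encoding is an assumption.
EXP = [1, 2, 3]
LOG = {1: 0, 2: 1, 3: 2}


def add(a, b):
    return a ^ b


def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]


def theta(a, i=1):
    return mul(a, a) if i % 2 else a


def branch(t, prev, cur):
    """Label of the edge from state prev = u_{t-1} to state cur = u_t in section t."""
    g0 = [theta(x, t) for x in (1, 2)]        # θ^t(G_0)
    g1 = [theta(x, t - 1) for x in (2, 3)]    # θ^(t-1)(G_1)
    return tuple(add(mul(cur, x), mul(prev, y)) for x, y in zip(g0, g1))


def viterbi(received):
    """Minimum Hamming distance path from state 0 back to state 0.

    received: list of 2-tuples over F_4, one per trellis section (terminated codeword).
    Returns (information symbols without the terminating zero, path metric).
    """
    inf = float("inf")
    metric, paths = {0: 0}, {0: []}
    for t, r in enumerate(received):
        new_metric, new_paths = {}, {}
        for prev, m in metric.items():
            for cur in range(4):
                label = branch(t, prev, cur)
                cost = m + sum(x != y for x, y in zip(label, r))
                if new_metric.get(cur, inf) > cost:     # keep the survivor per state
                    new_metric[cur] = cost
                    new_paths[cur] = paths[prev] + [cur]
        metric, paths = new_metric, new_paths
    return paths[0][:-1], metric[0]


u = [1, 2, 0, 3]
states = [0] + u + [0]                                   # terminate with one zero symbol
codeword = [branch(t, states[t], states[t + 1]) for t in range(len(u) + 1)]
received = list(codeword)
received[2] = (add(received[2][0], 1), received[2][1])   # one symbol error
decoded, distance = viterbi(received)
```

Each section processes Q^deg C = 4 states with Q^k = 4 outgoing edges, in line with the complexity order nQ^kQ^deg C stated above; a soft-decision or BCJR variant would change only the branch cost and the combining rule.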

Conclusions
A new class of non-binary skew convolutional codes was defined that extends the class of fixed convolutional codes. The skew convolutional codes are equivalent to periodic time-varying classical convolutional codes but have as compact a description as fixed convolutional codes.
Given a field $F = \mathbb{F}_{p^M} = \mathbb{F}_{q^m}$ of characteristic $p$ and code parameters $n$, $k$, and $\mu$, for every automorphism $\theta(a) = a^q$ of the field, the subclass SCC($\theta$) of skew convolutional $[n, k]$ codes of memory $\mu$ over the field is defined. All the subclasses have the same number of codes. In the case of the identity automorphism $\theta = \mathrm{id}$, we obtain the subclass SCC(id) of classical fixed convolutional codes. Any other automorphism $\theta$ of the field gives a subclass SCC($\theta$) of skew convolutional codes that can be represented as periodic time-varying convolutional codes with typical period $m$. The total number of the subclasses SCC($\theta$) is equal to the number of divisors of $M$, which is usually not a large number. The class of $m$-periodic time-varying convolutional codes is larger than the class of skew convolutional codes. Every code in the subclass SCC($\theta$) is defined by a $k \times n$ polynomial generator matrix $G(D)$ over the ring of $\theta$-skew polynomials; hence, the descriptions of skew codes and fixed codes are the same, and the description is given by the same matrix $G(D)$.
Every τ-periodic convolutional [n, k] code can be written as a fixed [τn, τk] code; hence, skew convolutional codes can be analyzed by methods known for fixed codes. We showed how to design generator and parity check matrices in polynomial and scalar forms, encoders and code trellises for skew convolutional codes, and their duals. Using code trellises for original and dual codes, in the case of channels without memory, one can apply Viterbi or BCJR decoding algorithms, or the dualized BCJR algorithm.
Future work. We gave just a first encounter with skew convolutional codes. There are many open problems remaining. The algebraic structure of classical fixed convolutional codes is well understood, see, e.g., [20,21] and references therein. The questions such as how to obtain a canonical generator matrix of a skew convolutional code and its dual, or how to design encoders of a fractional generator matrix can be considered in the future. Another open problem is to find good skew convolutional codes reaching an upper bound on the free distance. One possibility to obtain skew convolutional codes is based on unwrapping skew quasi-cyclic (QC) block codes (see such codes in [17]) in a way similar to [34] or [35], where it is shown how fixed classical convolutional codes can be obtained by unwrapping QC block codes and vice versa.