Neural net decoders of binary codes

In this article, neural net decoders for binary error-correcting codes are constructed. Analytical methods for calculating synapse weights are proposed. Algorithms of neural net coding and decoding are considered using the examples of a convolutional code with Viterbi decoding, an LDPC code, and a Reed-Muller code.


Introduction
The idea of using neural network computing [1] to build error-correcting codes [2] was first proposed in the 1990s [3][4][5]. In these papers, codes of small dimensions (n, k) were considered and the synaptic weights of the neural net decoders were calculated analytically. At the same time, it was immediately noticed that an increase in the size of the information word k leads to an increase in the dimension of the neural network in proportion to $2^k$. This fact stopped the development of this direction for ten years. In 2002, J. Wu, Y. Theng and Y. Huang [6] proposed their own version of a 2-layer BCH decoder. To adjust the synaptic weights of a code (n, k), they solved a system of k nonlinear equations in n variables using genetic algorithms; moreover, the degree of the equations in some cases reached n. Computer capacities allowed them to train decoders up to (n, k) = (21, 6) inclusive. Further studies were again stopped by the nonpolynomial dependence of the neural net dimension on the codeword length. The explosive increase in computer performance has recently allowed researchers to turn their attention to this task again. Thus, in [7][8][9][10] the soft decoders BCH(63,45) and BCH(127,106)–BCH(127,64) are analyzed. To construct such decoders, neural net deep learning is used, with the number of layers reaching 1200 [9].

Neural net encoder
Each layer of a feedforward neural net is described by the equation $y = f(Wx + b)$. Here $x$ is the input and $y$ is the output vector of the layer, $W$ is a matrix of weight coefficients, $b$ is a bias vector, and $f(z)$ is an activation function.
Let $a = (a_1 a_2 \ldots a_k)$ be an information word. If G is the generator matrix of the code, then the codeword $c = Ga$ is formed by a single-layer feedforward neural net with $W = G$, $b = 0$ and the activation function $f(z) = z \bmod 2$ (figure 1).
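As a minimal sketch, such an encoder takes a few lines of numpy. The (5,2) generator matrix below is a hypothetical stand-in (the paper's matrix for figure 1 is not reproduced in the text); any generator matrix can be substituted for G.

```python
import numpy as np

# Hypothetical (5,2) generator matrix; its columns are the codewords of the
# unit information words. Substitute the code's actual G here.
G = np.array([[1, 0],
              [0, 1],
              [1, 0],
              [1, 1],
              [0, 1]])

def encode(a):
    """Single-layer neural net encoder: W = G, b = 0, f(z) = z mod 2."""
    return (G @ a) % 2

a = np.array([1, 0])   # information word
c = encode(a)          # codeword c = Ga mod 2
print(c)               # -> [1 0 1 1 0]
```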

Neural net Boolean decoder
One of the simplest neural net decoders can be constructed on the basis of Boolean functions. For a code (n, k), the input sequence is a Boolean n-dimensional vector and the output sequence is a Boolean k-dimensional vector. Therefore, a Boolean decoder is a set of k Boolean functions of n variables $(\varphi_1^n, \varphi_2^n, \ldots, \varphi_k^n)$. It is known that the number of all k-ary Boolean functions is $2^{2^k}$. We are interested only in those in which the number of 1s equals the number of 0s. The simplest way to construct all the necessary functions is to present them in disjunctive normal form (DNF). After that, the task of neural net decoder design is reduced to the task of a neural net realization of a DNF.
Recall that if a Boolean function of n variables has units in the truth table at places $(i_1, i_2, \ldots, i_m)$, then its full DNF is
$$\varphi(x_{n-1}, \ldots, x_0) = \bigvee_{p=1}^{m} \bigwedge_{j=0}^{n-1} \left( I_j^p x_j \vee \bar{I}_j^p \bar{x}_j \right).$$
Here $\bar{I}_j = 1 - I_j$, and $(I_{n-1}^p, I_{n-2}^p, \ldots, I_1^p, I_0^p)$ is the vector of the binary representation of the number $i_p$, i.e. $i_p = \sum_{j=0}^{n-1} I_j^p 2^j$.
For any full DNF of a Boolean function $\varphi$, we can construct a two-layer neural network. The first layer calculates all conjunctions: for each conjunction with pattern $(I_{n-1}, \ldots, I_0)$ we get a perceptron with weights $w_j = 2I_j - 1$ and the separating hyperplane $\sum_j w_j x_j - \sum_j I_j = 0$. For each disjunction we get a perceptron with unit weights and the separating hyperplane $\sum_p y_p - 1 = 0$. For both layers, we use the Heaviside activation function $f(z) = \theta(z)$, $\theta(0) = 1$.

Consider, for example, a code (5, 2) that encodes information words $a = (a_1 a_0)$ into five-bit codewords. This code allows correcting only one error, so we write out all valid codewords and all distorted codewords that can still be decoded. For any such input signal (a column of the resulting matrix M), the output must be the corresponding information word. That is, our neural network transforms the input matrix M into an output matrix of corrected information words. The first layer realizes all the conjunctions, the second layer all the disjunctions.

Consider the second row of the matrix A. This row corresponds to the Boolean function $\varphi_2$ with units at the truth-table places $(13, 29, 5, 9, 15, 12, 27, 11, 19, 31, 25, 26, \ldots)$; the weights of the first layer follow from these places as described above, and all coefficients of the second layer for $\varphi_2$ coincide with the coefficients for $\varphi_1$.

Let $a = (10)$ be the information word, and suppose that during transmission of the codeword $m = (10110)$ an error occurred in the second symbol: $\tilde{m} = (11110)$. Propagating $\tilde{m}$ through the two layers recovers the first and the second information bits. Since the weights of the second layer are the same for all information symbols, they are combined in figure 2.
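The construction can be checked end to end with a short numpy sketch. The (5,2) code here is the same hypothetical one as in the encoder example, and the truth-table places of each decoder function are generated as Hamming balls of radius 1 around the codewords, so the positions are illustrative rather than the paper's.

```python
import numpy as np

theta = lambda z: (z >= 0).astype(int)   # Heaviside with theta(0) = 1

# Hypothetical (5,2) code (same assumed G as above).
G = np.array([[1,0],[0,1],[1,0],[1,1],[0,1]])
n, k = G.shape
words = [np.array(a) for a in ((0,0),(0,1),(1,0),(1,1))]
codewords = [(G @ a) % 2 for a in words]

def ball(c):
    """All patterns within Hamming distance <= 1 of codeword c."""
    yield c.copy()
    for j in range(n):
        e = c.copy(); e[j] ^= 1; yield e

def dnf_layers(bit):
    """First-layer weights/biases: one conjunction per decodable pattern."""
    patterns = [p for a, c in zip(words, codewords) if a[bit] for p in ball(c)]
    W1 = np.array([2*p - 1 for p in patterns])   # w_j = 2*I_j - 1
    b1 = -W1.clip(min=0).sum(axis=1)             # minus the number of 1s in the pattern
    return W1, b1

def decode(x):
    bits = []
    for i in range(k):
        W1, b1 = dnf_layers(i)
        y1 = theta(W1 @ x + b1)                  # layer 1: all conjunctions
        bits.append(theta(y1.sum() - 1))         # layer 2: disjunction
    return np.array(bits)

m = (G @ np.array([1,0])) % 2                    # codeword for a = (10)
m[1] ^= 1                                        # error in the second symbol
print(decode(m))                                 # -> [1 0]
```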
An obvious direction for optimizing the Boolean decoder is the use of an alternative basis. This is due to the fact that a formula that is small in one basis can suffer an exponential size explosion in another basis, and vice versa. For example, for a code $x = (x_4 x_3 x_2 x_1 x_0)$ with information bits $x_4 x_3$, the decoder can be written compactly as a Zhegalkin polynomial (a modulo-2 sum of conjunctions); a Zhegalkin decoder can be written out in the same way for the Hamming code (6,3) with $x = (x_5 x_4 x_3 x_2 x_1 x_0)$.
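The Zhegalkin form of any decoder function can be obtained mechanically from its truth table by the binary Möbius (Reed-Muller) transform; the sketch below does this in numpy. The truth table filled in here reuses the illustrative places from the example above, not the paper's exact function.

```python
import numpy as np

def zhegalkin(truth):
    """Binary Moebius transform: truth table -> Zhegalkin (ANF) coefficients."""
    c = np.array(truth, dtype=int) % 2
    n = c.size.bit_length() - 1
    for j in range(n):
        step = 1 << j
        for i in range(0, c.size, 2 * step):
            c[i + step : i + 2*step] ^= c[i : i + step]
    return c                        # c[m] = coefficient of the monomial with bit mask m

# Toy function of 5 variables with units at the places listed in the text
truth = np.zeros(32, dtype=int)
truth[[5, 9, 11, 12, 13, 15, 19, 25, 26, 27, 29, 31]] = 1
coeffs = zhegalkin(truth)
print(np.flatnonzero(coeffs))       # bit masks of the monomials present in the ANF
```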

Neural net G-decoder
If an error occurs, a valid codeword is transformed into a forbidden one. All forbidden words formed from a permitted word by the same number of errors lie at the same distance from the original codeword. In other words, every allowed codeword, together with its distorted versions, forms a cluster. Since all points of such a cluster are located on a binary Hamming sphere, they belong to the same spherical segment, i.e. they can be separated by cutting planes. Obviously, the number of such separating planes for a code (n, k) is $2^k$. In other words, the simplest neural net decoder-classifier must contain at least one layer with $2^k$ neurons. We describe the algorithm for calculating the weighting coefficients of this neural net. From the set of information words $A = (a_1 a_2 \ldots a_{2^k})$, we create the $2^k$ cluster centers $M = GA^T$. Then the weights of the first layer are calculated by the formula $W_1 = 2M^T - J$, where J is the all-ones matrix, so that for the bipolar input $\tilde{x} = 2x - 1$ the maximal response corresponds to the nearest codeword. If we take the activation function of the first layer in the form $f_1(z_i) = \theta(z_i - z_{max})$, i.e. a winner-take-all indicator of the maximal element, then at the output of the first layer we get a vector that determines the position of the allowed codeword in the matrix A.
To extract this word, we need a second layer with $W_2 = A^T$ and the linear activation function $f_2(z) = z$. For example, if $z_{max} = \max(W_1\tilde{x}) = 10$ corresponds to the 51st element of the vector $W_1\tilde{x}$, then the output of the first layer has the form $y_1 = f_1(W_1\tilde{x}) = (0 0 0 \ldots 0 1_{51} 0 0 \ldots 0 0 0)_{2^6}$.
This vector determines the position (in our case the 51st) of the codeword in the matrix A. The second layer then restores the information sequence: $y_2 = f_2(W_2 y_1) = (010011) = a$.
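A compact numpy sketch of the whole G-decoder, again on the hypothetical (5,2) code (the paper's worked example uses a larger code with $2^6$ clusters), under the bipolar-correlation reading of $W_1$ described above:

```python
import numpy as np

# Hypothetical (5,2) code; columns of M are the 2^k cluster centers.
G = np.array([[1,0],[0,1],[1,0],[1,1],[0,1]])
A = np.array([[0,0],[0,1],[1,0],[1,1]])      # rows are the information words
M = (G @ A.T) % 2                             # columns are the codewords
n = G.shape[0]

W1 = 2*M.T - np.ones_like(M.T)                # bipolar correlation weights
W2 = A.T

def g_decode(x):
    z = W1 @ (2*x - 1)                        # bipolar input; z_max = n at the true codeword
    y1 = (z == z.max()).astype(int)           # winner-take-all activation f1
    return W2 @ y1                            # linear second layer: f2(z) = z

x = np.array([1,1,1,1,0])                     # codeword (10110) with an error in the 2nd symbol
print(g_decode(x))                            # -> [1 0]
```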

Neural net H-decoder
Recall that a linear block code is an error-correcting code whose codewords x form a k-dimensional linear subspace C of the n-dimensional binary space. For C, an r-dimensional orthogonal subspace $C^\perp$ (r = n − k) can be defined, where H is the matrix whose rows form a basis of this subspace. It follows that any received vector x can be mapped to a syndrome $h = Hx$, with $h = 0$ if and only if $x \in C$. These arguments provide the basis for building a neural net decoder that works in the feature space of syndromes with basis H. The maximum number of neurons (classes) of the syndrome classifier is limited to $2^r$. However, to recover the information word from the syndrome, a few additional layers have to be introduced.
Consider the algorithm of analytical learning of the neural net decoder built on the basis of the parity-check matrix H. For the first layer, we set $W_1 = H$, $f_1(z) = 2(z \bmod 2) - 1$. The output of the first layer gives the syndrome, in bipolar form, that defines the error. To localize this error we use the matrix of syndromes $W_2 = 2(HW_3)^T - J$ (the product $HW_3$ is taken modulo 2), where $W_3$ is the matrix of all possible error vectors that can be corrected. As activation functions of the neurons of the second and third layers we take $f_2(z) = \theta(z - r)$ and $f_3(z) = z$. Thus, the output of the third layer is the error vector.
In the fourth layer, the error vector is added modulo 2 to the received codeword. The neural net decoder works as follows. The output of the first layer is the syndrome vector in bipolar form. At the output of the second layer, we obtain the vector that determines the position of the error vector in the matrix of syndromes $W_2$; this matrix is built from $W_3$, whose columns are all possible errors that can be corrected. In the worked example, the maximum element "10" stands in the 212th place, so we take column 212 of the matrix $W_3$; it coincides with the error vector for the received code combination: $y_3 = W_3 y_2 = (011100000000000)_{15}$.
After adding the error vector to the received codeword, at the output of the fourth layer we get the corrected information word $y_4 = f_4(W_4(x \oplus y_3)) = (10101)$.
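The four-layer H-decoder is easy to verify on a small code. The sketch below uses the Hamming (7,4) code as a stand-in (the paper's worked example is a (15,5) code whose H is not reproduced in the text); the layer structure $W_1 = H$, $W_2 = 2(HW_3)^T - J$, $f_2(z) = \theta(z - r)$ follows the algorithm above.

```python
import numpy as np

theta = lambda z: (z >= 0).astype(int)        # Heaviside with theta(0) = 1

P = np.array([[1,1,0,1],
              [1,0,1,1],
              [0,1,1,1]])
H = np.hstack([P, np.eye(3, dtype=int)])      # parity-check matrix of Hamming (7,4)
n, r = 7, 3

# Correctable errors: the zero error and the 7 single-bit errors.
W3 = np.hstack([np.zeros((n, 1), int), np.eye(n, dtype=int)])
W2 = 2*((H @ W3) % 2).T - 1                   # bipolar matrix of syndromes
W4 = np.hstack([np.eye(4, dtype=int), np.zeros((4, 3), int)])  # pick info bits

def h_decode(x):
    y1 = 2*((H @ x) % 2) - 1                  # layer 1: syndrome in bipolar form
    y2 = theta(W2 @ y1 - r)                   # layer 2: locate the error pattern
    y3 = W3 @ y2                              # layer 3: the error vector itself
    return W4 @ ((x + y3) % 2)                # layer 4: correct and strip parity

a = np.array([1,0,1,0])
x = np.hstack([a, (P @ a) % 2])               # systematic codeword
x[2] ^= 1                                     # single transmission error
print(h_decode(x))                            # -> [1 0 1 0]
```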
Obviously, this neural net decoder is universal, since both the generator matrix G and the parity-check matrix H of any systematic linear block code can be used as the layer weight matrices W.
If we take into account that $W_4 = (I_k\ 0_{k,r})$ and that $f_3(z) = z$ is a linear activation function, then this scheme can be reduced to a three-layer one (figure 4). Further modifications of the neural net decoder can be made for codes correcting t = 1 errors. Note that in this case the matrix $W_3 = I$; in other words, the third layer can be combined with the second one, and the result is a two-layer neural net H-decoder, as sketched below.
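Continuing the Hamming (7,4) sketch above (so H, W4, r and theta are assumed to be defined there), the t = 1 simplification looks like this: with $W_3 = I$ the third layer is the identity and disappears.

```python
def h_decode_2layer(x):
    """Two-layer H-decoder for t = 1: W3 = I, so layers 2 and 3 merge."""
    y1 = 2*((H @ x) % 2) - 1           # layer 1: syndrome in bipolar form
    W2e = 2*H.T - 1                    # rows are bipolar syndromes of single errors
    e = theta(W2e @ y1 - r)            # layer 2 output is already the error vector
    return W4 @ ((x + e) % 2)          # mod-2 correction and projection onto info bits
```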