Combining the Burrows-Wheeler Transform and RCM-LDGM Codes for the Transmission of Sources with Memory at High Spectral Efficiencies

In this paper, we address the problem of implementing high-throughput Joint Source-Channel (JSC) coding schemes for the transmission of binary sources with memory over AWGN channels. The sources are modeled either by a Markov chain (MC) or a hidden Markov model (HMM). We propose a coding scheme based on the Burrows-Wheeler Transform (BWT) and the parallel concatenation of Rate-Compatible Modulation and Low-Density Generator Matrix (RCM-LDGM) codes. The proposed scheme uses the BWT to convert the original source with memory into a set of independent, non-uniform binary Discrete Memoryless Sources (DMS), which are then separately encoded, with optimal rates, using RCM-LDGM codes.


Introduction
When considering sources with memory, Shannon's JSC coding theorem states that reliable transmission is only possible if H(S)·R ≤ C, where H(S) is the entropy rate of the source in bits per source symbol, C is the capacity of the channel in information bits per channel use, and R is the JSC code's rate (source symbols per channel use). A fundamental result of information theory is the Separation Theorem, which states that, provided unbounded delay is allowed, no optimality is lost by designing the joint coding scheme as the concatenation of an optimal source code with compression rate H(S) and a capacity-achieving channel code of rate C. This independent design of source and channel codes allows diverse sources to share the same digital media; therefore, source coding and channel coding have traditionally been addressed independently of each other. Nevertheless, when complexity is an issue and the length of the input block is constrained, the overall performance can be improved by using a JSC coding scheme. In this case, the joint decoder exploits the inherent redundancy of the uncompressed source [1]. The main approaches to JSC can be categorized as follows:
• Ad hoc approaches where the channel encoder is applied to a given source compression format [2][3][4]. High-level information from the source code is used in the decoding process, which makes this approach highly dependent on the source encoder.
• A = [a_ij] is the state transition probability matrix of dimension λ × λ, with a_ij the probability of a transition from state S_i to state S_j, i.e., a_ij = P(q_{k+1} = S_j | q_k = S_i) for all k.
• B = [b_j(v)] is the observation symbol probability matrix, with b_j(v) the probability of emitting the binary symbol v in state S_j, i.e., b_j(v) = P(v | S_j), 1 ≤ j ≤ λ.
• π is the initial state distribution vector, with π_j the probability that the initial state is S_j, i.e., π_j = P(q_1 = S_j), 1 ≤ j ≤ λ.

Remark 1.
For stationary sources, π should be taken as the stationary distribution of the chain, i.e., πA = π (with π a row vector).

Remark 2.
When matrix B has only entries 0 and 1, the HMM reduces to an MC.
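A source of this kind can be simulated directly from (A, B, π). The sketch below uses hypothetical 2-state parameters chosen for illustration only (they are not the parameters of the sources in this paper); following Remark 1, π is taken as the stationary distribution of A.

```python
import random

# Hypothetical 2-state HMM parameters (lambda = 2), for illustration only.
A = [[0.95, 0.05],   # a_ij = P(q_{k+1} = S_j | q_k = S_i)
     [0.10, 0.90]]
B = [[0.90, 0.10],   # b_j(v) = P(v | S_j), v in {0, 1}
     [0.20, 0.80]]
pi = [2 / 3, 1 / 3]  # stationary distribution of A (solves pi A = pi)

def sample_hmm(K, rng=None):
    """Draw a length-K binary observation sequence from the HMM (A, B, pi)."""
    rng = rng or random.Random(0)
    out = []
    q = rng.choices([0, 1], weights=pi)[0]                # initial state ~ pi
    for _ in range(K):
        out.append(rng.choices([0, 1], weights=B[q])[0])  # emit observed bit
        q = rng.choices([0, 1], weights=A[q])[0]          # state transition
    return out
```

With B replaced by the identity-like matrix [[1, 0], [0, 1]], the same routine samples a pure MC, in line with Remark 2.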

Burrows-Wheeler Transform (BWT)
The BWT [19] is a lexicographical permutation of the characters of a string such that the transformed sequence is easier to compress. It is obtained as the last column of the array whose rows are all the cyclic shifts of the input sorted in dictionary order; this last column tends to have long runs of identical characters. From this last column the entire array can be recovered, making the BWT reversible. The BWT has been widely analyzed in [20][21][22] and employed for the general problem of data compression [23,24]. More recent contributions have focused on the applicability of the BWT to coded transmission of Markov sources through AWGN channels via LDPC [13] and non-systematic Turbo codes [14].
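As a concrete illustration, a minimal sort-based BWT and its inverse can be written as follows. This is a quadratic-time sketch using a '$' End-of-File sentinel (production implementations use suffix arrays, and the paper's transform operates on binary blocks):

```python
def bwt(s: str, eof: str = "$") -> str:
    """Naive BWT: append a sentinel, sort all cyclic shifts, take last column."""
    s = s + eof  # unique end-of-file marker, assumed absent from s
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)

def inverse_bwt(r: str, eof: str = "$") -> str:
    """Invert by repeatedly prepending the transform and re-sorting columns."""
    table = [""] * len(r)
    for _ in range(len(r)):
        table = sorted(r[i] + table[i] for i in range(len(r)))
    row = next(t for t in table if t.endswith(eof))  # the original rotation
    return row[:-1]
```

For example, `bwt("banana")` yields `"annb$aa"`, whose long run of identical characters illustrates why the transform aids compression, and `inverse_bwt` recovers `"banana"` exactly.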
Let T = {T_k}_{k=1}^K, T_k ∈ {0, 1}, denote the output block of the reversible block-sorting BWT when its input is the block of binary source symbols {U_k}_{k=1}^K. For sources modeled by MCs with λ states, it was shown in [20] that the joint probability mass function P_T(t) of the random block T is approximately memoryless and piecewise stationary, in the sense that there exist λ index sets L_i = {w_{i−1}, . . . , w_i − 1}, i = 1, . . . , λ, with w_0 = 1 and w_λ = K + 1, and a piecewise i.i.d. probability distribution such that the normalized divergence between the two distributions can be made arbitrarily small for sufficiently large K, as stated in expression (2). As the block length K goes to infinity, the normalized length of each index set in expression (2) converges to a constant c_i ∈ R, i.e., lim_{K→∞} |L_i|/K = c_i, as stated in expression (3).

Definition 1.
Let T_i denote the binary random sequence of length K_i = c_i·K at the output of the BWT corresponding to the index set L_i, i = 1, . . . , λ. Observe from (2) that for large block lengths K, the binary random symbols T_k ∈ T_i, with k ∈ L_i, can be considered independent and identically distributed (i.i.d.), with probability distribution P(T_k = 0) = p_i^(0) for some p_i^(0) ∈ (0, 1). These approximations should be understood under the convergence criterion (3). Therefore, we will model the non-stationary BWT output sequence T as the concatenation of λ blocks of lengths K_i = c_i·K, i = 1, . . . , λ, generated by λ independent DMS binary sources S_1, S_2, . . . , S_λ, with entropies H_i = h_b(p_i^(0)), where h_b(·) denotes the binary entropy function. By the independence of the sources and of their symbols, the entropy rate of the original source can be expressed as H(S) = Σ_{i=1}^λ c_i·H_i.
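Under this memoryless model, the entropy rate follows directly from the normalized segment lengths c_i and the first-order probabilities p_i^(0). A minimal sketch (the (c_i, p_i) pairs in the usage note are illustrative, not taken from the paper):

```python
from math import log2

def h_b(p: float) -> float:
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def entropy_rate(segments) -> float:
    """H(S) = sum_i c_i * h_b(p_i) for segments = [(c_i, p_i), ...], where
    c_i is the normalized segment length and p_i the probability of a 0."""
    assert abs(sum(c for c, _ in segments) - 1.0) < 1e-9  # lengths cover the block
    return sum(c * h_b(p) for c, p in segments)
```

For instance, `entropy_rate([(0.25, 0.1), (0.75, 0.3)])` gives roughly 0.78 bits per source symbol for a hypothetical two-segment profile.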

Parallel RCM-LDGM Codes
The N-length codeword x of a parallel concatenation of RCM and LDGM codes is composed of M RCM coded symbols and I = N − M LDGM coded bits. Next, we provide a succinct overview of the constituent RCM and LDGM codes.

Rate-Compatible Modulation (RCM) Codes
RCM codes [25] are based on random projections which generate multilevel symbols from weighted linear combinations of the source binary symbols. More precisely, an RCM code of rate K/M is generated by an M × K sparse mapping matrix G. The non-zero entries of each row of G belong to a multiset ±D, with D ⊂ N, the set of natural numbers (positive integers). Given the binary source sequence u = {u_1, u_2, . . . , u_K}, the RCM coded sequence c of length M is obtained as c = G·u, where these operations are in the real field. Finally, rate adaptation is achieved by adjusting the number of rows in G.
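The encoding step can be sketched as follows. The weight set D = {1, 2, 4} and the row degree (each weight used once with each sign) are illustrative assumptions, not the code parameters used in the paper:

```python
import random

def rcm_matrix(M: int, K: int, weights=(1, 2, 4), rng=None):
    """Sparse M x K RCM mapping matrix: each row places the weight multiset
    +-D at random column positions (D = {1, 2, 4} is illustrative only)."""
    rng = rng or random.Random(1)
    G = [[0] * K for _ in range(M)]
    for row in G:
        cols = rng.sample(range(K), 2 * len(weights))  # distinct positions
        for j, w in zip(cols, list(weights) + [-w for w in weights]):
            row[j] = w
    return G

def rcm_encode(G, u):
    """c = G u over the reals: each coded symbol is a weighted sum of source bits."""
    return [sum(g * b for g, b in zip(row, u)) for row in G]
```

Dropping or adding rows of G changes M, which is how the rate K/M is adapted without redesigning the code.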

Low-Density Generator Matrix (LDGM) Codes
LDGM codes are a subclass of the well-known LDPC codes with the particularity that the generator matrix G_L is also sparse. This allows the decoding algorithm to use the graph generated by G_L. In this paper, we consider systematic LDGM codes, whose generator matrix is of the form G_L = [I_K | P], where I_K is the identity matrix of size K and P is a regular K × I sparse matrix (i.e., with constant row and column weights). The codeword is obtained as x = u·G_L = [u | u·P], where u = {u_1, u_2, . . . , u_K} is the binary source sequence to be transmitted and the operations are in the binary field. Unlike general LDPC codes, LDGM codes suffer from high error floors [26]. However, it has been shown that they can help lower the error floor of other codes, as explained next.
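Systematic LDGM encoding amounts to XOR-accumulating the rows of P selected by the nonzero source bits. A minimal sketch (the tiny P in the usage note is illustrative, not a code from the paper):

```python
def ldgm_encode(P, u):
    """Systematic LDGM encoding with G_L = [I_K | P]: the codeword is the
    source bits u followed by I parity bits; all operations are over GF(2)."""
    K, I = len(P), len(P[0])
    assert len(u) == K
    parity = [0] * I
    for k in range(K):
        if u[k]:                      # accumulate row k of P when u_k = 1
            for i in range(I):
                parity[i] ^= P[k][i]
    return u + parity                 # [u | u P]
```

For example, with P = [[1, 0], [1, 1], [0, 1]] and u = [1, 1, 0], the parity is the XOR of the first two rows, giving the codeword [1, 1, 0, 0, 1].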

Parallel RCM-LDGM Code
Consider an RCM code of rate K/M generated by a matrix G, and the non-systematic part of a high-rate binary regular LDGM code of rate K/I, generated by P. Then, the parallel RCM-LDGM coded sequence x of length M + I is formed by the M RCM coded symbols G·u followed by the I LDGM parity bits u·P, where the last I symbols are encoded using a BPSK modulator. Recall that the objective of the LDGM code is to correct the residual errors of the RCM code, lowering the error floor without degrading the RCM waterfall region. Finally, the coded symbols of x are grouped two by two and transmitted using a QAM modulator, so that the spectral efficiency is ρ = 2K/(M + I) binary source symbols per complex channel use.
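Since the M + I coded symbols are paired into (M + I)/2 complex QAM symbols, the spectral efficiency follows directly from the block dimensions. The M/I split in the usage note is hypothetical; the paper only fixes K = 37,000 and N = M + I = 10,000 in its experiments:

```python
def spectral_efficiency(K: int, M: int, I: int) -> float:
    """rho = 2K / (M + I) source bits per complex channel use: the M RCM
    symbols and I BPSK bits are grouped two by two into QAM symbols."""
    return 2 * K / (M + I)
```

For example, `spectral_efficiency(37000, 8000, 2000)` returns 7.4, matching the operating point used in the Results section regardless of how N is split between M and I.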
The performance of RCM-LDGM codes when encoding uniform and non-uniform DMSs can be found in [17,18]. An efficient way to design these codes was shown in [27]. However, no results have been found in the literature regarding the use of parallel RCM-LDGM codes to encode discrete binary sources with memory. The conventional approach in this situation would be to encode the correlated source symbols at the transmitter by the RCM-LDGM encoder, and to modify the decoder at the receiver to exploit the correlation of the source. This may be done by incorporating the factor graph that models the source into the factor graph of the RCM-LDGM code, and running the sum-product algorithm [28] over the whole factor graph represented in Figure 1. We will denote this approach as NON-BWT-JSC, and we will compare it with our proposed coding scheme defined in the next section.

Proposed BWT-JSC Scheme
The main idea behind the proposed BWT-JSC scheme is to transform the original source with memory S into a set of λ independent non-uniform memoryless binary sources. This is accomplished by partitioning the source sequence into blocks of length K, U^(l) = {U_{l·K+k}}_{k=1}^K, l ∈ N, and then applying the BWT to each of these blocks; we denote by T_i^(l) the output segment i inside output block l (refer to Definition 1). Observe that the sequence blocks T_i^(l), i = 1, . . . , λ, can be considered to have been generated by non-uniform DMSs with entropies H_i, i = 1, . . . , λ. Therefore, we have reduced the encoding problem of sources with memory to a simpler one, namely the problem of JSC coding of non-uniform memoryless binary sources with entropies H_i. Notice that the previously mentioned RCM-LDGM high-throughput JSC codes for non-uniform DMS sources [17] can now be applied to each of the λ independent sources, as shown in Figure 2.
More concretely, let us consider a source with memory S, with entropy rate H(S), which generates blocks of K binary symbols to be transmitted at rate R = K/N by the parallel JSC coding system of Figure 2. Let T_i (refer to Definition 1) be the input sequence to the corresponding i-th JSC code of rate R_i = K_i/N_i, and let {SNR_i}_{i=1}^λ be the set of signal-to-noise ratios allocated to the parallel channels. Define the average SNR over all parallel channels as SNR = (1/N)·Σ_{i=1}^λ N_i·SNR_i. The following theorem proves that the proposed scheme achieves the Shannon limit.
Theorem 1. Given a target rate R, the minimum overall SNR in the coding scheme of Figure 2 is achieved when all the SNR_i take the same value, given by the SNR Shannon limit from expression (1), i.e., SNR*_i = 2^(R·H(S)) − 1. The individual rates are then given by R_i = R·H(S)/H_i. Proof. Given a set of signal-to-noise ratios {SNR_i}_{i=1}^λ, the rates of the JSC encoders in Figure 2 are given by Shannon's separation theorem as R_i = K_i/N_i = log_2(1 + SNR_i)/H_i, where, by the BWT hypothesis, K = Σ_{i=1}^λ K_i. We seek to minimize the average signal-to-noise ratio SNR = (1/N)·Σ_{i=1}^λ N_i·SNR_i over the λ parallel AWGN channels, under the constraint of achieving a rate R = K/N. Please note that since K = Σ_{i=1}^λ K_i is fixed, the constraint on R reduces to the constraint Σ_{i=1}^λ N_i = Σ_{i=1}^λ K_i·H_i/log_2(1 + SNR_i) = N. By applying the Lagrange multipliers method, we define F = Σ_{i=1}^λ N_i·SNR_i + μ·(Σ_{i=1}^λ K_i·H_i/log_2(1 + SNR_i) − N), and by searching for an extreme of F we obtain that the optimal SNR*_i are all equal to some value Γ. Therefore, from constraint (6), N = Σ_{i=1}^λ K_i·H_i/log_2(1 + Γ) = K·H(S)/log_2(1 + Γ), where the last equality follows from expression (5). Thus, the rate can be written as R = K/N = log_2(1 + Γ)/H(S). Consequently, the value of Γ is given by the signal-to-noise ratio required to achieve the same rate R in the standard point-to-point communications system, that is, Γ = 2^(R·H(S)) − 1. Remark 3. Observe that the BWT-JSC is asymptotically optimal in the sense that it can achieve the SNR Shannon limit given by the Separation Theorem.
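The optimal allocation of Theorem 1 is simple to compute numerically. A sketch, taking as input the segment profile (c_i, H_i) introduced in Definition 1 (the values in the usage note are illustrative, not the paper's sources):

```python
def optimal_allocation(R: float, segments):
    """Theorem 1: for segments = [(c_i, H_i), ...] with entropy rate
    H(S) = sum_i c_i H_i, every parallel channel gets the same
    SNR* = 2**(R * H(S)) - 1, and segment i is encoded at rate
    R_i = R * H(S) / H_i."""
    HS = sum(c * H for c, H in segments)      # entropy rate H(S)
    snr_star = 2 ** (R * HS) - 1              # common Shannon-limit SNR
    rates = [R * HS / H for _, H in segments] # per-segment code rates
    return snr_star, rates
```

As a sanity check, the resulting rates satisfy K/N = R: summing N_i = K_i/R_i = c_i·K·H_i/(R·H(S)) over i recovers N = K/R.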

Results
In this section, we evaluate the proposed scheme, comparing its performance with the conventional NON-BWT-JSC approach described in Section 2.3, which is based on a single code. Without loss of generality, the spectral efficiency of the communication system has been set to ρ = 7.4 binary source symbols per complex channel use, and the source block length to K = 37,000. Thus, the total number of coded symbols at the output of the JSC encoder is N = 10,000. We begin by specifying the Markov sources used in the simulations.

Simulated Sources and Their Output Probability Profile
Three different 2-state (λ = 2) Markov sources have been chosen. Two are modeled by MCs, with entropy rates 0.57 and 0.80 bits per source symbol, whereas the third is modeled by an HMM with entropy rate 0.73. For ease of notation, they will be referred to as S_1, S_2 and S_3. Table 1 summarizes their corresponding Markov parameters. Figure 3 shows the probability mass function P_T(t) (refer to (4)) of the binary random block T of length K = 37,000 at the output of the BWT for sources S_1, S_2, and S_3. Since sources S_1 and S_2 follow a 2-state MC, the BWT produces approximately two i.i.d. segments T_1 and T_2. This is clearly shown in Figure 3a,b, with segments of lengths (K_1 = 9020, K_2 = 27,980) and first-order probabilities as shown in the figure. On the contrary, source S_3 is characterized by a 2-state hidden Markov model, and the hidden property has the effect of increasing the number of states, should the HMM source be approximated by a pure MC. This is observed in Figure 3c, where a 6-state MC source fairly approximates the statistics of source S_3. The partition into 6 segments has been decided by the authors based on significant changes in the a priori probability of the bits forming the segments. In this case, segments T_1–T_6 have sizes (K_1 = 9250, K_2 = 5250, K_3 = 3000, K_4 = 2500, K_5 = 1500, K_6 = 15,500), with first-order probabilities as shown in Figure 3c.

Numerical Results
In this section, we present the results obtained by Monte Carlo simulation for the proposed BWT-JSC and the conventional NON-BWT-JSC coding schemes. Observe that, due to the BWT block, in our proposed scheme a single error at the output of the decoders propagates after applying the inverse BWT. Therefore, to make a fair comparison, the results are presented in the form of Packet Error Rate (PER) versus SNR. It should be mentioned that, for the correct recovery of the original transmitted source block, the inverse BWT at the receiver side needs to know the exact position to which the original End-of-File symbol has been moved by the BWT at the transmitter side; this side information must also be transmitted. Please note that, for a block length of 37,000, this position can be addressed by adding 16 binary symbols. In this work, we have considered this rate loss negligible, but in real scenarios it must be taken into account. Figure 4 shows the PER vs. SNR curves obtained by simulation for the example sources (a) S_1, (b) S_2 and (c) S_3 when using both the proposed system (BWT-JSC) and the conventional approach (NON-BWT-JSC) as a reference. In the proposed scheme, as stated in Section 3, after performing the BWT, each of the resulting λ independent non-uniform i.i.d. segments T_i (with first-order probabilities as shown in Figure 3) is encoded by a separate RCM-LDGM JSC code of rate R_i, as given by Theorem 1. The codes used for each DMS in the BWT-JSC approach, as well as the one used in the conventional NON-BWT-JSC scheme, are summarized in Table 2. Observe from Figure 4a,b that for sources S_1 and S_2, represented by a MC, our BWT-JSC scheme outperforms the NON-BWT-JSC approach by about 4.2 and 2.3 dB, respectively. The reason behind this improvement lies in the fact that, in the NON-BWT-JSC system, the Factor Graph (FG) of the decoder results from the parallel concatenation of two sub-graphs: the RCM-LDGM code sub-graph and the MC source sub-graph (refer to Figure 1).
Consequently, cycles between both sub-graphs appear in the overall decoder FG, degrading the performance of the sum-product algorithm. In the proposed scheme, however, these cycles do not occur, since the sources are memoryless and non-uniform: the contribution of the source sub-graphs reduces to introducing the a priori probabilities of the non-uniform sources into the variable nodes of the corresponding RCM-LDGM factor sub-graphs.
Let us now consider the HMM source S_3, with entropy rate H(S_3) = 0.73 and output probability profile as shown in Figure 3c. Note from this figure that the BW transform of source S_3 can be approximated by 6 memoryless non-uniform sources {T_i}_{i=1}^6, with blocks of lengths K_1 ≈ 9250, K_2 ≈ 5250, K_3 ≈ 3000, K_4 ≈ 2500, K_5 ≈ 1500, K_6 ≈ 15,500. Some of these blocks are short, which is detrimental to the performance of the corresponding RCM-LDGM codes. To solve this problem, we build larger segments that keep the same statistical properties as the original ones. In this approach, named BWT-JSC-κ, we put together κ consecutive output blocks of the BWT to form the new segments, concatenating T_i^(l), . . . , T_i^(l+κ−1) for each i = 1, . . . , λ and l ∈ N. This is, in fact, similar to applying the BWT to source blocks of length κ·K, but computationally it is more efficient. The RCM-LDGM codes used to transmit these segments have the same rates as before, but in this case their input and output block lengths are scaled by κ, i.e., K_i → κ·K_i, M_i → κ·M_i and I_i → κ·I_i, i = 1, . . . , λ. As before, Figure 4c plots the PER versus SNR curves for both strategies, BWT-JSC (solid curves) and NON-BWT-JSC (dashed curves). For the BWT-JSC-κ approach, two cases have been considered, κ = 1 and κ = 6. Please note that for κ = 1 the scheme is the same as in the previous MC examples. On the other hand, by concatenating 6 consecutive BWT output segments (κ = 6), we force the length of the smallest segment to be 9000. Notice that for κ = 6 the proposed scheme outperforms the conventional approach by 2.3 dB. However, due to the poor performance of short block-length RCM-LDGM codes, for κ = 1 the performance is similar to that of the conventional approach. This clearly shows that concatenating BWT segments improves the system performance by avoiding short block lengths.
As summarized in Table 3, the proposed scheme clearly outperforms the conventional approach, and the PER vs SNR curves are only about 3 dB away from the Shannon limits.

Conclusions
A new source-controlled coding scheme for high-throughput transmission of binary sources with memory over AWGN channels has been proposed. The proposed strategy is based on the concatenation of the BWT with rate-compatible RCM-LDGM codes. The BWT transforms the original source with memory into a set of independent non-uniform discrete memoryless binary sources, which are then separately encoded, with optimal rates, using RCM-LDGM codes. Simulations show that the proposed scheme outperforms the traditional strategy of using the FG of the source in the decoding process by up to 4.2 dB for a spectral efficiency of 7.4 binary source symbols per complex channel use and a source with entropy rate 0.57 bits per source symbol. The resulting performance lies within 3 dB of the Shannon limit.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: