Asymmetric Joint Source-Channel Coding for Correlated Sources with Blind HMM Estimation at the Receiver

We consider the case of two correlated sources, S 1 and S 2 . The correlation between them has memory, and it is modelled by a hidden Markov chain. The paper studies the problem of reliable communication of the information sent by the source S 1 over an additive white Gaussian noise (AWGN) channel when the output of the other source S 2 is available as side information at the receiver. We assume that the receiver has no a priori knowledge of the correlation statistics between the sources. In particular, we propose the use of a turbo code for joint source-channel coding of the source S 1 . The joint decoder uses an iterative scheme where the unknown parameters of the correlation model are estimated jointly within the decoding process. It is shown that reliable communication is possible at signal-to-noise ratios close to the theoretical limits set by the combination of Shannon and Slepian-Wolf theorems.


INTRODUCTION
Communication networks are multiuser communication systems.Therefore, their performance is best understood when viewed as resource sharing systems.In the particular centralized scenario where several users intend to send their data to a common destination (e.g., an access point in a wireless local area network), the receiver may exploit the existing correlation among the transmitters, either to reduce power consumption or gain immunity against noise.In this context, we consider the system shown in Figure 1.The output of two correlated binary sources {X k , Y k } ∞ k=1 are separately encoded, and the encoded sequences are sent through two different This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.channels to a joint decoder.The only requirement imposed on the random process {X k , Y k } ∞ k=1 is to be ergodic.Notice that this includes the situation where the process {X k , Y k } ∞ k=1 is modelled by a hidden Markov model (HMM); this is the case analyzed in this paper.
If the channels are noiseless, the problem is reduced to one of distributed data compression.The Slepian-Wolf theorem [1] (proven to be extensible to ergodic sources in [2]) states that the achievable compression region (see Figure 2) is given by where R 1 and R 2 are the compression rates for sources S 1 S 1 X 1 , . . ., X M Encoder 1 and S 2 (bits per source symbol), and their respective conditional and joint entropy rates.In the particular case where the joint sequence {X k , Y k } ∞ k=1 is i.i.d., the above entropy rates are replaced by their corresponding entropies.
As already mentioned, we assume that the output of the multiterminal source {X k , Y k } ∞ k=1 can be modelled by a HMM, and we analyze a more general problem of reliable communication when channels 1 and 2 in Figure 1 are additive white Gaussian noise (AWGN) and noiseless, respectively.The main goal is to minimize the energy per information bit E b sent by the source S 1 for a given encoding rate R 1 < 1 and binary phase-shift keying (BPSK) modulation (i.e., the system operates in the power-limited regime).When the complexity of both encoder and decoder is not an issue, the minimum theoretical limit (E b /N 0 ) * is achieved when the source S 1 is compressed at its minimum rate, namely, H (S 1 | S 2 ).This can be done if the compression rate R 2 of the source S 2 is greater than or equal H (S 2 ) (marked point in Figure 2).Without any loss of generality, we can assume that the source S 2 is available as side information at the decoder (R 2 = H (S 2 )).
From the source-channel separation theorem with side information [3], the limit ( is the capacity of the AWGN channel in bits per channel use. 1 The above condition yields Referring to Figure 1, the encoder 1 has been implemented using a binary turbo encoder [4] with coding rate R 1 . 1 Since the modulation scheme used is BPSK, the capacity of the constrained AWGN channel with a binary input constellation should be used instead of the unconstrained channel capacity.However, since the system operates in the power-limited regim the difference between both capacities is small.
shows the asymmetric compression pair selected in our system.
However, with the corresponding decoding modifications, other type of probabilistic channel codes could have been employed, for example, low-density parity-check (LDPC) codes.The joint decoder bases its decision on both the output of the channel V k and the side information Z k = Y k coming from the source S 2 .
The first practical scheme of distributed source compression exploiting the potential of the Slepian-Wolf theorem was introduced by Pradhan and Ramchandran [5].They focused on the asymmetric case of compression of a source with side information at the decoder and explored the use of simple channel codes like linear block and trellis codes.If this asymmetric compression pair can be reached, the other corner point of the Slepian-Wolf rate region can be approached by swapping the roles of both sources and any point between these two corner points can be realized by time sharing.For that reason, most of the recent works reported in the literature regarding distributed noiseless data compression consider the asymmetric coding problem, although they use more powerful codes such as turbo [6,7] and LDPC [8,9] schemes.An exception is [10] that deals with symmetric source compression.In all the above references, except in [9], the correlation between the sources is very simple because they assume that this correlation does not have memory In [10], the correlation parameter p is estimated iteratively.However, Garcia-Frias and Zhong in [9] consider a much more general model with hidden Markov correlation and assumes that its parameters are known at the decoder.
When one of the channels is noisy, the authors in [11] (for a binary symmetric channel, BSC) and in [12] (for a BSC, AWGN and Rayleigh channel) have proposed a joint source-channel coding scheme based on turbo and irregular repeat accumulate (IRA) codes, respectively.In both cases, the correlation among the sources is again assumed to be memoryless and known at the receiver.Under the same correlation assumptions, the case of symmetric joint source-channel coding when both channels are noisy (AWGN) has been studied using turbo [13] and low-density generator-matrix (LDGM) [14] codes.Both assume that the memoryless correlation probability is known at the decoder.
In this paper, we take a further step and consider that the correlation between the sources follows a hidden Markov model like the correlation proposed in [9] for distributed source compression.However, unlike what is assumed in [9], our proposed scheme does not require any previous knowledge of the HMM parameters.It is based on an iterative scheme that jointly estimates, within the turbo-decoding process, the parameters of the HMM correlation model.It is an extension of the estimation method presented by Garcia-Frias and Villasenor [15] (for point-to-point data transmission over an AWGN of a single HMM source) to the mentioned distributed joint source-channel coding scenario.As we show in the simulation results, the loss in BER performance that results from the blind estimation of the HMM parameters when compared to their perfect knowledge is negligible.
The rest of this paper is organized as follows.In the next section, the proposed system is introduced and the iterative source-channel joint decoder is described.Section 3 discusses the simulation results of the joint decoding scheme.Finally, in Section 4, some concluding remarks are given.

SYSTEM MODEL
In this section, we present the proposed joint source-channel encoder shown in Figure 3.It uses an iterative decoding scheme that exploits the hidden Markov correlation between sources based on the side information available at the decoder.After describing the model assumed for the correlated sources, the encoding and decoding process is analyzed.We place a special emphasis on the description of the iterative decoding algorithm by means of factor graphs and the sumproduct algorithm (SPA).For an overview about graphical models and the SPA, we refer to [16].

Joint source model
We assume the following model for the multiterminal source , where denotes modulus 2 addition, and E k is a binary random process generated by an HMM with parameters {A, B, Π}.The model is characterized by [17] (1) the number of states P; P S MS 0 (s) and s ∈ {0, . . ., P − 1}.We may note that for this model, the outputs of both sources S 1 and S 2 are i.i.d. and equiprobable.Thus, H(S 1 ) = H(X 1 ) = 1 and H(S 2 ) = H(Y 1 ) = 1.On the contrary, the correlation between sources does have memory since where H (E) denotes the entropy rate of the random sequence E k generated by the HMM.By changing the parameters of the HMM, different values of H(S 1 | S 2 ) can be obtained.Also notice that, for the particular case where P = 1, the correlation is memoryless, resulting in H(S 1 | S 2 ) = H(E 1 ) = h(b 0,1 ); that is, the entropy of a binary random variable with distribution (b 0,1 , 1 − b 0,1 ).Using the fact that Y k = X k ⊕ E k , the above model can be reduced to an equivalent HMM that outputs directly the joint sequence {X k , Y k } ∞ k=1 without any reference to the variable E k .Its trellis diagram has P states and 4 parallel branches between states, one for each possible output (X k , Y k ) combination (see Figure 4).The associated branch a priori probabilities are easily obtained from the original HMM model and the X k a priori probabilities P(x k ).For instance, the branch probability of going from state s to state s , associated with outputs X k = q and Y k = v, q = v (q = v), is given by the probability of the following three independent events {S k−1 = s, S k = s }, {E k = 1, when being in state s} ({E k = 0, when being in state s}), and where q, v ∈ {0, 1} and s, s ∈ {0, . . ., P − 1}.The MS label for the trellis branch transitions T MS k and state variables S MS k stands for multiterminal source.

Turbo encoder
The block sequence {x k } M k=1 = {X 1 = x 1 , . . ., X M = x M } produced by a realization of the source S 1 is first randomized by the interleaver τ before entering to a turbo code, with two identical constituent convolutional encoders C 1 and C 2 .The encoded binary sequence is denoted by , where we assume that the coding rate is R 1 = 1/3, and r τ(k) , z τ(k) are the redundant symbols produced by C 1 , C 2 , respectively.The input to the AWGN channel is , where φ : {0, 1} → R denotes the BPSK transformation performed by the modulator.Finally, the received corresponding sequence will be denoted by

Joint source-channel decoder
To better understand the joint source-channel decoder with side information, we begin analyzing a simplified decoder that bases its decisions only on (i) the received systematic symbols { x k } M k=1 ; (ii) the side information sequence { y k } M k=1 generated by a realization of the source S 2 .
The decoder will decide for the X k ∈ {0, 1} that maximizes the a posteriori probability P(x k | { x j , y j } M j=1 ) (MAP decoder).This is done via the forward-backward algorithm, also known as MAP or BCJR [18].This algorithm is a particularization of the SPA applied to factor graphs derived from an HMM or a trellis diagram, and it is an efficient marginalization procedure based on message-passing rules among the nodes in a factor graph.
From the trellis description of our source model (see Figure 4), the joint probability distribution function of the random variables {X k } M k=1 conditioned by the observations { x j } M j=1 and the side information { y j } M j=1 , that is, ), can be decomposed in terms of factors, one for each time instant k.In turn, this factorization may be represented by a factor graph [16], like the one shown in Figure 5.We keep the same convention used in [16], representing in lower case the variables involved in a factor graph.There should be no confusion from the context whether x denotes an ordinary variable taking on values in some finite alphabet X, or the realization of some random variable X.
Since the channel is AWGN, the local functions of x k , P( x k | x k ), are given by the Gaussian distribution N (φ(x k ), N 0 /2).On the other hand, the local functions I yk (y k ) are indicator functions taking value 1 when y k = y k and 0 otherwise.This shows the fact that the output of the source S 2 is known with certainty at the decoder.
Based on this factor graph, the decoder can now efficiently compute the a posteriori probability P(x k | { x j , y j } M j=1 ) by marginalizing P(x 1 , . . ., x M | { x j , y j } M j=1 ) via the SPA which, in this case, reduces to the forward-backward algorithm.
In particular, the forward and backward recursion parameters α MS k−1 (s MS k−1 ) and β MS k (s MS k ) defined in the forwardbackward algorithm are the messages passed from the state variable node s MS k−1 to the factor node T MS k and from the state variable node s MS k to T MS k , respectively.From the sumproduct update rules, the following expressions are obtained for these messages: where ) to the variable nodes x k , are obtained by the SPA update rules as The a posteriori probability P(x k | { x j , y j } M j=1 ) is now calculated as the product of all the messages arriving at variable node x k .In our case, the message passed from the local function node P( x k | x k ) to the variable node x k is simply the probability function itself, whereas the message passed from the local function node 5).Therefore, The problem we want to solve in this paper is an extension of what we have just analyzed.The joint decoder must compute the a posteriori probability of the symbol X k by observing not only the corresponding received symbols { x j } M j=1 and the side information { y j } M j=1 as described before, but also the additional outputs of the channel { r j } M j=1 and { z j } M j=1 , that is, P(x k | { x j , r j , z j , y j } M j=1 ).The global factor graph results by properly attaching, through interleaver τ, the factor graph describing a standard turbo decoder to the graph in Figure 5.
Figure 6 shows this arrangement.Observe that the three sub-factor graphs have the same topology since each models a trellis (with different parameters); namely, the trellis of the two constituent convolutional decoders and the trellis of the multiterminal source.
Similarly to what happens with the standard factor graph of a turbo decoder, the compound factor graph has cycles and the message sum-product algorithm has no natural termination.To overcome this problem, the following schedule has been adopted.During the ith iteration, a standard SPA is separately applied to each of the three factor graphs describing the decoders D1, D2, and the multiterminal source, in this order: MS → D1 → D2.Since these subfactor graphs do not have cycles, the corresponding SPAs will terminate.Notice, however, that the updating rules for the SPA, when applied to one of the subfactor graphs, require incoming messages from the other two subfactor graphs (called extrinsic information in turbo-decoding jargon), since all share the same variable nodes x τ(k) .The messages computed in the previous steps are used for that purpose.
For example, referring to Figure 6, the former SPA update expressions (see ( 6)-( 8)) are now modified to include the extrinsic information ξ MS k,i (x k ) coming from D1 and D2 (i.e., from the turbo-decoding iteration), instead of P( where the subindex i denotes the current iteration.The extrinsic information ξ MS k,i (x k ) is the message passed from the variable node x k to the factor node T MS k through interleaver τ (see Figure 6).Using the SPA update rules, this is given by ) With the obvious modifications, the same set of recursions also holds for the factor graphs D1 and D2.Observe that the SPA applied to D1 and D2 is nothing more than the standard turbo-decoding procedure modified to include the extrinsic information δ MS k,i−1 (x k ) coming from the MS.After L iterations, the a posteriori probabilities P(x τ(k) | { x j , r j , z j , y j } M j=1 ) are calculated as the product of all messages arriving at variable node x τ(k) , that is, Finally, the estimated source symbol at τ(k) is given by arg max xτ(k)∈{0,1} P(x τ(k) | { x j , r j , z j , y j } M j=1 ).
If the local functions I yk (y k ) in the factor nodes of Figure 6 were substituted by P(y k ) = 0.5 (i.e., if no side information was available at the decoder or the sources were not correlated), the resulting normalized messages from the SPA would be δ MS k,i (x k ) = 0.5 for all k, i and all values of variables x k (showing the fact that the source S 1 is i.i.d. and equiprobable).In other words, the subfactor graph of the MS would be superfluous and the decoder would be reduced to a standard turbo decoder.Should we assume for S 1 (see Figure 3) a two-state HMM source, like the one considered in [15] instead of i.i.d., the resulting MS overall HMM, combining both HMM models (for {E k } and {X k }), would have 2P states with 4 branches between states.The corresponding branch probabilities in ( 5) would have to be modified accordingly.In the lack of side information, the MS factor graph would be reduced to that describing the HMM of the source S 1 .As a result, our decoding process would coincide with the scheme studied in [15].

Iterative estimation of the HMM parameters of the multiterminal source model
The updating equations ( 10)-( 12) require the knowledge of the HMM parameters {A, B, Π}, since they appear in the definition of the branch transition probabilities in (5).However, in most cases, this information is not available.Therefore, the joint decoder must additionally estimate these parameters.The proposed estimation method is based on a modification of the iterative Baum-Welch algorithm (BWA) [17], which was first applied in [15] to estimate the parameters of hidden Markov source in a point-to-point transmission scenario.The underlying idea is to use the BWA over the trellis associated with the multiterminal source by reusing the SPA messages computed at each iteration.
For the derivation of the reestimation formulas, it is convenient to define the functions a i (s, s ), b i (s, e), and π i (s), where s, s , and e are variables taking on values in {0, . . ., P − 1} and {0, 1}, respectively.The index i denotes the iteration number and the values taken by these functions at iteration i are the reestimated distributions of the probability of going from state s to state s , the probability that the HMM outputs the symbol e when being in state s, and the probability that the initial state of the HMM is s, respectively.With this new notation, the local functions 5) in the MS factor graph will now depend on i, yielding Notice that the variable e k is explicitly included in the argument of T MS k,i since the access to this variable is required when obtaining the reestimation formula for b i (s, e) (17).
Having said that, the reestimation expressions for these functions are easily derived by realizing that the conditional probability Using this fact on the BWA, the following reestimation equations are obtained: The ∼{∅} in the denominator of (18) indicates that all variables are summed over.At iteration i, the above expressions are computed after the SPA has been applied to MS, D1, and D2.We have noticed that (18) may be omitted whenever the block length is large enough (the initial α MS 0,i ( j) can be set to 1/P for all j ∈ {0, . . ., P − 1}).We now give a brief summary of the proposed iterative decoding scheme.
(1) Perform the SPA over the factor graphs that describe the decoders D1 and D2 without considering the extrinsic information coming from the MS block (i.e., with δ MS k,0 (x k ) = 0.5, for all k ∈ {1, . . ., M}).For each k, obtain an initial estimate x k of the source symbol x k by x k = arg max xk∈{0,1} P(x k | { x j , r j , z j } M j=1 ).Notice that this is equivalent to considering only the turbo decoder.
(4) Perform the SPA over the MS factor graph using the functions T MS k,i in (15) as factor nodes.This will produce the set of messages δ MS k,i (x k ).( 5) Perform the SPA over the factor graphs D1 and D2 with messages δ MS k,i (x k ) as extrinsic information coming from the factor graph MS. (6) Reestimate the HMM parameters using ( 16)- (18), and go back to step 3.

SIMULATION RESULTS
In order to assess the performance of the proposed joint decoding/estimation scheme, a simulation has been carried out using different values of the conditional entropy rate H (S 1 | S 2 ).The two constituent convolutional encoders C 1 and C 2 of the turbo code are characterized by the polynomial generator g(Z) = [1, (Z 3 + Z 2 + Z + 1)/(Z 3 + Z 2 + 1)].In all simulated cases, the number of states P for the HMM characterizing the joint source correlation has been set to 2. Performance comparisons with and without the decoder having a priori knowledge of the hidden Markov parameters are presented.
The simulation uses 2000 blocks of 16384 binary symbols each, and the maximum number of iterations is fixed to 35. Figure 7 displays the bit error ratio (BER) versus E b /N 0 for two different values of the conditional entropy rate, H (S 1 | S 2 ) = 0.45 and 0.73, and for the rate 1/3 standard turbo decoder.The HMM model that generates the stationary random process E k , giving raise to H (S 1 | S 2 ) = 0.45 (0.73), has transition probabilities a 0,0 = 0.97 (0.9), a 1,1 = 0.98 (0.85) and output probabilities b 0,0 = 0.05 (0.05), b 1,0 = 0.95 (0.92).In both cases, the initial-state distribution Π is the corresponding stationary distribution of the chain.
As opposed to what happens to the joint probability distribution of (E 1 , . . ., E n ), the marginal distribution P Ek (e k ) is easily computed by P Ek (e k ) = π 1 • b 1,ek + π 0 • b 0,ek , for all k.It can be checked that in both models this distribution is nearly equiprobable, giving a value for the entropy that is, the random variables X k and Y k are practically independent.Therefore, the correlation between the processes The standard turbo-decoder curve has been included in Figure 7 for reference.It shows the performance degradation that the proposed joint decoder would incur, should the side information not be used in the decoding algorithm (or, equivalently, if no correlation exists between both sources, i.e., H(S 1 | S 2 ) = H(S 1 ) = 1).
For comparison purposes, the three theoretical limits −0.55, −2.2, and −4.6 dB given in (3) corresponding to H(S 1 | S 2 ) = 1, 0.73, and 0.45, respectively, are also shown as vertical lines in Figure 7.For H(S 1 | S 2 ) = 0.73 and H (S 1 | S 2 ) = 0.45, the BER curves with markers represent the performance when perfect knowledge of the joint source parameters is available at the decoder.On the other hand, the curves with display the performance when no initial knowledge is available at the joint decoder.In this case, the estimation of the HMM parameters is run afresh for each input block, that is, without relying on any previous reestimation information.
Observe that the degradation in performance due to the lack of a priori knowledge in the source correlation statistics is negligible.Also we may note that at a given BER, the gap between the required E b /N 0 and their corresponding theoretical limits widens as the conditional entropy rate decreases (i.e., the amount of correlation between sources increases).
In particular, at BER = 10 −4 , the gaps are 0.65, 1 and 2.4 dB, respectively.As mentioned in [13] for the memoryless case, when the correlation between the sequences is very strong the side information can be interpreted as an additional systematic output of the turbo decoder.As it is well known in the turbo-code literature, this repetition involves a penalty in performance.
The set of curves in Figure 8 illustrates the BER performance versus E b /N 0 as the number of iterations increases.Plots 8a and 8b are for the conditional entropy rates H (S 1 | S 2 ) = 0.45 and H (S 1 | S 2 ) = 0.73, respectively.Although the BER performance is similar in both cases, the convergence rate when the decoder estimates the parameters of the HMM is slower, as expected.
Finally, suppose that the joint decoder is implemented assuming that the correlation between sources is memoryless (like in [13]), that is, the state variables in the MS factor graph can only take a single value s k = 0, and the factor nodes T MS k in (5) have a 0,0 = 1 and b 0,0 = P Ek (0).As a result,  we would not achieve any performance improvement with respect to the case of no side information.As previously mentioned, the reason is that with this decoder, the rate compression for source S 1 would be limited to H(X k | Y k ) = H(E 1 ) ≈ H(X k ), implying that there is practically no correlation (of depth n = 1) between S 1 and S 2 .

CONCLUSIONS
Given two binary correlated sources with hidden Markov correlation, this paper proposes an asymmetric distributed joint source-channel coding scheme for the transmission of one of the sources over an AWGN.We assume that the other source output is available as side information at the receiver.
A turbo encoder and a joint decoder are used to exploit the Markov correlation between the sources.We show that, when the correlation statistics are not initially known at the decoder, they can be estimated jointly within the iterative decoding process without any performance degradation.Simulation results show that the performance of this system achieves signal to noise ratios close to those established by the combination of Shannon and Slepian-Wolf theorems.

Figure 2 :
Figure 2: Diagram showing the achievable region for the coding rates.The displayed point[R 1 = H (S 1 | S 2 ), R 2 = H (S 2 )]shows the asymmetric compression pair selected in our system.

+Figure 3 :
Figure 3: Proposed communication system for the joint source-channel coding scheme with side information.The decoder provides an estimate x k of x k with the help of the side information sequence {y k } M k=1 and the redundant data {r k , z k } M k=1 computed in the turbo encoder.The interleaver τ decorrelates the output of the sources.

Figure 4 :
Figure 4: Branch transition probabilities from the generic state S MS k−1 to S MS k of the trellis describing the HMM multiterminal source.

Figure 5 :
Figure 5: Simplified factor graph defined by the trellis of Figure 4.For simplicity, only M = 3 stages has been drawn.

y 4 Figure 6 :
Figure 6: Assembly of the standard turbo decoder to the factor graph in Figure 5.For simplification purposes, the data length has been fixed to M = 4.

Figure 7 :
Figure 7: BER versus E b /N 0 for entropy values H(S 1 | S 2 ) = 1.0, 0.73, and 0.45 after 35 iterations.The results for known and unknown HMM are depicted with and markers, respectively.The theoretical Shannon limits are represented by the vertical solid lines.The BER range is bounded at 1/M (less than one error in M = 16384 bits).