Iterative Receiver Design for Probabilistic Constellation Shaping in ISI Channel

This paper investigates receiver design for probabilistic constellation shaping (PCS) signaling over inter-symbol interference (ISI) channels. The key component performing the constellation shaping is an adjustable distribution matcher (DM), and the probabilistic shaping system can adapt to variable data rates by adjusting the distribution matching rate rather than the modulation or coding mode. We use two distinct techniques to derive two iterative receivers operating in the time domain. The first, shaped-BCJR, is a trellis-based solution in which the forward/backward algorithm, together with turbo iteration, computes the posteriori probability, with the nonuniform a priori symbol probability initializing the calculation. The second receiver is based on linear filtering, where expectation propagation (EP) provides a Gaussian approximation of the posteriori probability of each symbol. This receiver structure is represented by a factor graph, and we detail the derivation of the messages exchanged between adjacent nodes. The Bayesian equalization in each EP iteration brings an unacceptable computational burden. An efficient block-wise matrix inversion strategy is proposed to tackle this problem, significantly reducing the computational complexity with little performance loss. Simulation results show that the proposed algorithms remarkably outperform LMMSE solutions and traditional EP-based algorithms. The proposed matrix inversion strategy can also be used to improve the performance of other filter-type solutions.


I. INTRODUCTION
High-order modulation and flexible channel coding schemes play important roles in optimizing the spectral efficiency (SE) of wired and wireless communication systems. To improve the SE, modulation formats should have a Gaussian-like shape [1]. A Huffman-code-based matcher is proposed in [2] to achieve this goal; however, the codebook must be pregenerated and stored offline. A real-time solution is proposed in [3], where arithmetic coding is used to calculate the codebook online. More recently, the Constant Composition Distribution Matcher (CCDM) [4] was proposed to map uniformly distributed data bits to non-uniform amplitudes with a desired distribution. By properly choosing the distribution, the shaping gain predicted in theory can be achieved by a practical scheme with low complexity. Probabilistic Constellation Shaping (PCS) is a multilayer coded modulation (CM) scheme implemented by combining DM with a systematic binary low-density parity-check (LDPC) code [5]. This system offers flexibility in terms of transmission rate even though a fixed forward error correction (FEC) code, modulation format and bandwidth are used. Recent work on optimizing the distribution and on practical implementations can be found in [6] and [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wang.
Fig. 1 compares the SE of non-equiprobable shaping QAM, conventional equiprobable shaping QAM and the Shannon limit. Both CM schemes use an LDPC code as FEC, and the operating points are chosen where the frame error rate (FER) is no more than $10^{-2}$. The PCS scheme uses only two CM modes, namely 16-QAM with a rate 3/4 code and 64-QAM with a rate 4/5 code. The equiprobable constellation scheme, in contrast, uses up to 8 modes, i.e., combinations of {4, 16, 64}-QAM with rate {3/5, 2/3, 3/4, 4/5, 5/6, 9/10} codes. As shown in the figure, the gap between the operating points of the equiprobable scheme and the AWGN channel capacity varies with the SNR. Two factors contribute to this gap: the finite length of the LDPC code (coding gap) and the equiprobable constellation shape (shaping gap) [5]. Providing a finer granularity and narrowing the coding and shaping gaps requires more code rates and modulation formats. Rate-compatible codes are suggested in [8] and [9]; however, the system complexity also increases to support a large number of modes. In this work, another approach is adopted: the proposed scheme works with a coarse CM granularity in which only one code rate is assigned to each modulation format, e.g., 16-QAM only works with a rate 3/4 code. Compared with the non-shaped scheme, it provides a dense ladder-like shaping gain.
Faster-Than-Nyquist (FTN) signaling is another solution for improving the SE that has attracted great attention in recent years [10]-[13]. FTN was introduced by Mazo in the 1970s [14]: the symbols are transmitted at a rate faster than the Nyquist rate, which increases the capacity and at the same time introduces intentional inter-symbol interference (ISI). In [11], the authors compare frequency-domain equalization (FDE) and time-domain equalization (TDE) for FTN signaling detection. Simulation results show that TDE outperforms FDE in terms of bit error rate, especially for a low packing factor, while FDE is attractive for its lower computational complexity. In [12], channel estimation and data detection in a doubly-selective channel are jointly designed through FD message passing. The algorithm is then refined by mitigating the impact of colored noise and non-Gaussian symbols to improve the performance. PCS and FTN signaling improve the spectral efficiency from different aspects: FTN increases the rate by narrowing the symbol interval, while PCS does so through high-order modulation. FTN attains a higher information rate than the Nyquist case mainly by exploiting the excess pulse bandwidth; however, pulse shaping is beyond our scope. Because the average information per symbol of PCS is less than that of the equiprobable scheme, PCS can be viewed as ''Beneath-the-Nyquist'' in a sense.
Communication systems suffer from ISI provoked by the dispersive nature of wide-band channels. For a complex communication system, a reliable receiver is needed to maintain robust transmission over an ISI channel. Soft or probabilistic channel equalization is a technique to mitigate the ISI [15]. Inspired by turbo codes, iterative processing schemes have been extended to joint equalization and decoding via soft-input soft-output messages, where a priori information provided by the channel decoder is used to reduce detection errors. The optimal detector is based on the maximum a posteriori (MAP) criterion, which can operate near the channel capacity with properly designed coding and detection schemes [16]. Assuming perfect knowledge of the channel impulse response (CIR), the BCJR algorithm [17] works on a trellis to compute the posteriori probability. For a channel with long memory or a large signal alphabet, however, the computational complexity becomes intractable. This motivates low-complexity detection algorithms, e.g., the linear minimum mean square error (LMMSE) detection algorithm, in which the discrete-valued symbols are directly mapped to exponential variables [18].
In recent years, expectation propagation (EP) has been proposed as a general framework for approximate variational inference through moment matching [19]. Message passing is a technique for solving the MAP problem in a cascaded system. First, the joint distribution of data bits, symbols, channel taps and noise samples is factorized into local functions. After the system factor graph is formulated, the factor nodes exchange messages with their neighbors according to the EP-based message passing rule. Specifically, the codeword log-likelihood ratios (LLRs) are exchanged between the channel decoder and the demapper, while the extrinsic messages are exchanged between the demapper and the channel equalizer [13], [20], [21]. EP-based iterative receivers have shown advantages over conventional turbo-LMMSE schemes in channel decoding, user detection, channel estimation and interference cancellation in terms of error performance, achievable rate and convergence speed [22]-[25].
The main contributions of this paper are summarized as follows:
• A trellis-based approach for the PCS system is developed, applying both the forward/backward algorithm and turbo equalization to perform detection and decoding iteratively.
• A filter-type approach is proposed. Based on the system factor graph, we show how EP can operate in soft decision feedback equalization (DFE) over a multipath channel.
• To reduce the matrix inversion cost, a reduced-complexity scheme based on block-wise matrix factorization is proposed for Bayesian equalization.
In our work, PCS reaps a large shaping gain and achieves a remarkable degree of flexibility with respect to the transmission rate. Both trellis-based and filter-type algorithms are applied to PCS signaling to obtain reliable receivers for the ISI channel. To the best of our knowledge, no other implementation of an iterative receiver for PCS in an ISI channel has been reported in the literature so far. The paper is organized as follows. Section II describes the system model and shows how the PCS transmitter works. Inspired by the classical BCJR algorithm, Section III develops a new turbo receiver in which the nonuniform priors are taken into consideration. In Section IV we apply the EP framework to derive the messages passed along the system factor graph. A novel matrix inversion strategy is proposed in Section V to reduce the computational complexity of equalization. Several simulations are included in Section VI to compare the performance of the different receivers. Finally, Section VII concludes the paper.
Throughout the paper, bold lowercase letters are used for vectors, e.g., $\mathbf{u}$ is a $T \times 1$ vector and $u_t$ is its $t$-th entry, where $t = 1, \ldots, T$. Bold capital letters denote matrices, e.g., an $M \times N$ matrix $\mathbf{H}$ has $M$ rows and $N$ columns. $\mathbf{I}$ is the identity matrix and $\mathbf{0}$ is the all-zeros matrix. $\mathbb{E}[\cdot]$ and $\mathbb{V}[\cdot]$ return the expectation and the variance. The probability of $x$ at $a$ is $p(x = a)$, and the probability density function (PDF) is denoted as $p(x)$. $\mathcal{CN}(\mu, \sigma^2)$ represents the circularly symmetric complex Gaussian distribution with mean $\mu$ and variance $\sigma^2$.

II. SYSTEM MODEL
A. DISTRIBUTION MATCHING
The CCDM transforms uniformly distributed data bits $\mathbf{u} = [u_0, u_1, \ldots, u_{T-1}]$ into amplitudes $\mathbf{a} = [a_0, a_1, \ldots, a_{N-1}]$ with the same empirical distribution
$$P_A(\alpha) = \frac{n_c(\alpha)}{N}, \quad \alpha \in \mathcal{A}.$$
The output set consists of the equally spaced amplitudes $\mathcal{A} = \{1, 3, \ldots, 2^m - 1\}$, and the function $n_c(\cdot)$ counts the occurrences of its input in $\mathbf{a}$. The DM rate is defined as $R_{DM} = T/N$. The receiver also needs to know the distribution to achieve the best performance [26].
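As a rough illustration of these quantities, the following Python sketch computes the empirical distribution of an amplitude sequence and the resulting DM rate $T/N$. The function names are hypothetical. The input length $T$ is taken as the floor of the base-2 logarithm of the number of sequences sharing one composition, which is the usual constant-composition counting argument and an assumption about the matcher, not a detail given in this section.

```python
import math
from collections import Counter


def empirical_distribution(amplitudes):
    """Return {amplitude: n_c(amplitude) / N} for the sequence a."""
    counts = Counter(amplitudes)
    n = len(amplitudes)
    return {a: counts[a] / n for a in sorted(counts)}


def ccdm_input_bits(composition):
    """Assumed input length T: floor(log2(number of sequences of this type))."""
    n = sum(composition.values())
    sequences = math.factorial(n)
    for count in composition.values():
        sequences //= math.factorial(count)  # exact: multinomial coefficient
    return sequences.bit_length() - 1  # exact floor(log2(.)) for integers


def dm_rate(composition):
    """DM rate R_DM = T / N in bits per amplitude."""
    return ccdm_input_bits(composition) / sum(composition.values())
```

For example, a composition with three amplitudes 1 and one amplitude 3 admits four distinct sequences, so two bits can be mapped onto four amplitudes.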

B. PROBABILISTIC CONSTELLATION SHAPING
The constellation distribution that approaches the channel capacity is symmetric around zero. Therefore, the sign bits should be equiprobable, and the label of the sign bit $C_1$ is stochastically independent of the following amplitude bits $C_2, \ldots, C_m$. From [27], the parity bits at the output of the LDPC encoder remain approximately uniformly distributed even if the input bits are non-uniformly distributed. For PCS, the sign bits can be generated by copying the parity bits and appending some data bits if needed. Amplitudes are labelled by the binary reflected Gray code (BRGC) for ASK in the real and imaginary dimensions, and the transmitted symbol is formed by mapping two real ASK symbols to one complex QAM symbol. For instance, PCS-16QAM defines a sign label function β(·) and an amplitude label function β(·) on the ASK alphabet of each dimension; their inverse labelling functions are denoted β^{-1}(·) and β^{-1}(·), respectively. Fig. 2 visualizes the probability distribution of the PCS-16QAM constellation, where the height of each bar indicates the probability of the corresponding constellation point. The distribution becomes more ''shaped'' as the probability of the lower-energy constellation points increases, which decreases the overall entropy.
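The relation between the amplitude distribution, the equiprobable signs and the entropy of the shaped constellation can be sketched as follows. This is a minimal illustration rather than the paper's mapper: the function names are hypothetical, and the real and imaginary dimensions are assumed to be independent and identically shaped.

```python
import math


def pcs_qam_pmf(amplitude_pmf):
    """Build a QAM symbol PMF from a per-dimension amplitude PMF.

    Signs are equiprobable, so each ASK point +a and -a gets p(a) / 2.
    The two dimensions are assumed independent (an assumption of this sketch).
    """
    ask = {}
    for a, p in amplitude_pmf.items():
        ask[+a] = p / 2
        ask[-a] = p / 2
    return {complex(re, im): p_re * p_im
            for re, p_re in ask.items()
            for im, p_im in ask.items()}


def entropy_bits(pmf):
    """Entropy of a PMF in bits per symbol."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)
```

With a uniform amplitude PMF the result is ordinary 16QAM at 4 bits per symbol; moving probability toward amplitude 1 raises the inner points and lowers the entropy.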

C. BIT-LEVEL INTERLEAVING
Since the bit levels have different probabilities, interleaving should be implemented with several interleavers that independently scramble the bits row by row, while the resulting binary frames are mapped to symbols column-wise. As shown in Fig. 3, the random interleaving scheme operates on two bit levels, where 0 and 1 have different probabilities in each level. A similar scheme can also be found in bit assignment for block-fading channels [28].
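A minimal sketch of such row-wise interleaving is given below. The function names and the seeded pseudo-random permutations are illustrative assumptions; the point is only that each bit level keeps its own statistics because bits never move between rows.

```python
import random


def interleave_bit_levels(levels, seed=0):
    """Permute each bit level (row) with its own random interleaver."""
    rng = random.Random(seed)
    permutations = []
    for row in levels:
        perm = list(range(len(row)))
        rng.shuffle(perm)
        permutations.append(perm)
    interleaved = [[row[i] for i in perm]
                   for row, perm in zip(levels, permutations)]
    return interleaved, permutations


def columns_to_labels(levels):
    """Read the interleaved frame column-wise: one bit tuple per symbol."""
    return list(zip(*levels))


def deinterleave_bit_levels(levels, permutations):
    """Undo interleave_bit_levels given the same permutations."""
    restored_levels = []
    for row, perm in zip(levels, permutations):
        restored = [None] * len(row)
        for pos, src in enumerate(perm):
            restored[src] = row[pos]
        restored_levels.append(restored)
    return restored_levels
```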

D. MULTIPATH CHANNEL
We consider a single-user, single-carrier, single-input single-output (SISO) transmission scenario. The multipath channel is modelled as a baseband symbol-spaced linear filter with $L$ taps,
$$v_k = \sum_{l=0}^{L-1} h_l x_{k-l},$$
in which the pulse shaping is also accounted for, and where $k$ is the time index and $v_k$ is the noiseless channel output. The signal passing through the ISI channel is affected by thermal noise $w_k \sim \mathcal{CN}(0, \sigma_w^2)$, so that $y_k = v_k + w_k$. The transmission can be written in matrix form as
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}.$$
For simplicity, the receiver is assumed to have perfect knowledge of the channel state information. We also assume ideal time and frequency synchronization, and inter-block interference is ignored. The diagram of the transmitter of the PCS system is shown in Fig. 4. The dashed lines are needed if there are not enough parity bits to fill the sign bits, e.g., when the code rate is greater than 1/2 for 16QAM or 2/3 for 64QAM. The function Mod. receives amplitudes and signs to form QAM symbols. A significant feature of this structure is that variable data rates are supported by adjusting the DM rate, while the distribution imposed by the DM is still preserved at the output.
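The channel model can be sketched in a few lines of Python. This is an illustrative simulation only: the function name, the zero-valued symbols assumed before the start of the block and the seeded noise generator are assumptions of the sketch.

```python
import random


def isi_channel(symbols, taps, noise_var, seed=0):
    """Return (v, y) with v_k = sum_l h_l x_{k-l} and y_k = v_k + w_k.

    Symbols before the block are taken as zero (no inter-block interference).
    w_k is circularly symmetric complex Gaussian with variance noise_var.
    """
    rng = random.Random(seed)
    sigma = (noise_var / 2) ** 0.5  # standard deviation per real dimension
    v, y = [], []
    for k in range(len(symbols)):
        vk = sum(h * symbols[k - l] for l, h in enumerate(taps) if k - l >= 0)
        wk = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        v.append(vk)
        y.append(vk + wk)
    return v, y
```

Setting `noise_var` to zero returns the noiseless output, which is convenient for checking a trellis or an equalizer against known branch outputs.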

III. SHAPED-BCJR RECEIVER DESIGN
A. OPTIMAL DETECTION
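The optimal receiver computes the estimate $\hat{u}_t$ by minimizing the bit error probability, which is equivalent to
$$\hat{u}_t = \arg\max_{u \in \{0,1\}} p(u_t = u \mid \mathbf{y}).$$
The posteriori probability $p(u_t = u \mid \mathbf{y})$ can be obtained by marginalizing the sequence posteriori probability $p(\mathbf{u} \mid \mathbf{y})$ over the remaining variables as
$$p(u_t = u \mid \mathbf{y}) = \sum_{\mathbf{u}:\, u_t = u} p(\mathbf{u} \mid \mathbf{y}).$$
Since codewords and binary sources are matched one-to-one, it is more convenient to work with codeword LLRs than with bit probabilities. The LLR is defined as
$$L(c_k \mid \mathbf{y}) = \ln \frac{p(c_k = 0 \mid \mathbf{y})}{p(c_k = 1 \mid \mathbf{y})}.$$
Then the decoding rule can be equivalently written as
$$\hat{c}_k = \begin{cases} 0, & L(c_k \mid \mathbf{y}) \geq 0, \\ 1, & L(c_k \mid \mathbf{y}) < 0. \end{cases}$$
Interleaving enables the independence of the codeword bits $c_k$, which yields $p(\mathbf{c}) = \prod_k p(c_k)$. Then the posteriori LLR can be further decomposed into an extrinsic LLR and an a priori LLR as
$$L(c_k \mid \mathbf{y}) = L_{ext}(c_k \mid \mathbf{y}) + L_a(c_k),$$
where the extrinsic LLR $L_{ext}(c_k \mid \mathbf{y})$ represents the information on $c_k$ from the received samples $\mathbf{y}$ and from $c_n$ for all $n \neq k$, and the a priori LLR $L_a(c_k)$ represents the available a priori information on $c_k$.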

B. SHAPED-BCJR
Optimal detection has an intractable computational complexity of order $O(2^K)$. A general approach to reduce that cost is to separate the detection problem into two subproblems: equalization and decoding. Three different structures are shown in Fig. 5, in which the components communicate with their neighbors using either the hard estimates $\hat{x}$, $\hat{c}$, $\hat{a}$ and $\hat{u}$ from their alphabets or the corresponding soft estimates $s(x)$, $s(c)$, $s(a)$ and $s(\hat{u})$, respectively. For the equalization subproblem, the MAP algorithm estimates the symbol $x_k$ from its alphabet $\mathcal{X}$ as
$$\hat{x}_k = \arg\max_{x \in \mathcal{X}} p(x_k = x \mid \mathbf{y}).$$
This can be processed efficiently using the BCJR algorithm when the ISI channel has a trellis structure with a sufficiently small number of states. Consider the Proakis-B channel [29], which has an impulse response of length $L + 1 = 3$, so the tapped delay line contains $L = 2$ memory elements. Corresponding to the possible contents of the channel memory, the channel trellis has $4^L$ states in each dimension of the 16QAM alphabet. We denote by $\mathcal{S} = \{r_1, r_2, \ldots, r_{4^L}\}$ the set of all possible states; the state of the channel at each time instant is a random variable $s_k \in \mathcal{S}$. Given the current state $s_k$, the next state $s_{k+1}$ can only be one of four states, depending on the input $+3$, $+1$, $-1$ or $-3$ fed into the channel. The state evolution can therefore be depicted in trellis form, as shown in Fig. 6. Each path across the trellis corresponds to a sequence of input symbols, and any branch of the trellis is fully characterized by a four-element tuple $(i, j, x_{i,j}, v_{i,j})$, where the index pair $(i, j) \in \mathcal{I}$ indicates a valid branch. For the trellis defined in Fig. 6, the set of index pairs is
$$\mathcal{I} = \{\ldots, (15,15)\}, \tag{14}$$
where $x_{i,j}$ denotes the channel input when the state changes from $r_i$ to $r_j$, and $v_{i,j}$ is the corresponding noise-free channel output. This trellis representation is then used to compute the posteriori probability $p(x_k \mid \mathbf{y})$.
The random variables $x_k$ are assumed to be independently distributed, so $p(\mathbf{x})$ can be fully factorized into $\prod_{k=0}^{K-1} p(x_k)$. Then $p(x_k \mid \mathbf{y})$ can be computed efficiently using the forward/backward algorithm [25] along the paths contained in the branches. The computation is based on a decomposition of the joint probability $p(s_k, s_{k+1}, \mathbf{y})$. The sequence $\mathbf{y}$ can be separated into causal and non-causal samples as
$$p(s_k, s_{k+1}, \mathbf{y}) = p(s_k, s_{k+1}, y_0, \ldots, y_{k-1}, y_k, y_{k+1}, \ldots, y_{K-1}). \tag{16}$$
Applying the chain rule $p(\alpha, \beta) = p(\alpha)\, p(\beta \mid \alpha)$, $p(s_k, s_{k+1}, \mathbf{y})$ can be further decomposed into
$$p(s_k, s_{k+1}, \mathbf{y}) = \underbrace{p(s_k, y_0, \ldots, y_{k-1})}_{F_k(s_k)}\; \underbrace{p(s_{k+1}, y_k \mid s_k)}_{R_k(s_k, s_{k+1})}\; \underbrace{p(y_{k+1}, \ldots, y_{K-1} \mid s_{k+1})}_{B_{k+1}(s_{k+1})}.$$
The terms $F_k(s_k)$ and $B_{k+1}(s_{k+1})$ can be computed via the forward/backward recursions
$$F_{k+1}(s') = \sum_{s \in \mathcal{S}} F_k(s)\, R_k(s, s'), \qquad B_k(s) = \sum_{s' \in \mathcal{S}} R_k(s, s')\, B_{k+1}(s'),$$
with $F_0(s) = p(s = s_0)$, which corresponds to the shaped distribution. This initialization differs from the classical BCJR algorithm. $B_K(s)$ is initialized as 1 for all $s \in \mathcal{S}$. The transition probability $R_k(r_i, r_j)$ can be further decomposed as
$$R_k(r_i, r_j) = p(s_{k+1} = r_j \mid s_k = r_i)\; p(y_k \mid v_k = v_{i,j}),$$
where the value of $p(s_{k+1} \mid s_k)$ is governed by the channel input $x_{i,j}$, which is non-uniformly distributed, and $p(y_k \mid v_k = v_{i,j})$ depends on the corresponding noiseless channel output $v_{i,j}$. The transition probability is zero if its index pair is not in $\mathcal{I}$, i.e., $R_k(r_i, r_j) = 0$ for $(i, j) \notin \mathcal{I}$. Since $y_k = v_k + w_k$, the noise distribution gives
$$p(y_k \mid v_k = v_{i,j}) = \frac{1}{\pi \sigma_w^2} \exp\left(-\frac{|y_k - v_{i,j}|^2}{\sigma_w^2}\right).$$
A distinctive feature of shaped-BCJR is that the symbol probability is imposed by the DM rather than being uniform. After these preparations, the posteriori probability $p(x_k = x \mid \mathbf{y})$ can be obtained by marginalizing the joint probability $p(s_k, s_{k+1} \mid \mathbf{y})$ over all branches whose channel input is $x$:
$$p(x_k = x \mid \mathbf{y}) = \sum_{(i,j) \in \mathcal{I}:\, x_{i,j} = x} p(s_k = r_i, s_{k+1} = r_j \mid \mathbf{y}). \tag{22}$$
Then, from (15) and (22), we obtain the conditional LLR $L(c_k \mid \mathbf{y})$ of the bit $c_k$. Finally, the codeword estimates $\hat{c}_k$ can be recovered from $L(c_k \mid \mathbf{y})$.
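The recursions can be made concrete with a small forward/backward sketch in which the branch weight carries the shaped symbol prior. It is a simplified stand-in, not the paper's implementation: it handles one real dimension with real Gaussian noise of variance `noise_var`, drops the constant Gaussian factor (it cancels in the normalization), rescales the forward and backward terms at each step for numerical stability, and initializes the state distribution as the product of the symbol priors. All names are hypothetical.

```python
import math
from itertools import product


def shaped_bcjr(y, h, alphabet, prior, noise_var):
    """Symbol posteriors p(x_k | y) for a real ISI channel with taps h."""
    mem = len(h) - 1
    states = list(product(alphabet, repeat=mem))  # (x_{k-1}, ..., x_{k-mem})
    K = len(y)

    def nxt(s, x):
        return ((x,) + s)[:mem]  # shift the new input into the memory

    def branch(k, s, x):
        # R_k: shaped prior of the input times the Gaussian likelihood
        v = h[0] * x + sum(h[i + 1] * s[i] for i in range(mem))
        return prior[x] * math.exp(-(y[k] - v) ** 2 / (2 * noise_var))

    # Forward recursion, F_0 taken from the shaped prior of the memory
    F = [{s: math.prod(prior[x] for x in s) for s in states}]
    for k in range(K):
        acc = dict.fromkeys(states, 0.0)
        for s in states:
            for x in alphabet:
                acc[nxt(s, x)] += F[k][s] * branch(k, s, x)
        z = sum(acc.values())
        F.append({s: val / z for s, val in acc.items()})

    # Backward recursion, B_K = 1 for every state
    B = [None] * (K + 1)
    B[K] = dict.fromkeys(states, 1.0)
    for k in range(K - 1, -1, -1):
        acc = {s: sum(branch(k, s, x) * B[k + 1][nxt(s, x)] for x in alphabet)
               for s in states}
        z = sum(acc.values())
        B[k] = {s: val / z for s, val in acc.items()}

    # Marginalize over all branches that carry the same input symbol
    posteriors = []
    for k in range(K):
        p = dict.fromkeys(alphabet, 0.0)
        for s in states:
            for x in alphabet:
                p[x] += F[k][s] * branch(k, s, x) * B[k + 1][nxt(s, x)]
        z = sum(p.values())
        posteriors.append({x: val / z for x, val in p.items()})
    return posteriors
```

When the observations carry almost no information (very large `noise_var`), the returned posteriors fall back to the shaped prior, which is the behavior that distinguishes this variant from a BCJR detector with uniform priors.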

C. TURBO EQUALIZATION
In the shaped-BCJR symbol detection algorithm, the a priori knowledge of the probability of each symbol along the trellis is initialized by the DM empirical distribution. The forward/backward evolution has no extra information for calculating Eq. (15) and relies solely on the observations. In general, the performance can be improved if a priori information is available. It is natural to feed the information provided by the decoder to the equalizer as a priori information, and the decoder can likewise use the posteriori probability provided by the equalizer to improve its performance. This is the main motivation of turbo equalization. However, the fed-back $L(c_k \mid \mathbf{y})$ contains $L_a(c_k)$, which constitutes direct positive feedback. To keep the feedback from being too strong and to avoid overly fast convergence, extrinsic information is usually more practical. Therefore, only extrinsic information is fed back in both equalization and decoding in Fig. 5. Note that equalization and encoding may introduce extra correlation between adjacent entries; interleaving helps to suppress the correlation between neighbors.
The major drawback of this trellis-based algorithm is that its computational complexity grows exponentially with the channel memory, through the number of trellis states. It becomes intractable for high-order modulations or channels with many taps. Filter-type approaches are therefore preferred when high-order modulation is used in the PCS system.

IV. EP-BASED DECISION FEEDBACK RECEIVER DESIGN
This section focuses on the design of a filter-type receiver that approximates the posteriori probabilities using an EP-based message passing algorithm along the system factor graph.

A. EXPECTATION PROPAGATION FRAMEWORK
EP can be viewed as an extension of loopy belief propagation in which the variable nodes (VNs) are assumed to lie in exponential-family distributions [31]. The messages exchanged between VNs and factor nodes (FNs) can then be characterized by compactly parameterized distributions. This enables approximating the posteriori PDF $p(\mathbf{c} \mid \mathbf{y})$ in a fully factorized, iterative way. The basic rule for updating the messages between FN $F$ and VN $\mathbf{v}$ at its $i$-th entry is
$$m_{F \to v_i}(v_i) \propto \frac{\mathrm{Proj}_{Q_{v_i}}\left[q_F(v_i)\right]}{m_{v_i \to F}(v_i)}, \tag{24}$$
where $\mathrm{Proj}_{Q_{v_i}}(\cdot)$ is the well-known Kullback-Leibler projection onto the target probability distribution $Q_{v_i}$. The belief $q_F(v_i)$ is an approximation of the marginal of the true posteriori $p(v_i)$, which is obtained by combining the factor at FN $F$ with the messages from the neighboring VNs,
$$q_F(v_i) = \int f_F(\mathbf{v}) \prod_{v_j} m_{v_j \to F}(v_j)\, \mathrm{d}\mathbf{v} \backslash v_i,$$
where $\mathbf{v} \backslash v_i$ denotes the set of VNs without $v_i$. For exponential families this projection reduces to moment matching, which significantly simplifies the message calculation [19].
Symbol VNs are assumed to be multivariate circularly symmetric Gaussian distributed with diagonal covariance matrices. Interleaving reduces the correlation between nearby symbols and allows the off-diagonal entries of the covariance matrix to be neglected. Therefore, the approximate posteriori distribution can be factorized into independent Gaussians, i.e., the message on $x_i$ is fully defined by a mean and a variance. Codeword VNs are assumed to be Bernoulli distributed, so their messages can be represented by bit-level LLRs.

B. FACTOR GRAPH MODEL
The optimal receiver satisfies the MAP criterion by maximizing the posteriori probability, where the posteriori PDF can be factorized as
$$p(\mathbf{u}, \mathbf{c}, \mathbf{x} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x} \mid \mathbf{c})\, p(\mathbf{c} \mid \mathbf{u})\, p(\mathbf{u}), \tag{26}$$
in which $p(\mathbf{x} \mid \mathbf{c}) = \prod_{k=0}^{K-1} p(x_k \mid \mathbf{c}_k)$ is the memoryless mapping and $p(\mathbf{c} \mid \mathbf{u})$ is the overall coding scheme, i.e., from binary sources to codewords. The channel factor in Eq. (26) represents the relationship between the received samples and the transmitted symbols. This posteriori PDF results in the factor graph shown in Fig. 7, and the posteriori probabilities are estimated iteratively through the message passing algorithm.

C. BAYESIAN EQUALIZATION
From a Bayesian perspective, the channel inputs are modelled as random variables with Gaussian priori PDF $\mathcal{CN}(x_d, v_d)$, and the inputs $\mathbf{x}$ and outputs $\mathbf{y}$ are assumed to be jointly Gaussian. Hence, the Bayesian estimate of the posteriori PDF, a Gaussian with mean $\boldsymbol{\mu}_e$, has the explicit expression of Eq. (27) [32]. The matrix operations in Eq. (27) require many computations; in particular, the computational cost of the matrix inversion is usually unaffordable. For a more general implementation, windowed processing is adopted by applying a sliding window $[k - p, k + d]$, where $k$ is the window index and $p$ and $d$ are the numbers of pre-cursor and post-cursor samples, respectively; this rewrites Eq. (27) as Eq. (28). By selecting $p$ and $d$ larger than the finite length of the channel taps, Eq. (28) transforms the matrix inversion into equivalent element-wise operations, which yield the final symbol-wise estimates.
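For reference, when $\mathbf{x} \sim \mathcal{CN}(\mathbf{x}_d, \mathbf{V}_d)$ and $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \sigma_w^2 \mathbf{I})$, the textbook jointly Gaussian posterior takes the form below. The symbol $\boldsymbol{\Sigma}_e$ for the posterior covariance is notation assumed here, and the paper's Eq. (27) may arrange the terms differently.

```latex
\begin{aligned}
\boldsymbol{\Sigma}_e &= \left(\mathbf{V}_d^{-1} + \sigma_w^{-2}\,\mathbf{H}^{H}\mathbf{H}\right)^{-1},\\
\boldsymbol{\mu}_e &= \boldsymbol{\Sigma}_e\left(\mathbf{V}_d^{-1}\mathbf{x}_d + \sigma_w^{-2}\,\mathbf{H}^{H}\mathbf{y}\right)
 = \mathbf{x}_d + \mathbf{V}_d\mathbf{H}^{H}\left(\mathbf{H}\mathbf{V}_d\mathbf{H}^{H} + \sigma_w^{2}\mathbf{I}\right)^{-1}\left(\mathbf{y} - \mathbf{H}\mathbf{x}_d\right).
\end{aligned}
```

The second form of the mean follows from the matrix inversion lemma. With a banded Toeplitz $\mathbf{H}$ and a diagonal $\mathbf{V}_d$, the matrix $\mathbf{H}\mathbf{V}_d\mathbf{H}^{H} + \sigma_w^{2}\mathbf{I}$ is itself banded, which is the kind of structure a windowed or block-wise inversion can exploit.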

D. MESSAGE EXCHANGING RULE
In this section, we detail the messages exchanged along the paths of the considered factor graph. The messages arriving at the VN $x_k$ are independent Gaussians, and the messages on the codeword bits are expressed through $L_a(\cdot)$ and $L_e(\cdot)$, which denote the a priori and extrinsic LLRs, respectively. In the factor graph of Fig. 7, each VN connects only to a pair of distinct FNs, so the message leaving a VN is fully determined by its input message, e.g., $m_{DEM \to c}(c_{k,j}) = m_{c \to DEC}(c_{k,j})$.

1) MESSAGES FROM DEC TO DEM
The DEC node generates the a priori information $L_a(\mathbf{c})$ and feeds it to the DEM nodes. Each DEM node uses the priori LLRs of the bits labelling a constellation point $\alpha$ to compute the priori probability of $x_k = \alpha$, where $\varphi^{-1}(\cdot)$ denotes the constellation mapping. The result is the probability mass function (PMF) corresponding to the message, which is used to compute $q_{DEM}(x_k)$.

2) MESSAGES FROM EQU TO DEM
Based on the approximate posteriori probability of $x_k$ at EQU, the message output by the equalizer is derived using the message passing rule of Eq. (24) and is extracted via a Gaussian density division. This division may lead to a negative variance. Such an unstable value can be directly replaced by its modulus, or regularized with a damping or mixing factor [33].
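The Gaussian density division and the modulus remedy can be sketched as follows. The function name is hypothetical, and computing the mean with the signed variance before taking the modulus is a choice made for this sketch, not one stated in the text.

```python
def gaussian_divide(mean_num, var_num, mean_den, var_den):
    """Divide the density CN(mean_num, var_num) by CN(mean_den, var_den).

    Returns (mean, variance) of the quotient. A negative variance is replaced
    by its modulus, one of the two remedies mentioned in the text.
    """
    precision = 1.0 / var_num - 1.0 / var_den
    if precision == 0.0:
        raise ValueError("equal variances: the quotient is not a proper Gaussian")
    var = 1.0 / precision
    mean = var * (mean_num / var_num - mean_den / var_den)
    return mean, abs(var)
```

Dividing CN(1, 0.5) by CN(0, 1), for instance, gives a precision of 2 - 1 = 1 and hence a quotient with mean 2 and variance 1. When the numerator is broader than the denominator, the raw variance is negative and the modulus is returned instead.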

3) MESSAGES FROM DEM TO EQU
The demapper calculates its approximate posteriori on $x_k$ by combining the messages from the connected VNs, which is also the posteriori PMF $D(\alpha)$ of $x_k$ over the constellation set $\mathcal{X}$; it follows from Eq. (30) and (32). We make a small modification in computing $D(\alpha)$ by replacing the simple product (SP) with the geometric mean (GM) over its inputs. We term this variant DFE-GM and the variant based on Eq. (37) DFE-SP. The major difference is that the geometric mean ''normalizes'' the inputs, i.e., no input range dominates the weighting, and a given percentage change in any input has an equal effect on the result. The simulation results in Section VI show its advantages. Through moment matching, the PMF is projected onto a Gaussian. Based on the approximate posteriori on $x_k$ from Bayesian equalization, the message to the equalizer $m_{x \to EQU}(x_k)$ is calculated according to Eq. (24). This Gaussian density division corresponds to the extrinsic information passed to the Bayesian equalizer, which only ''partially'' removes the message $m_{x \to DEM}(x_k)$; in BP, by contrast, it is removed completely.
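The demapper belief and its Gaussian projection can be illustrated with the short sketch below. The names are hypothetical. The SP branch multiplies the symbol prior by the Gaussian message; the GM branch takes the square root of that product, which is an assumption about how the geometric mean over two inputs is formed, since the corresponding equation is not reproduced here.

```python
import math


def demapper_belief(prior, mean, var, geometric=False):
    """Combine a symbol prior PMF with a Gaussian message CN(mean, var).

    geometric=False: simple product of the two inputs (SP).
    geometric=True: square root of the product (GM), an assumed form.
    """
    weights = {}
    for alpha, p in prior.items():
        w = p * math.exp(-abs(alpha - mean) ** 2 / var)
        weights[alpha] = math.sqrt(w) if geometric else w
    total = sum(weights.values())
    return {alpha: w / total for alpha, w in weights.items()}


def moment_match(pmf):
    """Project a PMF onto a Gaussian by matching its mean and variance."""
    mean = sum(p * alpha for alpha, p in pmf.items())
    var = sum(p * abs(alpha - mean) ** 2 for alpha, p in pmf.items())
    return mean, var
```

Under this assumed form, the GM belief is softer than the SP belief for the same inputs, so its matched variance stays larger and the feedback to the equalizer is less confident.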

4) MESSAGES FROM DEM TO DEC
The message from DEM to DEC is calculated as a bit LLR using the approximate posteriori on the VN $c_{k,j}$. This message is obtained by marginalizing over the $j$-th bit of the $k$-th symbol, where $\mathcal{X}_j^p$ is the set of symbols whose $j$-th bit is $p$. The bit-level LLR calculation is followed by a bit-level assignment before the LLRs are fed to the FEC decoder.
The receiver structure is shown in Fig. 8; interleaving and bit assignment are omitted for simplicity of representation. Unlike turbo iteration, where mild extrinsic information is always fed back, this receiver feeds approximate a priori information to the equalizer to calculate its posteriori probabilities, because most of the symbol posteriori information has already been removed in Eq. (40).
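The marginalization behind the bit LLR can be sketched as follows. The function name, the example labelling and the sign convention (bit value 0 in the numerator) are assumptions of this sketch.

```python
import math


def bit_llr(symbol_pmf, labels, j, prior_llr=0.0):
    """Extrinsic LLR of bit j: ln P(bit=0)/P(bit=1) minus the a priori LLR.

    labels maps each constellation point to its bit tuple. A tiny floor
    guards against a zero probability in either set.
    """
    p0 = sum(p for s, p in symbol_pmf.items() if labels[s][j] == 0)
    p1 = sum(p for s, p in symbol_pmf.items() if labels[s][j] == 1)
    floor = 1e-300
    return math.log(max(p0, floor) / max(p1, floor)) - prior_llr
```

With a hypothetical Gray labelling of 4-ASK in which the first bit is the sign, a symmetric shaped PMF gives a zero LLR for the sign bit and a nonzero LLR for the amplitude bit, reflecting that only the amplitude level is shaped.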

V. BLOCK-WISE MATRIX INVERSION USING CHOLESKY FACTORIZATION
Bayesian equalization requires excessive computation, and symbol-wise matrix inversion accounts for the major part of it. An efficient calculation method is therefore needed. In this section, we consider a block-type matrix inversion strategy. In detail, we define a banded matrix with equal lower and upper bandwidth. As shown in Fig. 9, we define its submatrix $\mathbf{m}_k$, which is formed by taking consecutive entries, namely the $(kl+1)$-th to $(kl+l+p+d)$-th rows and columns, from the original matrix. We still use $p$ and $d$ to denote the numbers of pre-cursor and post-cursor samples. The block size is $(p + l + d)$, where $l$ is a positive variable that affects the algorithm granularity and the computational burden. Then the Bayesian equalization can be written block-wise as Eq. (44), where $\mathbf{x}_d^k$, $\mathbf{V}_d^k$, $\boldsymbol{\mu}_e^k$ and $\boldsymbol{\gamma}_e^k$ are the corresponding $k$-th submatrices of $\mathbf{x}_d$, $\mathbf{V}_d$, $\boldsymbol{\mu}_e$ and $\boldsymbol{\gamma}_e$, respectively. Clearly, Eq. (27) and (28) are two special cases of Eq. (44).
When $l$ reduces to one, this scheme degrades to typical element-wise filtering as in Eq. (28), and when $l$ is set to $K$, the algorithm is equivalent to Eq. (27). From the definition, $\mathbf{m}_k$ is symmetric and positive definite, so it can be factored as
$$\mathbf{m}_k = \mathbf{R}^{H}\mathbf{R},$$
where $\mathbf{R}$ is an upper triangular matrix with positive diagonal elements. The matrix inversion can thus be solved by Cholesky decomposition in a recursive way. We next give a brief complexity analysis of the proposed algorithm, linear MMSE and FDE [12]. The Cholesky factorization has a computational complexity of $W^3/3 + 2W^2$, where $W$ is the matrix order. Denoting by $l$, $L$ and $N$ the block size, the channel length and the total number of symbols, a total of $(2L + 4(l + p + d + 1))N + (l + p + d)^2 N/3$ operations are required by the proposed algorithm during block-wise equalization. The FDE involves an $N_{FFT}$-point FFT/IFFT, which has a complexity of $O(N_{FFT}\log N_{FFT})$. Taking the prior knowledge into account, the total number of operations required by FDE is $4N\log(N_{FFT}) + 15N$. For LMMSE equalization, the complexity is $(N l^2)/3 + (L + 1)N$. Fig. 10 compares the computational complexity for different block sizes, where up to $N = 2^{10}$ symbols are considered. LMMSE has the lowest computational complexity when the block size is small. Once initialized, the LMMSE detector requires no update during the computation, so the initialization becomes the dominant cost as $l$ grows. Benefiting from the fast FFT computation, single-tap FDE saves a large amount of computation, especially when a large block size is adopted. As depicted in the figure, the proposed block-wise matrix inversion strategy saves up to 70% of the computational burden compared with the symbol-wise implementation ($l = 1$), and the most computationally efficient block size varies with the number of overlapping samples.
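The factor-then-substitute idea can be illustrated with a plain Cholesky solver. This is a sketch under simplifying assumptions: it handles real symmetric positive definite matrices only (the complex Hermitian case would need conjugates), it does not exploit the banded structure, and the helper `block_window` is a hypothetical name for the index range of the $k$-th block.

```python
import math


def cholesky_upper(m):
    """Return upper-triangular R with m = R^T R (real symmetric, positive definite)."""
    n = len(m)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = m[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                r[i][i] = math.sqrt(s)
            else:
                r[i][j] = s / r[i][i]
    return r


def cholesky_solve(m, b):
    """Solve m x = b via R^T z = b (forward) and R x = z (backward)."""
    r = cholesky_upper(m)
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(r[k][i] * z[k] for k in range(i))) / r[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(r[i][k] * x[k] for k in range(i + 1, n))) / r[i][i]
    return x


def block_window(k, l, p, d):
    """0-based index range of the k-th block of size l + p + d on the diagonal."""
    return range(k * l, k * l + l + p + d)
```

Solving with the triangular factor avoids forming the inverse explicitly; one factorization per block is then shared by all $l$ symbols inside that block.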

VI. PERFORMANCE EVALUATION
In this section, we show the performance of DFE-SP, DFE-GM, nuBEP, B-EP [34] and LMMSE in different scenarios. Shaped-BCJR and AWGN results are also presented for the Proakis-B channel as lower bounds. According to [35], the MMSE family shows performance similar to LMMSE, so other MMSE approaches are not included in the simulations. We also do not include EP-F because it exhibits almost identical performance to nuBEP in each iteration when 16QAM is used [36]. A detailed comparison of the BCJR family is given in [30]. The shaped-BCJR runs in the Proakis-B channel with tolerable computational complexity, but it is not included in the Proakis-C channel simulations.
We average the simulations over $10^4$ random frames with PCS-16QAM. The DVB-S2 LDPC code of rate 3/4 is used as the FEC code, and belief propagation performs decoding with a maximum of 10 iterations. The numbers of pre- and post-cursor samples are both set to 20, and the block size is set to 10. The BER performance of shaped-BCJR, DFE-GM and FDE in the Proakis-B channel is given in Fig. 11, where the DM rate is 0.8. FDE in a 64-order exponential-decay channel is also presented as a reference, while shaped-BCJR is not included for that channel because of the unaffordable number of trellis states. It can be observed that DFE-GM has a performance gap of almost 6 dB with respect to shaped-BCJR at BER $= 10^{-4}$, and FDE falls behind DFE-GM by about 8 dB. The simulation results show that TDEs perform substantially better than FDE in the Proakis-B channel, which contains a strong dominant component.
Fig. 12 depicts the BER performance versus SNR for two channel configurations, where the DM rate is 0.5. At the first iteration, the EP-based approaches have similar performance and are about 2 dB better than LMMSE. After 5 iterations, BEP remains about 2 dB better than LMMSE. The proposed algorithm with SP achieves an improvement of 5 dB with respect to LMMSE and 1 dB compared to the nuBEP algorithm. When GM is applied, a further gain of 0.5 dB is obtained in both channel configurations, and the gap to shaped-BCJR narrows to about 2 dB in the Proakis-B channel. In Fig. 13, we compare the BER performance of DFE-GM with different DM rates and LDPC iterations for PCS-16QAM in the Proakis-B channel. It can be observed that a higher SNR enables a larger DM rate for reliable transmission; conversely, the transceiver can adjust the DM rate to maximize the system throughput or reliability. The performance can also be improved by increasing the number of LDPC iterations, but the effect is limited: raising the number of LDPC iterations from 10 to 50, which means five times the decoding computational burden, improves the performance by only 0.1 dB.

VII. CONCLUSION
In this paper, we show how PCS with LDPC as FEC can work in an ISI channel. PCS is a promising technique for optimizing the SE, and it makes the system compatible with a wide variety of source rates. To obtain a reliable receiver, we design two receivers based on different detection algorithms: MAP detection through the shaped-BCJR algorithm, and filter-type iterative detection using the EP approximation. The major advantage of the EP-based algorithms is that they perform better than the LMMSE-based algorithms while their computational complexity does not grow exponentially with the channel states, as opposed to most MAP-based algorithms.
First, the forward/backward algorithm together with turbo equalization is used to derive the posteriori probability. The algorithm is adapted by exploiting the shaped probability of each symbol, resulting in the shaped-BCJR algorithm. Second, we detail the messages exchanged between connected nodes in the factor graph. The proposed solution iteratively finds the best Gaussian distribution that approximates the true posteriori of the target symbols, and after several iterations it clearly improves on previous approaches. Finally, we present a novel matrix inversion strategy that cuts down the computational complexity with the help of block-wise decomposition.
From numerical simulations, we demonstrate that the proposed techniques achieve better performance than LMMSE or conventional EP algorithms. The proposed matrix inversion scheme saves more than half of the computational burden compared with element-wise equalization. In this paper, we deal with detection in a SISO channel; applications to MIMO channels and multi-user detection remain to be explored.
YANG HUANG was born in 1997. He received the B.S. degree in information engineering from Southeast University, Nanjing, China, where he is currently pursuing the M.S. degree with the National Mobile Communications Research Laboratory. His research interests include cloud computing and big data, machine learning, and wireless signal processing.
VOLUME 8, 2020