Nonlinear BCJR equalizer for suppression of intrachannel nonlinearities in 40 Gb/s optical communications systems

A maximum a posteriori probability (MAP) symbol decoding supplemented with iterative decoding is proposed as an effective means for suppression of intrachannel nonlinearities. The MAP detector, based on the Bahl-Cocke-Jelinek-Raviv algorithm, operates on the channel trellis, a dynamical model of intersymbol interference, and provides soft-decision outputs processed further in an iterative decoder. A dramatic performance improvement is demonstrated. The main reason is that the conventional maximum-likelihood sequence detector based on the Viterbi algorithm provides hard-decision outputs only, hence preventing soft iterative decoding. The proposed scheme operates very well in the presence of strong intrachannel intersymbol interference, when other advanced forward error correction schemes fail, and it is also suitable for a 40 Gb/s upgrade over existing 10 Gb/s infrastructure.


Introduction
High-speed optical transmission systems operating at 40 Gb/s or higher are severely limited by intrachannel nonlinearities such as intrachannel four-wave mixing (IFWM) and intrachannel cross-phase modulation (IXPM) [1][2][3][4][5][6]. Approaches to deal with intrachannel nonlinearities may be classified into three broad categories: (i) modulation formats [2][3][4][5], (ii) constrained (or line) coding [1], [6], and (iii) equalization techniques [9]. The IFWM is a phase-sensitive effect, and the aim of the first approach is to remove the short-term phase coherence of the pulses emitted in a given neighborhood. The role of constrained coding [1] is to avoid those waveforms in the transmitted signal that are most likely to be received incorrectly. This approach has been carefully examined by the authors, and significant performance improvement has been demonstrated for various constrained codes and dispersion maps [1]. The most efficient way to deal with intrachannel nonlinearities is to combine the constrained coding and forward error correction (FEC) in a reverse concatenation scheme [1]. Although the combination of constrained coding and error correction provides an excellent coding gain, it reduces the code rate, because the total code rate is equal to the product of the FEC code rate and the constrained code rate.
Previous work on nonlinear intersymbol interference (ISI) reduction at lower bit rates has involved the use of equalization [7] and nonlinear cancellation [8]. A drawback of linear equalizers is that they cannot handle nonlinear effects, while the nonlinear cancellation technique in [8] does not take into account the effect of post-cursor ISI. The Volterra series nonlinear equalization technique used to improve performance in a duobinary modulation scheme may suffer from error propagation due to the nonlinear feedback [9]. Other techniques proposed recently include maximum likelihood sequence detection (MLSD) based on the Viterbi algorithm [9,11] and turbo equalization [12]. The disadvantage of the Viterbi algorithm is that it does not produce the soft information required for iterative decoding, while the turbo-equalization technique, proposed in [12] in the context of a wireless multipath channel, employs a convolutional code of rate ½, which is unacceptably low for high-speed optical transmission. Furthermore, the coding gain provided on an AWGN channel is too small to be of interest in high-speed transmission. Similar turbo-equalization schemes have been extensively studied for a variety of applications such as wireless communications [13] and magnetic recording [14].
In this paper, the turbo equalization scheme [12] is modified for fiber-optics communications. Moreover, this paper is concerned with the suppression of intrachannel nonlinearities, and the results presented demonstrate significant performance improvement. The proposed nonlinear ISI cancellation scheme employs maximum a posteriori probability (MAP) symbol decoding based on the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [15], while the forward error correction is based on girth-6 [16,17] and girth-8 low-density parity-check (LDPC) codes [18]. The nonlinear ISI channel is modeled by a finite state machine (FSM) whose transition and output functions describe the dependency of the channel statistics and the ISI on transmitted patterns. The BCJR algorithm operates on a trellis of the corresponding FSM, and creates the soft information (detected bit likelihoods) used in the iterative decoder.
The main feature of the proposed scheme is that it can operate in the regime of very strong intrachannel nonlinearities, where FEC schemes such as turbo or LDPC codes are not designed to operate. Most dispersion maps used in 10 Gb/s systems will suffer from strong intrachannel nonlinearities if used for 40 Gb/s transmission without any modification; the proposed scheme is therefore an excellent candidate for a 40 Gb/s upgrade over existing 10 Gb/s infrastructure. Moreover, the proposed scheme requires modifications only on the receiver side of a transmission system.
To investigate the bit error rate (BER) degradation due to nonlinear ISI at high bit rates, and to quantify the gains that can be obtained using the proposed method, we have developed an advanced simulator that takes into account the effects of the optical and electrical components employed in the system. The simulation results show that the MAP symbol decoding used to counter the nonlinear ISI, together with LDPC codes, offers a significant performance improvement.
The paper is organized as follows. The intrachannel nonlinear effects are briefly introduced in Section 2. The nonlinear BCJR equalization principle implemented here is introduced in Section 3, while the numerical results are reported in Section 4. Some important conclusions are given in Section 5.

Intrachannel nonlinear effects
At high bit rates (40 Gb/s and above) the major nonlinear penalties are due to intrachannel interactions (IFWM and IXPM) [1][2][3][4][5][6]. IXPM is caused by modulation of a pulse phase by nonlinear interaction with neighboring pulses within the channel, resulting in timing jitter. In IFWM, at sufficiently high dispersion, energy of the pulses in "resonant positions" is transferred to the middle of a neighboring bit slot, causing either a ghost pulse in an empty bit slot or amplitude jitter in a non-empty bit slot (in RZ-OOK) [1][2][3][4][5][6]. IFWM is caused by the dependence of the refractive index of the medium on the intensity of the applied electrical field, and has been identified as the major nonlinear effect that limits transmission distance in pseudo-linear fiber-optic communication systems [2]. To counter dispersion in optical fibers, dispersion-managed schemes are deployed. These setups consist of alternating spans of fibers having positive and negative chromatic dispersion, with the value of residual dispersion being low or zero. Thus, the pulses that travel through these systems undergo alternate widening and compression, causing interaction between the pulses when they overlap. These interactions, along with the nonlinearities present, lead to energy transfers that cause the ghost pulse phenomenon.
It has been observed that in RZ-OOK transmission the most severe problems are caused by pulse triples at positions k, l and m, where k + l - m = 0, as illustrated in Fig. 1(a). This represents the resonance condition and creates a ghost pulse at position 0. It was shown in [1,4] that triples that lie close to each other cause the highest energy transfers among pulses. The cumulative effect of this ghost pulse phenomenon is illustrated in Fig. 1(b), which shows the effects of transmitting the sequence "10011111". Triples of pulses at positions (3,-1,2), (4,-1,3), (5,-1,4), (6,-1,5) are all in resonance. Thus, in the resulting energy transfer to pulse position 0, the pulse at position "-1" loses a large amount of energy since it is involved in four triples. Figure 1(b) shows an ideal case in the sense that no other pulses take part in triples, and that effects of other nonlinearities are not shown.
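The resonance bookkeeping in this example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `resonant_triples`, the convention that the first bit of the string sits at position -1, and the reading of the resonance condition as k + l - m = 0 (which is what the four listed triples satisfy) are assumptions made here, not part of the original analysis.

```python
def resonant_triples(bits, first_position, target=0):
    """List pulse triples (k, l, m) with k + l - m == target.

    bits           -- string of '0'/'1'; a '1' marks a pulse (RZ-OOK)
    first_position -- slot index assigned to bits[0] (assumed convention)
    Only k >= l is kept so that each unordered pair (k, l) appears once.
    """
    pulses = [first_position + i for i, b in enumerate(bits) if b == "1"]
    pulse_set = set(pulses)
    triples = []
    for k in pulses:
        for l in pulses:
            if l > k:
                continue
            m = k + l - target
            if m in pulse_set:
                triples.append((k, l, m))
    return triples


triples = resonant_triples("10011111", first_position=-1)
# Keep only the triples in which the pulse at position -1 takes part.
through_minus_one = [t for t in triples if -1 in t]
print(through_minus_one)  # [(3, -1, 2), (4, -1, 3), (5, -1, 4), (6, -1, 5)]
```

Filtering on position -1 mirrors the idealized discussion of Fig. 1(b); the unfiltered list also contains triples formed only by the pulses at positions 2 to 6.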
Several methods have recently been introduced for reducing the intrachannel fiber nonlinearities, including novel modulation formats [2,4], constrained coding [1,6], deliberate error insertion [1], and combined constrained coding and error control coding [1]. The most efficient way of dealing with intrachannel nonlinearities is to combine the constrained coding with forward error correction in a so-called reverse concatenation scheme, as explained in [1]. This approach provides excellent coding gains, but reduces the code rate. In Section 3, we provide an efficient way to achieve a large coding gain, comparable to that from [1], without the code rate reduction caused by the constrained encoder.

Combined nonlinear ISI cancellation and forward error correction
Bahl, Cocke, Jelinek and Raviv proposed a MAP decoding algorithm [15] (known as the BCJR algorithm) that can be used for decoding of sequences generated by a finite state machine. It is an optimal decoding method that minimizes the symbol error probability. Applications suggested in [15] include convolutional and linear block codes, and recently it has been shown that BCJR can be used to successfully counter the effects of ISI in magnetic recording channels [14]. The output of the channel is described by a trellis, and BCJR operates on this trellis to correct the corrupted data. A significant benefit of using the BCJR algorithm, compared to the Viterbi algorithm, is that in addition to detected bits it also provides bit reliabilities, i.e., soft decisions. Iterative decoding of LDPC codes is currently the most advanced forward error correction approach, but its power can be fully exploited only if bit reliabilities are supplied to the decoder. Although it is an optimal method for minimizing the sequence error probability, the Viterbi algorithm provides only hard decisions, thus preventing soft iterative decoding.
We propose to use the BCJR algorithm to suppress the nonlinear ISI due to intrachannel nonlinearities. The BCJR algorithm operates on a trellis that is a discrete dynamical model of the optical channel. Let us suppose that a dispersion map is chosen so that each decoded bit is influenced by m neighboring bits from either side. Let $u_j$ be the j-th bit in a sequence u, and $y_j$ be the corresponding received sample at the output of the electrical filter. The transition probability $p(y_j|s)$ is estimated from the simulator by modeling the channel as a finite state machine (s denotes the state of the channel). It is assumed that m previous and m next bits influence the observed bit $u_j$, and the state of the channel $s=(u_{j-m}, u_{j-m+1}, \ldots, u_j, u_{j+1}, \ldots, u_{j+m})$ is determined by a sequence of 2m+1 input bits $u_i \in \{0,1\}$. The value 2m+1 is referred to as the memory of the discrete channel given by the set of states S. Notice that other approaches based on MLSD or turbo equalization, presented in [7][8][9][10][11][12][13][14], ignore the post-cursor ISI (the bits that follow the bit to be decoded).
As an illustration, Fig. 2 shows the conditional probability density function (PDF) of the received sample y given a state s, for the two states s=1110111 and s=0001000 and different numbers of spans, for the dispersion map from Fig. 3. The dispersion map in Fig. 3 is selected (i) to keep IXPM low, and (ii) to keep the pulse spread during transmission over the $D_+$ fiber on the order of tens of bit periods (rather than the hundreds of bit periods common for pseudolinear transmission). Notice that the memory in both cases is 2m+1=7. The eye diagrams after 10 and 30 spans are shown in Figs. 2(b)-2(c). The parameters of the $D_+$ and $D_-$ fibers are given in Table 1. The span length is set to L=120 km, and each span consists of 2L/3 of $D_+$ fiber followed by L/3 of $D_-$ fiber. Pre-compensation of -800 ps/nm and corresponding post-compensation are also applied. The RZ modulation format with a duty cycle of 33% is observed, and the launched power is set to 0 dBm. Erbium-doped fiber amplifiers (EDFAs) with a noise figure of 5 dB are deployed after every fiber section, the bandwidth of the optical filter is set to $3R_b$ and the bandwidth of the electrical filter to $0.65R_b$, with $R_b$ being the bit rate (40 Gb/s). As expected, by increasing the number of spans, the ghost pulse at the central bit position grows [see Figs. 2(b)-(c)], causing the mean of the PDF to shift to the right [Fig. 2(a)]. It is evident that the commonly used AWGN assumption is not valid in this case.
The PDF is obtained by passing random sequences through the channel. The length of a sequence is $2^{15}$, and 32 samples per bit are used in the transmission simulations. To estimate the PDF, the sample range is uniformly quantized into 64 bins, and the number of occurrences of samples in a given bin is counted and normalized by the total number of samples.
A set of triples (previous state, channel output, next state) uniquely defines a finite state machine on which the BCJR operates. As an illustration, a trellis for 2m+1=5 is shown in Fig. 4(a). It has 32 states ($s_0, s_1, \ldots, s_{31}$), and each state is given by a different 5-bit pattern. The states in vertical columns represent all possible states that the channel (or FSM) can take at a given time instant, while the labeled edges represent possible transitions. Neighboring columns thus represent consecutive time instants.
For example, if the channel is in state $s_0$ (the bit pattern 00000 was generated), and if the next bit is "0", the FSM stays in state $s_0$ and generates "0" as an output (the middle bit of the terminal state). Otherwise, the FSM goes to state $s_1$ (bit pattern 00001) and again outputs "0". If the FSM is in $s_{16}$ (the bit pattern 10000 was generated), and if the next bit is "0", then the FSM goes to $s_0$ and outputs "0" (the middle bit of the terminal state); otherwise it goes to $s_1$ (bit pattern 00001) and again outputs "0". No other transition is allowed from $s_{16}$. Similarly, there are two possible transitions from each state in the trellis. A labeled edge is assigned to each allowed transition, and a received sample corresponds to the output symbol of the branch (the central bit of the terminal state).
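The finite-state model and the histogram-based estimate of the conditional PDFs described above can be sketched in Python as follows. This is a minimal illustration, not the simulator used in the paper: the function names, the integer encoding of a state (oldest bit as most significant bit), and the choice to normalize each histogram by the number of samples observed in that state are assumptions made for the example.

```python
def next_state(state, bit, m):
    """Shift the (2m+1)-bit state left by one position and append the new bit."""
    mask = (1 << (2 * m + 1)) - 1
    return ((state << 1) | bit) & mask


def central_bit(state, m):
    """Branch output: the middle bit of the terminal state."""
    return (state >> m) & 1


def states_of(bits, m):
    """State index for every position that has m neighbors on both sides."""
    out = []
    for j in range(m, len(bits) - m):
        s = 0
        for b in bits[j - m : j + m + 1]:
            s = (s << 1) | b
        out.append(s)
    return out


def estimate_pdfs(states, samples, n_states, n_bins, lo, hi):
    """Per-state histogram of received samples over [lo, hi).

    Each row holds the fraction of that state's samples falling in each bin
    (assumed normalization); out-of-range samples go to the edge bins.
    """
    counts = [[0] * n_bins for _ in range(n_states)]
    step = (hi - lo) / n_bins
    for s, y in zip(states, samples):
        k = min(max(int((y - lo) / step), 0), n_bins - 1)
        counts[s][k] += 1
    pdfs = []
    for row in counts:
        total = sum(row)
        pdfs.append([c / total for c in row] if total else [0.0] * n_bins)
    return pdfs


# Transitions quoted in the text for 2m+1 = 5 (m = 2).
print(next_state(16, 0, 2), next_state(16, 1, 2))  # 0 1
```

With 2m+1=5 the two calls reproduce the transitions from $s_{16}$ to $s_0$ and $s_1$, and `central_bit` returns "0" for both terminal states.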

The log-likelihood ratio (LLR) of the bit $u_j$ is calculated in the log domain as

$$L(u_j) = \max^*_{(s',s):\,u_j=0}\left[\alpha_{j-1}(s') + \gamma_j(s',s) + \beta_j(s)\right] - \max^*_{(s',s):\,u_j=1}\left[\alpha_{j-1}(s') + \gamma_j(s',s) + \beta_j(s)\right]. \qquad (1)$$

The dashed lines in Fig. 4(a) correspond to transitions (s',s): $u_j$=0, and the solid lines to transitions (s',s): $u_j$=1. The forward metric $\alpha_j(s)$ and the backward metric $\beta_j(s)$ follow the log-domain BCJR recursions [20],

$$\alpha_j(s) = \max^*_{s'}\left[\alpha_{j-1}(s') + \gamma_j(s',s)\right], \qquad \beta_{j-1}(s') = \max^*_{s}\left[\beta_j(s) + \gamma_j(s',s)\right],$$

and the branch metric $\gamma_j(s',s)$ is determined by the logarithm of the transition probability estimated from the conditional PDFs described above. The max*-operator is defined as [20]

$$\max{}^*(x,y) = \log\left(e^x + e^y\right) = \max(x,y) + \log\left(1 + e^{-|x-y|}\right). \qquad (6)$$

The key difference between the regular BCJR algorithm and the BCJR described here is in the calculation of LLRs. The conventional BCJR [15] calculates LLRs of the input bits corresponding to the edges, while the modified version calculates the LLRs of the output bits corresponding to the central bit of terminal states [see Fig. 4(a)]. Another important difference with respect to the turbo equalization proposed in [12] is that the BCJR algorithm operates on a trellis that includes both pre- and post-cursor ISI [see Fig. 4(a)].
A complete block diagram of the proposed scheme is given in Fig. 5. The BCJR LLR outputs, $L(u_j)$ (j=1,2,…,n), are fed to an iterative LDPC decoder implemented using the message-passing (MP) algorithm. The main idea is to use the BCJR algorithm to partially cancel nonlinear ISI due to intrachannel nonlinearities and reduce the BER to around $10^{-3}$ to $10^{-4}$, and then feed the bit likelihoods obtained from the BCJR algorithm into the iterative decoder of an LDPC code. For such input BERs, iterative decoding using LDPC codes alone has been shown to markedly improve performance at 40 Gb/s [16][17][18].
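To make the detector concrete, the following Python sketch runs a log-domain BCJR over a shift-register trellis with 2m+1-bit states and returns the LLR of the central bit of the terminal state, using the convention that positive LLRs favor "0". It is a hedged illustration rather than the implementation used in the paper: the function names, the input layout `log_lik[j][s]` (the logarithm of $p(y_j|s)$), the uniform initial forward and backward metrics, and the omission of any a priori term in the branch metric are all assumptions made here.

```python
import math

NEG_INF = float("-inf")


def max_star(x, y):
    """Jacobian logarithm: log(exp(x) + exp(y))."""
    if x == NEG_INF:
        return y
    if y == NEG_INF:
        return x
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def bcjr_llrs(log_lik, m):
    """LLRs of the central bits; log_lik[j][s] = log p(y_j | s)."""
    n_states = 1 << (2 * m + 1)
    mask = n_states - 1
    n = len(log_lik)
    # Assumption: every state is equally likely at the start and at the end.
    alpha = [[0.0] * n_states] + [[NEG_INF] * n_states for _ in range(n)]
    beta = [[NEG_INF] * n_states for _ in range(n)] + [[0.0] * n_states]

    # Forward recursion over the two branches leaving each state.
    for j in range(1, n + 1):
        for sp in range(n_states):
            for bit in (0, 1):
                s = ((sp << 1) | bit) & mask
                gamma = log_lik[j - 1][s]  # branch metric (no a priori term)
                alpha[j][s] = max_star(alpha[j][s], alpha[j - 1][sp] + gamma)

    # Backward recursion.
    for j in range(n, 0, -1):
        for sp in range(n_states):
            for bit in (0, 1):
                s = ((sp << 1) | bit) & mask
                gamma = log_lik[j - 1][s]
                beta[j - 1][sp] = max_star(beta[j - 1][sp], beta[j][s] + gamma)

    # LLR of the central bit of the terminal state of each branch.
    llrs = []
    for j in range(1, n + 1):
        zero, one = NEG_INF, NEG_INF
        for sp in range(n_states):
            for bit in (0, 1):
                s = ((sp << 1) | bit) & mask
                metric = alpha[j - 1][sp] + log_lik[j - 1][s] + beta[j][s]
                if (s >> m) & 1 == 0:
                    zero = max_star(zero, metric)
                else:
                    one = max_star(one, metric)
        llrs.append(zero - one)
    return llrs
```

A quick sanity check is a memoryless toy channel in which `log_lik` depends only on the central bit of the state: the returned LLR then reduces to the per-sample likelihood ratio, as expected when there is no ISI to exploit.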

Performance analysis
The LLRs of an uncoded signal are determined by

$$L(u_j) = \log\frac{p(y_j \mid u_j=0)}{p(y_j \mid u_j=1)},$$

where the PDFs of ZERO and ONE bits are obtained by averaging over all states in which the middle bit is ZERO or ONE, respectively:

$$p(y_j \mid u_j=b) = \frac{1}{|S_b|}\sum_{s \in S_b} p(y_j \mid s), \qquad b \in \{0,1\},$$

with $S_b$ denoting the set of states whose middle bit equals $b$. The hard decisions are made according to

$$\hat{u}_j = \begin{cases} 0, & L(u_j) \ge 0, \\ 1, & \text{otherwise.} \end{cases}$$

Simulation results are shown in Fig. 6 for the following classes of LDPC codes [16][17][18]: the lattice LDPC(8547,6922,0.81) code of girth 8 and column weight 4, the lattice LDPC(2512,2043,0.81) code of girth 8 and column weight 3, the lattice LDPC(1750,1543,0.88) code of girth 6 and column weight 3, and the PG(2,$2^6$)-based LDPC(4161,3431,0.82) code of girth 6. Although much longer, the turbo product code (TPC) of code rate 0.82 based on the BCH(128,113)xBCH(256,239) scheme lags far behind the different classes of LDPC codes. For example, the LDPC(8547,6922,0.81) code alone outperforms the TPC by 0.8 dB at a BER of $10^{-6}$, although it is almost 4 times shorter. The Viterbi decoder operates on the trellis shown in Fig. 4(a) with a truncation length of 64, and performs comparably to the BCJR algorithm [see Fig. 6(a)]. Notice, however, that it does not provide the soft decisions required for soft iterative decoding of the outer LDPC code. For the memory 2m+1=7, the lattice girth-8 LDPC code of rate 0.81 and column weight 4, combined with the BCJR algorithm, outperforms the TPC by 2.5 dB at a BER of $1\cdot10^{-6}$ and the BCJR algorithm alone by 6.8 dB, and the coding gain over an uncoded system is 9.2 dB.
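The code rates and lengths quoted above can be cross-checked with simple arithmetic. The short Python sketch below does so; the dictionary labels and variable names are hypothetical, and the only inputs are the (n, k) pairs stated in the text, with the product-code length and rate taken as the products of the component values.

```python
from fractions import Fraction

# (n, k) pairs as quoted in the text; the keys are just descriptive labels.
LDPC_CODES = {
    "lattice, girth 8, column weight 4": (8547, 6922),
    "lattice, girth 8, column weight 3": (2512, 2043),
    "lattice, girth 6, column weight 3": (1750, 1543),
    "PG(2,2^6), girth 6": (4161, 3431),
}


def code_rate(n, k):
    """Exact code rate k/n."""
    return Fraction(k, n)


ldpc_rates = {
    name: round(float(code_rate(n, k)), 2) for name, (n, k) in LDPC_CODES.items()
}

# BCH(128,113) x BCH(256,239) turbo product code.
tpc_length = 128 * 256
tpc_rate = code_rate(128, 113) * code_rate(256, 239)
length_ratio = tpc_length / 8547  # TPC length relative to the longest LDPC code

for name, r in ldpc_rates.items():
    print(f"{name}: R = {r:.2f}")
print(f"TPC: n = {tpc_length}, R = {float(tpc_rate):.2f}, "
      f"length ratio = {length_ratio:.2f}")
```

The computed rates round to 0.81, 0.81, 0.88 and 0.82 for the LDPC codes and 0.82 for the TPC, and the TPC is a little under four times longer than the LDPC(8547,6922) code, in line with the figures stated in the text.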
By iterating (passing the bit LLRs) between the BCJR algorithm block and the LDPC decoder (see Fig. 5), the BER performance can be further improved at the expense of an increased decoding delay. We refer to this as an outer iteration, to differentiate it from iterations within the message-passing algorithm, which are referred to as inner iterations. In the first outer iteration, the LLRs from BCJR are passed to and processed by the message-passing decoder. We say that the i-th outer iteration is complete when the extrinsic LLRs at the output of the MP decoder from the (i-1)-th iteration are processed by the BCJR detector and the BCJR extrinsic reliabilities are passed to and processed by the MP decoder. The number of inner iterations in the message-passing decoder is set to 10. The curves with only one outer iteration are obtained for 25 inner iterations of the MP decoder. The coding gain after the fifth iteration is 9.7 dB at a BER of $1\cdot10^{-6}$ [the diamond curve in Fig. 6(a)]. The improvement in coding gain over the TPC is 3 dB (at the same BER), and the improvement over the BCJR detector is 7.3 dB. By extrapolating the LDPC(8547,6922,0.81) curve down to a BER of $10^{-12}$, the expected coding gain is around 13.1 dB, and the improvement over the TPC is around 3.2 dB. We have recently shown [21][22] (see also Fig. 7) that the finite geometry codes, and lattice codes of high girth and large column weight, do not exhibit an error floor in the region of interest for fiber-optics communications, so that the extrapolation is justifiable once the waterfall region is reached. The BER performance can also be improved by increasing the memory of the channel, but this results in an exponential increase of the algorithm complexity.
From the numerical results presented above, it follows that the combined BCJR intrachannel cancellation and LDPC coding is an excellent candidate to enable transmission in the presence of strong intrachannel nonlinearities. Moreover, it can be used for upgrading the existing 10 Gb/s infrastructure to 40 Gb/s, as explained in the Introduction. The BER performance comparison of combined nonlinear ISI cancellation and LDPC coding for different component LDPC codes is given in Fig. 6(b). As expected, the girth-8 codes outperform the girth-6 codes. The coding gains of the combined BCJR-LDPC scheme for different channel memories, with the LDPC(8547,6922,0.81) code as component code, at a BER of $1\cdot10^{-6}$ are summarized in Table 2.
The max*-operator in (6) involves a two-input max-function and the correction term $\log(1+e^{-|x-y|})$, which can be implemented as a lookup table. The performance loss caused by approximating the max*-function by the max-function is found to be negligible [see Fig. 6(b)].
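The three variants mentioned here (exact max*, table-based correction, and plain max) can be compared with a short Python sketch. The quantization step `STEP` and the eight-entry table are hypothetical choices made only for illustration; the text does not specify the table actually used.

```python
import math

STEP = 0.5       # hypothetical quantization step for |x - y|
TABLE_SIZE = 8   # hypothetical number of table entries (covers |x - y| < 4)

# Correction term log(1 + exp(-d)) sampled at the midpoint of each bin.
TABLE = [math.log1p(math.exp(-(i + 0.5) * STEP)) for i in range(TABLE_SIZE)]


def max_star_exact(x, y):
    """max*(x, y) = max(x, y) + log(1 + exp(-|x - y|))."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def max_star_lut(x, y):
    """max* with the correction term read from the lookup table."""
    i = int(abs(x - y) / STEP)
    correction = TABLE[i] if i < TABLE_SIZE else 0.0
    return max(x, y) + correction


def max_log(x, y):
    """Max-log approximation: the correction term is dropped entirely."""
    return max(x, y)


for d in (0.0, 1.0, 3.0, 6.0):
    print(d, max_star_exact(0.0, d), max_star_lut(0.0, d), max_log(0.0, d))
```

The correction term is at most log 2 (when x = y) and decays quickly with |x - y|, which is consistent with the small loss reported for the max approximation.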
In the calculation of BER performance (in Fig. 6), an encoded sequence of length $2^{15}$ is transmitted many times over the transmission system for different ASE noise realizations. The number of spans is varied from 20 to 70, and the BERs of the uncoded and coded cases are recorded. In fiber-optics communications the Q-factor is commonly used as a figure of merit instead of the signal-to-noise ratio. However, the Q-factor is not an appropriate figure of merit in a highly nonlinear optical channel. The x-axis in Fig. 6 corresponds to the BER of an uncoded signal, when both the BCJR block and the LDPC decoder are omitted.

Conclusion
We have shown that MAP detection based on the BCJR algorithm, supplemented with iterative decoding, is able to achieve significant performance improvement in systems heavily degraded by ISI due to intrachannel fiber nonlinearities and dispersion. We note that other techniques for suppression of intrachannel nonlinearities, such as the ones based on the Volterra series method [9] or MLSD [10,11], do not provide the soft outputs required for soft iterative (turbo or LDPC) decoding. The Volterra series nonlinear equalization technique used to improve performance in a duobinary modulation scheme in [9] may also suffer from error propagation due to the feedback. For memories above 2m+1=5, the complexity of BCJR becomes large, and a simplified version of it, namely the soft-output Viterbi algorithm (SOVA) [19], is more likely to be of interest for practical implementations. Another important conclusion is that this approach may be combined with an optimal dispersion map design, which will further improve the BER results and reduce the complexity by reducing the number of states in the trellis. A joint design of a dispersion map, channel trellis and an LDPC code is an important problem, and is left for future research.
From the implementation complexity point of view, it should be noted that the complexity of the Chase II algorithm employed in the TPC decoder [23] is lower than the complexity of MP [24]. However, during the decoding process the turbo product decoder (for BCH(128,113)xBCH(256,239)) employs 239 Chase II blocks operating in parallel, thereby increasing the decoding delay and the circuit size. Moreover, turbo product codes require the use of an interleaver. For more details about the implementation of MP, an interested reader is referred to [24]. The length of the TPC is 32768, and the corresponding trellis for BCJR detection is too complex to be of interest for practical applications. Notice that recent advances in ultrahigh-speed microelectronics and electro-optics technology have allowed successful demonstration of ETDM-based optical transmission above 80 Gb/s [25], while MLSD is intensively studied for 40 Gb/s transmission [26], suggesting that the nonlinear BCJR equalization scheme proposed in this paper is timely. To reduce the number of states required for BCJR equalization, the dispersion map has to be carefully designed so that during transmission over the $D_+$ fiber the pulse is spread over up to 10 bits instead of the several tens of bits considered here. In order to demonstrate the efficiency of the proposed method, the dispersion map is chosen in such a way that strong IFWM occurs.
The results of simulations in this paper are obtained by maintaining double precision of the log-likelihood ratios. In our recent article [27] we have shown that a proper choice of the number of quantization bits results in negligible BER performance loss.

Here $p(y_j|u_j)$ is obtained as explained above, $u_j$ and $y_j$ correspond to the central bit in state $s$, and the initial values are set to … The forward step [see Fig. 4(b)] and the backward step [see Fig. 4(c)] are the same as in the original log-domain BCJR algorithm [20].

Fig. 7. Semi-analytic method (SAM) for frame-error rate analysis of projective geometry codes for hard-decision decoding using the Gallager B algorithm.

Table 2. Coding gains of combined BCJR-LDPC scheme for different channel memories at BER of $1\cdot10^{-6}$.