Performance Prediction of Nonbinary Forward Error Correction in Optical Transmission Experiments

In this paper, we compare different metrics to predict the error rate of optical systems based on nonbinary forward error correction (FEC). It is shown that the correct metric to predict the performance of coded modulation based on nonbinary FEC is the mutual information. The accuracy of the prediction is verified in a detailed example with multiple constellation formats and FEC overheads, in both simulations and optical transmission experiments over a recirculating loop. It is shown that the employed FEC codes must be universal if performance prediction based on thresholds is used. A tutorial introduction to the computation of the threshold from optical transmission measurements is also given.


I. INTRODUCTION AND MOTIVATION
MANY optical transmission experiments do not include forward error correction (FEC). The reasons for this are that often, FEC development is still ongoing, or FEC developers are physically remote from the experiment. Often, researchers would also like to reuse experimental data obtained in expensive optical transmission experiments to evaluate the performance of different FEC schemes, without needing to redo the transmission experiment and/or signal processing. Therefore, thresholds are commonly used to decide whether the bit error rate (BER) after FEC decoding is below the required target BER, which can be in the range of $10^{-13}$ to $10^{-15}$. The most commonly used threshold in the optical communications literature is the pre-FEC BER.
The use of thresholds is also very convenient in practice because very low post-FEC BER values are hard to estimate. The conventional design strategy has therefore been to experimentally demonstrate (or simulate) systems without FEC encoding and decoding, and to optimize the design for a much higher BER value, the so-called "FEC limit" or "FEC threshold". This approach relies on the strong assumption that a certain BER without coding can be reduced to the desired post-FEC BER by previously verified FEC implementations, regardless of the system under consideration.
Parts of this paper have been presented at the 2016 Optical Fiber Communication Conference (OFC), Anaheim, CA, USA, Mar. 2016 in paper M2A.2 [1].
A. Alvarado is with the Optical Networks Group, Dept. of Electronic & Electrical Engineering, University College London (UCL), London, WC1E 7JE, UK. R. Rios-Müller is with Nokia Bell Labs, Villarceaux, France. L. Schmalen was financially supported by the Federal German Ministry of Education and Research (BMBF) in the scope of the CELTIC+ project SASER-SaveNet. A. Alvarado was supported by the Engineering and Physical Sciences Research Council (EPSRC) project UNLOC (EP/J017582/1), UK.
Using pre-FEC BER thresholds is very popular in the literature and has been used, for example, in the record experiments based on 2048 quadrature amplitude modulation (QAM) for single-core [2] and multi-core [3] fibers. This threshold indeed gives accurate post-FEC BER predictions if three conditions are satisfied. First, bit-level interleaving must be used to guarantee independent bit errors. Second, the FEC under consideration must be binary and universal. Third, the decoder must be based on hard decisions (bits) rather than soft decisions. Recently, however, it was shown in [4], [5] that the pre-FEC BER fails at predicting the post-FEC BER of binary soft-decision FEC. This was shown for both turbo codes and low-density parity-check (LDPC) codes, in the linear and nonlinear regimes, and in both simulations and optical experiments. Furthermore, [4] also showed that a better predictor in this case is the generalized mutual information (GMI)$^1$ [6, Sec. 3], [7, Sec. 4.3], [8], [9] and suggested replacing the pre-FEC BER threshold by a "GMI threshold".
The rationale for using the GMI as a metric to characterize the performance of binary soft-decision FEC is that the GMI is an achievable information rate (AIR) for bit-interleaved coded modulation (BICM) [6], [7], often employed as a pragmatic approach to coded modulation (CM). For square QAM constellations, BICM operates close to capacity with moderate effort, and thus, it is an attractive CM alternative. However, for most nonsquare QAM constellations, BICM results in unavoidable performance penalties. For these modulation formats, other CM schemes such as nonbinary (NB) FEC [10] and multi-level coding with multi-stage decoding [11] can be advantageous. Furthermore, BICM is not expected to be the most complexity-efficient coded modulation scheme for short reach and metro optical communications with higher order modulation. The reason is that the digital signal processing (DSP) implementation needs to work at the transmission baud rate, but the FEC decoder needs to operate at $m$ times the DSP rate if $2^m$-ary higher order modulation formats are used. For these applications, multi-level coding [12] or NB-FEC may be good candidates, since for these, the throughput is of the same order as for the DSP. Although most nonbinary FEC schemes are considerably more complex to implement than their binary counterparts, recent advances [13], [14] show that very low-complexity nonbinary FEC schemes for higher order constellations can be implemented using, for instance, the numerically stable algorithm presented in [15].
In this paper, we investigate the performance prediction of NB soft-decision FEC (NB-FEC) and show that the correct threshold in this case is the mutual information (MI) [16]. The MI was previously introduced in [17] to assess the performance of differentially encoded quaternary phase shift keying and was shown to be a better performance indicator than the pre-FEC BER. The use of MI as a post-FEC BER predictor for capacity-approaching nonbinary FEC was also conjectured in [4, Sec. V] and was previously suggested in [18], [19] in the context of wireless communications.
$^1$ The GMI is also known as the BICM capacity or parallel decoding capacity.
The main contribution of this paper is to show that the MI is the correct threshold for a CM scheme based on NB LDPC codes. This is verified in both an additive white Gaussian noise (AWGN) simulation and in two optical experiments using 8-QAM constellations. We show that the MI allows us to accurately predict the post-FEC performance of NB LDPC schemes and also show that other commonly used thresholds (such as pre-FEC BER, pre-FEC symbol error rate (SER) and bit-wise GMI) fail in this scenario.
This paper is organized as follows. In Sec. II, we describe the system model we use and lay down some information theory preliminaries. Afterwards, in Sec. III, we show what thresholds we should use to predict the performance of NB FEC schemes. In Sec. IV, we verify our predictors with a simulation example, a back-to-back experiment and a transmission experiment over a recirculating loop. Finally, in Sec. V, we discuss code universality and give guidelines for using the proposed thresholds.

A. System Model
Fig. 1 shows the NB-CM scheme under consideration. The data bits are mapped to NB symbols from GF($2^m$) using a one-to-one (i.e., invertible) mapping function, then encoded by an NB-FEC with rate $R$, and then mapped to $D$-dimensional constellation symbols from the set $S := \{s_1, \ldots, s_M\}$, where $|S| = 2^m = M$ and $s_i \in \mathbb{R}^D$. Frequently, $D = 2$ (with complex symbols), but in optical communications, $D = 4$ [8], [9], [20], [21] and $D = 8$ [22], [23] are also used. As will become obvious later, the mapping to symbols is shown in two stages in Fig. 1: first, the NB symbols $U \in \{1, 2, \ldots, M\}$ are mapped to bit patterns $B$ of $m$ bits, and these are then mapped to constellation symbols $X$. In some cases, we require the combination of bit mapper and mapper $\Phi$, which we denote by $\phi(i) = s_i$ and which maps an integer $i$ to a modulation symbol $s_i$.
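The two-stage mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the MSB-first bit order and the use of unit-energy 8-PSK as a stand-in constellation are all hypothetical choices (the 8-QAM constellations of this paper are not reproduced here).

```python
import cmath
import math


def bits_to_index(bits):
    """Map a tuple of m bits (MSB first, an assumed convention) to i in {1, ..., 2**m}."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value + 1  # the text indexes symbols from 1


def make_8psk():
    """Hypothetical stand-in constellation: unit-energy 8-PSK as complex points (D = 2)."""
    return [cmath.exp(2j * math.pi * k / 8) for k in range(8)]


def phi(i, constellation):
    """Combined mapper phi(i) = s_i, using the 1-based index of the text."""
    return constellation[i - 1]


S = make_8psk()
bits = (1, 0, 1)
i = bits_to_index(bits)   # nonbinary symbol index
x = phi(i, S)             # constellation symbol
```

Because both stages are one-to-one, the bit pattern is only an intermediate label: the index `i` alone determines the transmitted point.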
The constellation symbol $s_i \in S$ is transmitted with a priori probability $P(X = s_i) := \lambda_i$ through an "optical channel". Most communication systems transmit equiprobable symbols, i.e., $\lambda_i = 1/M$, $\forall i$. However, in the case of probabilistic shaping [24]-[26], the probabilities of occurrence of the symbols may differ. The optical channel takes a sequence of symbols $x[\kappa]$ and maps them to a waveform $w(t)$ by means of a pulse shaping function $\rho(t)$ with
$$w(t) = \sum_{\kappa} x[\kappa]\, \rho(t - \kappa T_s),$$
with $T_s$ being the symbol period and $\kappa$ the discrete-time index. The optical channel further includes digital-to-analog converters (DACs), filtering, transmission including amplification, analog-to-digital converters (ADCs), and DSP to remove effects of chromatic dispersion, polarization mode dispersion, polarization rotation, phase noise, frequency offset, etc. It further includes matched filtering, equalization and interleaving.
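As a minimal sketch of the pulse shaping step, the following assumes the standard linear modulation form $w(t) = \sum_\kappa x[\kappa] \rho(t - \kappa T_s)$. A sinc pulse is used purely for illustration (the experiments later in this paper use root-raised cosine pulses), and all function names are hypothetical.

```python
import math


def sinc_pulse(t, Ts):
    """Ideal Nyquist pulse rho(t) = sin(pi t / Ts) / (pi t / Ts); assumed for illustration."""
    if t == 0.0:
        return 1.0
    a = math.pi * t / Ts
    return math.sin(a) / a


def waveform(t, symbols, Ts, pulse=sinc_pulse):
    """Evaluate w(t) = sum_k x[k] * rho(t - k * Ts) for a finite symbol sequence."""
    return sum(x * pulse(t - k * Ts, Ts) for k, x in enumerate(symbols))


symbols = [1 + 1j, -1 + 1j, 1 - 1j]   # hypothetical complex symbols x[k]
Ts = 1.0                              # symbol period (arbitrary units)

# Sampling at t = k * Ts recovers x[k] because the sinc pulse is a Nyquist pulse.
samples = [waveform(k * Ts, symbols, Ts) for k in range(len(symbols))]
```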
At the receiver, for each sampled symbol $y[\kappa]$, the soft symbol demodulator (see Fig. 1) computes $M$ likelihoods $q_{Y|X}(y|s_i)\lambda_i$, where $q_{Y|X}(y|s_i)$ is a function that depends on the received $D$-dimensional sampled symbol $y$ and the constellation symbol $s_i \in S$. These are passed to an NB-FEC decoder. Note that usually, for numerical reasons, a vector of $M - 1$ nonbinary log-likelihood ratios (LLRs) is computed for each $D$-dimensional received symbol $y$ instead. These (nonbinary) LLRs are given by
$$L_i(y) := \ln \frac{q_{Y|X}(y|s_i)\, \lambda_i}{q_{Y|X}(y|s_1)\, \lambda_1}, \quad i \in \{2, \ldots, M\}. \tag{1}$$
Ideally, the receiver knows the (averaged) optical channel transition probability density function (PDF) $p_{Y|X}(y|s_i)$, applies sufficiently long interleaving, and sets $q_{Y|X}(y|x) = p_{Y|X}(y|x)$ in (1). Usually, however, the exact channel transition PDF is not known at the receiver, or the computation of the LLRs is too involved using the true PDF, which is why approximations are often used. In this case $q_{Y|X}(y|x) \neq p_{Y|X}(y|x)$, and thus, we say that the receiver is mismatched [28]. Often, for instance, the (multivariate) Gaussian PDF is assumed at the receiver, i.e., $q_{Y|X}(y|s_i) = q_{\mathrm{awgn}}(y|s_i)$, where
$$q_{\mathrm{awgn}}(y|s_i) := \frac{1}{\sqrt{(2\pi)^D |\Sigma|}} \exp\left(-\frac{1}{2} (y - s_i)^T \Sigma^{-1} (y - s_i)\right).$$
In [21], different approximations are compared for $D = 4$ and it was found that the circularly symmetric Gaussian approximation with diagonal covariance matrix $\Sigma$ reliably approximates the true PDF unless the input power is increased to very high levels. Besides, the Gaussian PDF has also been shown to be a good approximation for the true PDF in case of uncompensated fiber links with coherent reception [29]. Furthermore, using the Gaussian PDF also simplifies the numerical computation of the LLRs. A predominant case is $D = 2$ (e.g., QAM constellations detected independently in each polarization) with circularly symmetric noise (diagonal $\Sigma$) and variance $\sigma_n^2$ per dimension. This is the case on which we focus in this paper and which is also dominant in coherent long-haul dispersion uncompensated links [21]. In this case
$$q^{\mathrm{awgn}}_{D=2}(y|s_i) := \frac{1}{2\pi\sigma_n^2} \exp\left(-\frac{\|y - s_i\|^2}{2\sigma_n^2}\right). \tag{2}$$
Assuming equally likely symbols ($\lambda_i = 1/M$), the LLRs in (1) are given by
$$L_i(y) = \frac{\|y - s_1\|^2 - \|y - s_i\|^2}{2\sigma_n^2}. \tag{3}$$
After LLR computation, the NB soft-decision FEC decoder (e.g., a nonbinary LDPC decoder) takes these LLRs and estimates the transmitted NB symbols, which are later converted into decoded bits. Here we only assume that the nonbinary FEC is matched to the constellation, i.e., each nonbinary symbol of the FEC code can be mapped to $m = \log_2(M)$ bits. This allows us to consider nonbinary LDPC codes defined over either the Galois field GF($2^m$) or the ring $\mathbb{Z}_M$ of integers modulo $M$. We further assume that soft-decision decoding is carried out, see, e.g., [15]. For other, low-complexity versions of that algorithm, we refer the interested reader to the references in [15].

B. FEC Universality
When assessing and comparing the performance of different modulation formats and different transmission scenarios (e.g., fiber types, modulators, converters, etc.) based on thresholds, it is important to understand the concept of FEC universality. A pair of FEC code and its decoder are said to be universal if the performance of the code (measured in terms of post-FEC BER or SER) does not depend on the nonbinary channel (with input $U$ and output $Z$ when referring to Fig. 1), provided that the channel has a fixed mutual information $I(U; Z)$.
Unfortunately, not much is known about the universality of practical coding schemes. It is conjectured that practical (binary) LDPC codes are universal [30], which has been shown to be asymptotically true under some relatively mild conditions [31]. The class of spatially coupled LDPC codes, recently investigated for optical communications [32], has been shown to be asymptotically universal [33]. An example of a nonuniversal coding scheme is given by the recently proposed, capacity-achieving Polar codes [34], which need to be redesigned for every different channel. Most of these results are for binary codes and even less is known for nonbinary codes.
Although most practical LDPC codes are asymptotically universal, we wish to emphasize a word of caution: practical, finite-length realizations of codes may only be approximately universal. For instance, [30, Fig. 3] reveals that for some practical LDPC codes, the performance at a BER of $10^{-4}$ significantly differs for different channels. This difference is expected to be even larger at very low BERs due to the different slopes of the curves. We will address this difference in detail in Sec. V.

C. Channel Capacity and Mutual Information
Consider an information stable, discrete-time channel with memory [35]-[37], which is characterized by the sequence of PDFs $p_{Y_1^N|X_1^N}(\cdot|\cdot)$, for $N = 1, 2, \ldots$. The maximum rate at which reliable transmission over such a channel is possible is defined by the channel capacity [35]-[37]
$$C := \lim_{N \to \infty} \max_{p_{X_1^N}} \frac{1}{N} I(X_1^N; Y_1^N), \tag{4}$$
where the maximization is over $p_{X_1^N}(\cdot)$ under a given input constraint (e.g., power constraint). For a fixed $p_{X_1^N}(\cdot)$, the mutual information (MI) between the input sequence $X_1^N$ and the output sequence $Y_1^N$ is given by
$$I(X_1^N; Y_1^N) := \mathbb{E}\left[\log_2 \frac{p_{Y_1^N|X_1^N}(Y_1^N|X_1^N)}{p_{Y_1^N}(Y_1^N)}\right].$$
The capacity $C$ in (4) is the maximum information rate that can be achieved for any transmission system, requiring carefully optimized, infinitely long input sequences. Usually, in most of today's systems, the channel input sequence is heavily constrained (e.g., by the use of QAM constellations) to simplify the transceiver design. Furthermore, symbol sequences with independent and identically distributed (IID) elements are often used such that we have
$$p_{X_1^N}(x_1^N) = \prod_{\kappa=1}^{N} P(X = x[\kappa]). \tag{5}$$
IID symbol sequences are obtained if a memoryless mapper is used (as we do in this paper, see, e.g., $\Phi$ in Fig. 1) and if sufficiently long interleaving is applied after FEC encoding. Under these conditions, an achievable information rate (AIR) is given by
$$I_{\mathrm{mem}} := \lim_{N \to \infty} \frac{1}{N} I(X_1^N; Y_1^N), \tag{6}$$
which is a lower bound to the capacity $C$ due to the constraints imposed on the transmitted sequences. In the remainder of this paper, we limit ourselves to IID channel input sequences generated via (5).
The numerical evaluation of the MI in (6) is in general very difficult, even for relatively short input and channel output sequences (small values of $N$). The reasons are as follows: First, numerically evaluating $I(X_1^N; Y_1^N)$ is hard, even for very small memory lengths $N$. Second, most of today's transceivers do not exploit memory but instead use long interleavers to remove all effects of memory to keep decoding simple with symbol-by-symbol detection. Hence, it would not be fair to provide thresholds based on memory, which give a performance that could be achieved at some point in the future, provided that all memory is adequately exploited at the transceiver. Instead, we neglect all memory effects and obtain thresholds that indicate a performance achievable with today's systems.
Therefore, in this paper, we focus on symbol-by-symbol detection (see Fig. 1). Under these constraints, we can further lower bound the MI in (6) (see [38, Sec. III-F] for an in-depth proof) by employing a memoryless channel transition PDF $p_{Y|X}(\cdot|\cdot)$ that is obtained by averaging the true channel PDF. This approach gives
$$I_{\mathrm{mem}} \geq I(X; Y) := \mathbb{E}_{X,Y}\left[\log_2 \frac{p_{Y|X}(Y|X)}{p_Y(Y)}\right],$$
or equivalently
$$I(X; Y) = \sum_{i=1}^{M} \lambda_i \int_{\mathbb{R}^D} p_{Y|X}(y|s_i) \log_2 \frac{p_{Y|X}(y|s_i)}{\sum_{j=1}^{M} \lambda_j\, p_{Y|X}(y|s_j)}\, \mathrm{d}y.$$
Note that $I(X; Y)$ is an AIR for systems employing optimum decoding, i.e., when the LLR computation uses $q_{Y|X}(y|x) = p_{Y|X}(y|x)$, and if sufficiently long symbol-wise interleaving is applied (within the equivalent "optical channel") and sufficiently long capacity-achieving FEC codes are used.
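For a memoryless channel with a known transition PDF, the symbol-wise MI $I(X; Y)$ can be approximated by Monte Carlo averaging. The following sketch assumes a two-dimensional AWGN channel and equiprobable symbols; the 8-PSK constellation, sample count and function names are hypothetical choices for illustration only.

```python
import cmath
import math
import random


def logsumexp(values):
    """Numerically stable ln(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mc_mutual_information(constellation, sigma2, n_samples=5000, seed=1):
    """Monte Carlo estimate of I(X;Y) in bits/symbol for AWGN with variance sigma2 per dimension."""
    rng = random.Random(seed)
    M = len(constellation)
    sigma = math.sqrt(sigma2)
    total = 0.0
    for _ in range(n_samples):
        x = rng.choice(constellation)
        y = x + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        # Log-likelihoods up to a common constant, which cancels in the ratio.
        metrics = [-abs(y - s) ** 2 / (2.0 * sigma2) for s in constellation]
        log_num = -abs(y - x) ** 2 / (2.0 * sigma2)
        log_den = logsumexp(metrics) - math.log(M)  # ln of sum_j (1/M) p(y|s_j)
        total += (log_num - log_den) / math.log(2.0)
    return total / n_samples


S = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]  # hypothetical 8-PSK
mi_high_snr = mc_mutual_information(S, sigma2=0.001)
mi_low_snr = mc_mutual_information(S, sigma2=5.0)
```

At high signal-to-noise ratio the estimate saturates at $\log_2 M = 3$ bits/symbol, and it approaches zero as the noise grows.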

III. THRESHOLDS FOR NONBINARY FEC
Based on the discussion in Sec. II-B, we propose to use the MI as performance threshold for NB-FEC. After a discussion on how to compute this threshold, we describe some other commonly used thresholds.

A. Thresholds Based on Mutual Information
In order to estimate the performance of NB-FEC, motivated by the universality argument in Sec. II-B, we would like to use the MI $I(U; Z)$ as performance threshold. $I(U; Z)$ is the MI between the FEC encoder output $U$ and FEC decoder input $Z$ (see Fig. 1) and characterizes the nonbinary channel. Unfortunately, the MI $I(U; Z)$ is not easy to compute directly, which is why we define a threshold that is directly related to the input $X$ and output $Y$ of the optical transmission experiment, to which we usually have access. This also allows us to avoid including soft symbol demodulation in the transmission experiment.
In the previous section, we have seen that $I_{\mathrm{mem}}$ is a maximum AIR if all memory effects are taken into account and is an upper bound on $I(X; Y)$, which is an AIR under optimum decoding with an averaged channel PDF. As a consequence of the data processing inequality, we have
$$I_{\mathrm{mem}} \geq I(X; Y) \overset{(a)}{\geq} I(U; Z),$$
where we have equality in (a) only in some special cases described below. Due to this inequality, we cannot always directly use $I(X; Y)$ as a proxy for estimating $I(U; Z)$. We resort to the theory of mismatched decoding [28], [39] and propose to use $\tilde{I}(X; Y)$ as estimate of $I(U; Z)$, where
$$\tilde{I}(X; Y) := \sup_{\nu \geq 0} \mathbb{E}_{X,Y}\left[\log_2 \frac{q_{Y|X}(Y|X)^{\nu}}{\sum_{i=1}^{M} \lambda_i\, q_{Y|X}(Y|s_i)^{\nu}}\right]. \tag{9}$$
We have $I(X; Y) \geq \tilde{I}(X; Y) \geq I(U; Z)$; however, we found in numerical simulations and in transmission experiments that, in the context of optical communications, $\tilde{I}(X; Y) \approx I(U; Z)$. Hence, we can use $\tilde{I}(X; Y)$ as an accurate estimate of $I(U; Z)$ and of the NB-FEC performance.
In general, (9) is not easy to evaluate, as the expectation is taken over $P_{Y,X}(y, x) = p_{Y|X}(y|x)\lambda_{\phi^{-1}(x)}$, which is often not known. However, we can replace the expectation in (9) by the empirical average, as done for instance in [24, Sec. III]. We denote this empirical approximation of the proposed estimate $\tilde{I}(X; Y)$ in (9) by $I_{\mathrm{NB}}$, which can be computed from an optical transmission experiment with a measurement database of $N_m$ measured values $x[\kappa] \in S$ and their corresponding received $y[\kappa]$ by
$$I_{\mathrm{NB}} := \sup_{\nu \geq 0} \frac{1}{N_m} \sum_{\kappa=1}^{N_m} \log_2 \frac{q_{Y|X}(y[\kappa]|x[\kappa])^{\nu}}{\sum_{i=1}^{M} \lambda_i\, q_{Y|X}(y[\kappa]|s_i)^{\nu}}, \tag{10}$$
where $q_{Y|X}(y|x)$ is the same PDF used for computing the LLRs in (1), e.g., the $D = 2$-dimensional Gaussian PDF. The variance of this distribution can for instance be estimated from the measurement database (or a subset thereof), see, e.g., [24, Sec. III]. Later, in Example 2, we show how we can jointly estimate the MI and the noise variance, avoiding an extra variance estimator. As the optimization in (9) and (10) is over a strictly unimodal (∩-convex) function in $\nu$ [7, Thm. 4.22], the maximization can be efficiently carried out using, e.g., the Golden section search [40].
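The empirical estimate in (10) can be sketched as follows, assuming equiprobable symbols, a Gaussian $q_{Y|X}$ and a bounded search interval for $\nu$. This is a minimal illustration and not the authors' code: the synthetic "measurement database", the 8-PSK constellation, the search interval and all function names are assumptions.

```python
import cmath
import math
import random


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def rate(nu, log_q, tx_idx):
    """Empirical average of log2( q^nu / sum_i (1/M) q_i^nu ) over the database."""
    M = len(log_q[0])
    total = 0.0
    for row, i in zip(log_q, tx_idx):
        scaled = [nu * v for v in row]
        total += scaled[i] - logsumexp(scaled) + math.log(M)
    return total / (len(log_q) * math.log(2.0))


def golden_section_max(f, a, b, tol=1e-3):
    """Maximize a unimodal function f on [a, b] by Golden section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def estimate_i_nb(log_q, tx_idx, nu_max=10.0):
    nu_hat = golden_section_max(lambda nu: rate(nu, log_q, tx_idx), 0.0, nu_max)
    return rate(nu_hat, log_q, tx_idx), nu_hat


# Synthetic stand-in for a measurement database (hypothetical 8-PSK over AWGN).
rng = random.Random(7)
S = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]
sigma2 = 0.05
tx_idx = [rng.randrange(8) for _ in range(3000)]
rx = [S[i] + complex(rng.gauss(0.0, math.sqrt(sigma2)), rng.gauss(0.0, math.sqrt(sigma2)))
      for i in tx_idx]
# ln q(y|s) up to a constant; the constant cancels in the ratio for every nu.
log_q = [[-abs(y - s) ** 2 / (2.0 * sigma2) for s in S] for y in rx]
i_nb, nu_hat = estimate_i_nb(log_q, tx_idx)
```

Since the decoder PDF matches the simulated channel here, the maximizing $\nu$ lands close to 1; with a mismatched variance it would move away from 1 while the estimate stays essentially unchanged.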

B. Detailed Description of the Proposed Threshold $\tilde{I}(X; Y)$
In the following, we describe in detail the steps that lead us to the performance metric in (10) starting from $I(U; Z)$. The remainder of this section may be skipped in a first reading. The input $Z$ to the FEC decoder consists of vectors of $(M - 1)$-dimensional LLRs, whose distributions are hard to estimate, especially if $M$ becomes large. Therefore, we would like to relate $I(U; Z)$ to $X$ and $Y$, to which we have immediate access as input and output parameters of the optical transmission experiment. Using the data processing inequality [41], we can bound $I(U; Z)$ as follows
$$I(U; Z) \overset{(a)}{\leq} I(X; Z) \overset{(b)}{\leq} I(X; Y),$$
where we have equality in (a) if the mapper $\Phi$ is a one-to-one function (this is not the case for many-to-one mappings, used in, e.g., some probabilistic shaping implementations [42]). In this paper, we only consider one-to-one mapping functions and thus have $I(U; Z) = I(X; Z)$. We have equality in (b) if and only if $Z$ constitutes a sufficient statistic for $X$ given $Y$ [43], i.e., if $X$ is independent of $Y$ given $Z$.
While equality in (a) is obtained in most communication systems, we do not necessarily have equality in (b), especially if we employ a mismatched decoder, i.e., when the PDF $q_{Y|X}(y|x)$ assumed in the decoder does not exactly correspond to the average channel PDF $p_{Y|X}(y|x)$. Therefore, we cannot directly use $I(X; Y)$ but need to find a more accurate estimate of $I(U; Z)$ based on $X$ and $Y$.
Unfortunately, in general, $p_{Y|X}$ is not known and must be estimated from the experiment. As the noise in uncompensated coherent optical fiber communication tends to be Gaussian [29], a good choice is to approximate $p_{Y|X}(y|x)$ by a Gaussian PDF, with different levels of refinement [21]. In most cases, circularly symmetric Gaussian PDFs are enough, which is what we have used in (2). To get a more accurate estimate of the conditional channel PDF, we can also use a kernel density estimator (KDE) [44] to approximate the PDF.
As estimating the PDF $p_{Y|X}(y|x)$ is not always easy and because we may use a mismatched decoder with $I(U; Z) \leq I(X; Y)$, we propose to use a different way to estimate a lower bound of $I(X; Y)$ that well approximates $I(U; Z)$. We start with the auxiliary channel lower bound [45], which is frequently used in optical communications to estimate the MI [21], [24, Sec. III], [46, Sec. 2], [47] and which is given by
$$\underline{I}(X; Y) := \mathbb{E}_{X,Y}\left[\log_2 \frac{q_{Y|X}(Y|X)}{\sum_{i=1}^{M} \lambda_i\, q_{Y|X}(Y|s_i)}\right] \leq I(X; Y). \tag{11}$$
The expectation in (11) is taken over the actual (averaged) channel PDF. It is often claimed in the above-mentioned references that one should use the same $q_{Y|X}(y|x)$ as we use in the decoder (e.g., to compute the LLRs in (1)) to estimate the MI via (11). However, we found in numerical experiments that the auxiliary channel lower bound $\underline{I}(X; Y)$ in (11) significantly underestimates $I(U; Z)$ in many practical applications. We illustrate this discrepancy by means of an example.
Example 1: […] We can thus write, for $i \in \{1, \ldots, M\}$, […], which allows us to write […], and hence we have $I(X; Y) = I(X; Z)$. However, if we evaluate $\underline{I}(X; Y)$ from (11) for $K \neq \sigma_n^2$, we inevitably have $\underline{I}(X; Y) < I(X; Y)$. If we employ, for example, LDPC codes with the widely used min-sum decoder, it is well known that the decoding performance does not depend on the value $K > 0$ used for computing the LLRs and hence, $\underline{I}(X; Y)$ will not be an adequate performance estimate and may even largely underestimate the performance if used as threshold.
We therefore propose to use a generalization of (11), which originates from [39] and which we found to accurately predict $I(U; Z)$ and hence the NB-FEC performance. The proposed estimate is denoted by $\tilde{I}(X; Y)$ and is given by (9) with
$$\underline{I}(X; Y) \leq \tilde{I}(X; Y) \leq I(X; Y),$$
where $\underline{I}(X; Y)$ denotes the auxiliary channel lower bound in (11). The first inequality is obvious as $\underline{I}(X; Y)$ is recovered for $\nu = 1$ in (9) and the second inequality is shown in [39].
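The gap between the auxiliary channel bound in (11) (the case $\nu = 1$) and the $\nu$-optimized estimate in (9) can be reproduced numerically. The sketch below assumes an AWGN channel with variance $\sigma_n^2$ per dimension and a receiver that uses a Gaussian PDF with a deliberately wrong variance $K = 4\sigma_n^2$; the constellation, the grid over $\nu$ and all names are hypothetical.

```python
import cmath
import math
import random


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def empirical_rate(nu, K, tx_idx, rx, S):
    """Average of log2( q^nu / sum_i (1/M) q_i^nu ), q Gaussian with assumed variance K."""
    M = len(S)
    total = 0.0
    for i, y in zip(tx_idx, rx):
        metrics = [-nu * abs(y - s) ** 2 / (2.0 * K) for s in S]
        total += metrics[i] - logsumexp(metrics) + math.log(M)
    return total / (len(rx) * math.log(2.0))


rng = random.Random(3)
S = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]   # hypothetical 8-PSK
sigma2 = 0.05                 # true noise variance per dimension
K = 4 * sigma2                # mismatched variance assumed by the receiver
tx_idx = [rng.randrange(8) for _ in range(2000)]
rx = [S[i] + complex(rng.gauss(0.0, math.sqrt(sigma2)), rng.gauss(0.0, math.sqrt(sigma2)))
      for i in tx_idx]

matched = empirical_rate(1.0, sigma2, tx_idx, rx, S)      # q equals the channel PDF
aux_bound = empirical_rate(1.0, K, tx_idx, rx, S)         # nu = 1 with mismatched K
nu_grid = [0.5 * k for k in range(1, 21)]                 # coarse grid instead of a line search
optimized = max(empirical_rate(nu, K, tx_idx, rx, S) for nu in nu_grid)
```

In this setup the value at $\nu = 1$ is clearly below the matched estimate, whereas the maximization over $\nu$ recovers it, because scaling the exponent by $\nu = K/\sigma_n^2$ undoes the variance mismatch.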
A convenient byproduct of using the proposed estimate $\tilde{I}(X; Y)$ in (9) is the fact that it can be used to jointly estimate the MI $I(U; Z)$ and the variance of the noise. We illustrate this application in the following example.
Example 2: For the case of uncompensated links, we know that the Gaussian PDF is a good approximation of the channel PDF [29]. However, in general, as we do not know a priori the variance of the noise PDF, we need to estimate it. In [24, Sec. III], it is for instance proposed to estimate the noise variance from the measurement database. Here we propose to directly use the MI estimate to obtain the noise variance. As the variance is unknown, we first fix $\sigma_n^2 = \frac{1}{2}$ in (2) and then evaluate (10) as
$$I_{\mathrm{NB}} = \sup_{\nu \geq 0} \frac{1}{N_m} \sum_{\kappa=1}^{N_m} \log_2 \frac{\exp\left(-\nu \|y[\kappa] - x[\kappa]\|^2\right)}{\sum_{i=1}^{M} \lambda_i \exp\left(-\nu \|y[\kappa] - s_i\|^2\right)}. \tag{12}$$
After carrying out the optimization over $\nu$ (using, e.g., the Golden section search), we immediately get an estimate of the noise variance as $\hat{\sigma}_n^2 = \frac{1}{2\hat{\nu}}$, where $\hat{\nu}$ is the $\nu$ that maximizes (12).
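The joint estimation in Example 2 can be sketched as follows: only squared Euclidean distances enter the objective, and the noise variance follows from the maximizing $\nu$. Equiprobable symbols are assumed; the synthetic data, the search interval $[0, 50]$ and all names are hypothetical choices for this illustration.

```python
import cmath
import math
import random


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def objective(nu, sq_dist, tx_idx):
    """Empirical average of log2( e^{-nu d_x} / sum_i (1/M) e^{-nu d_i} )."""
    M = len(sq_dist[0])
    total = 0.0
    for row, i in zip(sq_dist, tx_idx):
        scaled = [-nu * d for d in row]
        total += scaled[i] - logsumexp(scaled) + math.log(M)
    return total / (len(sq_dist) * math.log(2.0))


def golden_section_max(f, a, b, tol=1e-3):
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


rng = random.Random(11)
S = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]   # hypothetical 8-PSK
sigma2_true = 0.05                                        # unknown to the estimator
tx_idx = [rng.randrange(8) for _ in range(3000)]
rx = [S[i] + complex(rng.gauss(0.0, math.sqrt(sigma2_true)),
                     rng.gauss(0.0, math.sqrt(sigma2_true))) for i in tx_idx]
sq_dist = [[abs(y - s) ** 2 for s in S] for y in rx]

nu_hat = golden_section_max(lambda nu: objective(nu, sq_dist, tx_idx), 0.0, 50.0)
i_nb = objective(nu_hat, sq_dist, tx_idx)
sigma2_hat = 1.0 / (2.0 * nu_hat)     # noise variance estimate per dimension
```

One design consideration: the objective becomes flat in $\nu$ when almost no symbol is ambiguous, so the variance estimate obtained this way is expected to be noisier at very high signal-to-noise ratios.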

C. Other Thresholds
In the remainder of this paper, the accuracy of the MI as a decoding threshold will be compared against predictions based on other performance thresholds. We assume from now on for simplicity that all constellation symbols are equiprobable, i.e., $\lambda_i = 1/M$. First, we use the bit-wise GMI as metric [4]
$$\mathrm{GMI} \approx m - \frac{1}{N_m} \sum_{\kappa=1}^{N_m} \sum_{i=1}^{m} \log_2\left(1 + \exp\left((-1)^{c_i[\kappa]}\, \tilde{L}_i(y[\kappa])\right)\right),$$
where $c_i[\kappa]$ is the bit at bit position $i$ mapped to symbol $x[\kappa]$ and $\tilde{L}_i(y[\kappa])$ are the bit-wise LLRs computed according to
$$\tilde{L}_i(y) := \ln \frac{\sum_{s \in S_{1,i}} p_{Y|X}(y|s)}{\sum_{s \in S_{0,i}} p_{Y|X}(y|s)},$$
where $S_{b,i}$ is the set of constellation symbols where the $i$th bit of the binary label takes on the value $b$. Second, we use the pre-FEC BER $\frac{1}{m} \sum_{i=1}^{m} P(\hat{B}_i \neq B_i)$, and the pre-FEC SER $P(\hat{X} \neq X)$. These quantities are schematically shown at the bottom of Fig. 1. We immediately see that only the MI is directly connected to the NB-FEC decoder, and thus is the most natural threshold choice. In particular, the transmitter in Fig. 1 uses a GF($2^m$)-to-bit mapper followed by a bit-to-symbol mapper $\Phi(b) = x$, which maps the vector of bits $b = (b_1, b_2, \ldots, b_m)$ to a constellation symbol $x \in S$. These blocks are included only so that the GMI and pre-FEC BER can be defined (and calculated) but have no operational significance for the NB-CM system under consideration, as $U$ can be directly mapped to $X$. The bit labeling used in the mapper $\Phi$ affects both the GMI and pre-FEC BER, but has no impact on the actual performance of the system. At the receiver side, bit-wise LLRs are additionally calculated ($\tilde{L}$), and a hard decision on the symbols is made ($\hat{X}$), which leads to a hard decision on the bits ($\hat{B}$).
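The three alternative metrics can be computed from the same measurement database. The following sketch assumes bit-wise LLRs defined as the log-ratio of the likelihood sums for bit value 1 over bit value 0, a Gaussian channel PDF, minimum-distance hard decisions, and a natural binary labeling of a hypothetical 8-PSK constellation (not the GMI-maximizing labelings of the paper); all names are illustrative.

```python
import cmath
import math
import random


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softplus(x):
    """Stable ln(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bitwise_llrs(y, S, labels, sigma2):
    """LLR_i = ln sum_{s: bit i = 1} p(y|s) - ln sum_{s: bit i = 0} p(y|s)."""
    m = len(labels[0])
    out = []
    for i in range(m):
        one = [-abs(y - s) ** 2 / (2.0 * sigma2) for s, lab in zip(S, labels) if lab[i] == 1]
        zero = [-abs(y - s) ** 2 / (2.0 * sigma2) for s, lab in zip(S, labels) if lab[i] == 0]
        out.append(logsumexp(one) - logsumexp(zero))
    return out


def other_metrics(tx_idx, rx, S, labels, sigma2):
    """Return (GMI estimate, pre-FEC BER, pre-FEC SER)."""
    m, n = len(labels[0]), len(rx)
    loss = bit_err = sym_err = 0.0
    for i, y in zip(tx_idx, rx):
        c = labels[i]
        llrs = bitwise_llrs(y, S, labels, sigma2)
        loss += sum(softplus((-1) ** c[k] * llrs[k]) for k in range(m)) / math.log(2.0)
        x_hat = min(range(len(S)), key=lambda j: abs(y - S[j]))   # hard symbol decision
        sym_err += x_hat != i
        bit_err += sum(a != b for a, b in zip(labels[x_hat], c))  # induced bit decisions
    return m - loss / n, bit_err / (n * m), sym_err / n


def simulate(sigma2, n, seed):
    rng = random.Random(seed)
    idx = [rng.randrange(8) for _ in range(n)]
    sd = math.sqrt(sigma2)
    return idx, [S[i] + complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for i in idx]


S = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]
labels = [((k >> 2) & 1, (k >> 1) & 1, k & 1) for k in range(8)]  # natural binary labeling
tx, rx = simulate(0.05, 2000, seed=5)
gmi, ber, ser = other_metrics(tx, rx, S, labels, 0.05)
```

Swapping `labels` for a different labeling changes `gmi` and `ber` but leaves `ser` untouched, which mirrors the remark that the bit labeling has no operational meaning for the nonbinary system.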

IV. EXPERIMENTAL VERIFICATION
To experimentally verify the proposed method, we consider the four 8-QAM constellations shown in Fig. 2, where the bit-mapping that maximizes the GMI is also shown [48], [49]. For illustration purposes, we use five quasi-cyclic NB-LDPC codes with rates $R \in \{0.7, 0.75, 0.8, 0.85, 0.9\}$ (FEC overheads of ≈ 43, 33, 25, 18, 11%) defined over GF($2^3$) with regular variable node degree of $d_v = 3$ and regular check node degrees $d_c \in \{10, 12, 15, 20, 30\}$ of girth 8 ($R < 0.9$) or girth 6 ($R = 0.9$), respectively. Each code has a length of around 5500, i.e., 5500 8-QAM symbols are always mapped to one LDPC codeword. The parameters of the codes are summarized in Tab. I. As the Galois field over which these codes are defined is rather small, the decoding complexity is relatively low as well. Decoding takes place using 15 iterations with a row-layered belief propagation decoder. These codes are conjectured to be universal, i.e., their performance is expected to be independent of the actual channel (see also Sec. II-B).
Fig. 2. Four different 8-QAM constellations used in the numerical results, taken from [48]. The numbers adjacent to the constellation points give the GMI-maximizing bit labeling. The markers used for the constellation points will be subsequently used to distinguish the constellations.
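The listed code rates and overheads can be cross-checked with a short calculation. The sketch assumes the usual design-rate relation $R = 1 - d_v/d_c$ for regular LDPC codes and the overhead definition $(1 - R)/R$; neither relation is stated explicitly in the text, but both reproduce its numbers.

```python
from fractions import Fraction

D_V = 3                              # regular variable node degree
D_C = [10, 12, 15, 20, 30]           # regular check node degrees


def design_rate(dv, dc):
    """Design rate of a regular LDPC code (assumes a full-rank parity-check matrix)."""
    return 1 - Fraction(dv, dc)


def overhead_percent(rate):
    """FEC overhead in percent, defined here as redundancy relative to payload."""
    return float((1 - rate) / rate * 100)


rates = [design_rate(D_V, dc) for dc in D_C]
overheads = [overhead_percent(r) for r in rates]

for dc, r, oh in zip(D_C, rates, overheads):
    print(f"d_c = {dc:2d}: R = {float(r):.2f}, overhead = {oh:.1f}%")
```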
Note that in the following we often use only a subset of constellations and code rates, both to keep the visualization of results simple and because we reuse previously recorded measurements. The main purpose of this paper is to show that previously recorded experimental data can be reused to evaluate the performance of NB-FEC, which is why we avoid redoing experiments.

A. AWGN Simulation Results
The performance of the five NB-LDPC codes was first tested in an AWGN channel. To this end, we first calculated the MI for the four constellations in Fig. 2. The results are shown in Fig. 4 and show a clear superiority of the constellation $C_4$ in terms of MI.
In Fig. 4, we also show the required $E_s/N_0$ for the different NB-LDPC codes to achieve a post-FEC SER of $10^{-4}$ and plot that together with the corresponding net rate, given by the number of bits per constellation symbol. The obtained results show that the NB-LDPC codes follow the MI predictions quite well, although we do observe an increasing rate loss as the code rate decreases. We attribute this loss to the nonideal code design based on the fact that we only use regular codes. Optimized irregular NB-LDPC codes [50] would be necessary for constructing better NB-LDPC codes at low rates.
In Fig. 3, we show the post-FEC SER as a function of the three performance metrics described in Sec. III-C for code rates $R \in \{0.7, 0.75, 0.8\}$. Changing the constellation for a given code can be interpreted as changing the nonbinary channel in Fig. 1. Additionally, in Fig. 5, we show the proposed nonbinary MI estimate of (9) as performance metric for all four constellations and all five code rates. The results in Figs. 3 and 5 clearly show that only the MI can be used as a reliable threshold. For a post-FEC SER of $10^{-4}$ (horizontal lines in Figs. 3 and 5), the obtained MI thresholds are summarized in the third row of Tab. I. Besides the MI, Fig. 3 suggests that the pre-FEC SER could also potentially serve as a performance indicator, although it is not as reliable as the MI. With the exception of constellation $C_1$, the pre-FEC SER (which depends on the distance spectrum, i.e., the distances between constellation points) could be an indicator as well. Furthermore, for high rate codes, the pre-FEC SER becomes a better indicator. This is in line with the findings of [4], where it was shown that the GMI is the proper performance indicator for systems with BICM but for high rate codes, the pre-FEC BER can still be used with a reliability that may be good enough for some applications.

B. Back-to-Back Transmission of 8-QAM Formats
To validate the AWGN results in Fig. 3, we now consider a dual-polarization 41.6 Gbaud system. The three 8-QAM constellations of Fig. 3 were generated and tested using a high-speed DAC in a back-to-back configuration. A signal with root-raised cosine pulse shaping (roll-off factor 0.1) was generated as described in [48] and two code rates ($R = 0.7$ and $R = 0.8$) were considered, giving net data rates of approximately 174 and 200 Gbit/s.
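The quoted net data rates follow from a simple product. The sketch assumes that the net rate equals symbol rate times number of polarizations times bits per symbol times code rate, ignoring any pilot or framing overhead (an assumption, since the text does not spell out the calculation).

```python
def net_rate_gbps(baud_gbaud, n_pol, bits_per_symbol, code_rate):
    """Net data rate in Gbit/s, neglecting pilot and framing overhead (assumed)."""
    return baud_gbaud * n_pol * bits_per_symbol * code_rate


# 41.6 Gbaud, dual polarization, 8-QAM (3 bits per symbol)
net_rates = {R: net_rate_gbps(41.6, 2, 3, R) for R in (0.7, 0.8)}
```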
The empirical MI estimate $I_{\mathrm{NB}}$ as a function of the OSNR for the three constellations $C_1$, $C_2$ and $C_3$ is shown in Fig. 6, where the constellation $C_3$ shows a clear superiority in terms of MI. In this figure, we also show the MI thresholds $T_{0.7} = 2.31$ and $T_{0.8} = 2.55$ from Tab. I. These MI thresholds are then used to determine equivalent OSNR thresholds for all three modulation formats (see vertical lines in Fig. 6). The measured data was then used to perform NB-LDPC decoding using a combination of the methods presented in [51] (scramblers) and [52] (interleavers). The obtained results are shown in Fig. 7 with solid markers. Additionally, from the estimated MI values, we interpolated the estimated post-FEC SER values using the AWGN simulations of Fig. 5, which are given by thin dashed (constellation $C_1$), solid (constellation $C_2$), and dotted (constellation $C_4$) lines. We observe a very good agreement between the predicted post-FEC SER and actual post-FEC SER values and thus a good match between the MI thresholds obtained for the AWGN channel and the actual performance of the codes in the experiment.
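Translating an MI threshold into an equivalent OSNR threshold amounts to locating where the measured MI curve crosses the threshold. The sketch below uses piecewise-linear interpolation, which is an assumption about the procedure, and the MI-versus-OSNR values are made-up placeholders, not data from this paper; only the two threshold values are taken from the text.

```python
def threshold_crossing(x, y, threshold):
    """First x at which the piecewise-linear curve y(x) crosses the threshold upward.

    Returns None if the curve never reaches the threshold.
    """
    points = list(zip(x, y))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < threshold <= y1:
            return x0 + (threshold - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical measurements (placeholders for illustration only).
osnr_db = [10.0, 12.0, 14.0, 16.0]
mi_bits = [2.00, 2.40, 2.70, 2.90]

mi_thresholds = {0.7: 2.31, 0.8: 2.55}       # T_0.7 and T_0.8 from the text
osnr_thresholds = {R: threshold_crossing(osnr_db, mi_bits, T)
                   for R, T in mi_thresholds.items()}
```

The same lookup applied to a simulated post-FEC SER versus MI curve would give an interpolated SER prediction for each measured MI value.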

C. Transmission Experiment
In order to show that the proposed method also works for transmission over a link, we apply the method to a transmission experiment using constellations $C_2$ and $C_4$ over a recirculating loop, described in detail in [49]. We recapitulate the experimental setup in the following. The transmission test-bed is depicted in Fig. 8 and consists of one narrow-linewidth laser under test at 1545.72 nm, and additionally 63 loading channels spaced by 50 GHz. The output of the laser under test is sent into a PDM I/Q modulator driven by a pair of DACs operating at 65 GSamples/s. Multiple delayed-decorrelated sequences of $2^{15}$ bits were used to generate the multi-level drive signals. Pilot symbols and a sequence for frame synchronization are additionally inserted. The symbol sequences are oversampled by a factor of ≈ 1.56 and pulse shaped by a root-raised cosine function with roll-off of 0.1. The load channels are separated into odd and even sets of channels and modulated independently with the same constellation as the channel under test using separate I/Q modulators. Odd and even sets are then polarization multiplexed by dividing, decorrelating and recombining through a polarization beam combiner (PBC) with an approximate 10 ns delay. The test channel and the loading channels are passed into separate low-speed (< 10 Hz) polarization scramblers (PS) and spectrally combined through a wavelength selective switch (WSS). The resulting multiplex is boosted through a single-stage Erbium-doped fiber amplifier (EDFA) and sent into the recirculating loop. The loop consists of four 100-km-long dispersion uncompensated spans of standard single-mode fiber (SSMF). Hybrid Raman-EDFA optical repeaters compensate the fiber loss. The Raman pre-amplifier is designed to provide ≈ 10 dB on-off gain. Loop synchronous polarization scrambling (LSPS) is used and power equalization is performed by a 50-GHz grid WSS inserted at the end of the loop.
At the receiver side, the channel under test is selected by a tunable filter and sent into a polarization-diversity coherent mixer feeding four balanced photodiodes. Their electrical signals are sampled at 80 GS/s by a real-time digital oscilloscope having a 33-GHz electrical bandwidth. For each measurement, five different sets of 20 µs are stored. The received samples are processed offline. The DSP includes first chromatic dispersion compensation, then polarization demultiplexing by a 25-tap $T/2$-spaced butterfly equalizer with blind adaptation based on a multi-modulus algorithm.
Frequency recovery is done using the 4th- and 7th-power periodogram for constellations $C_2$ and $C_4$, respectively. Phase recovery is done using the blind phase search (BPS) algorithm for both constellations. Equally spaced test phases in the interval $[-\pi/4, \pi/4)$ (constellation $C_2$) or in the interval $[-\pi/7, \pi/7)$ (constellation $C_4$) are used. The phase unwrapper is modified accordingly. We consider the transmission over 8 round trips in the recirculating loop, corresponding to a distance of 3200 km. Figure 9 shows the estimated MI $I_{\mathrm{NB}}$ as a function of the input power $P_{\mathrm{in}}$ per wavelength division multiplex (WDM) channel (see also [49, Fig. 3-a]), computed using the Gaussian PDF $q_{\mathrm{awgn}}$ with $D = 2$ of (2). Using a PDF estimate obtained with a KDE does not lead to noteworthy differences in the MI estimate, as predicted in [21]. Additionally, we show the MI thresholds $T_R$ for $R \in \{0.8, 0.85, 0.9\}$. The thresholds give us the region of launch powers at which transmission is possible.
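An MI estimate of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the exact estimator behind (2): we assume uniformly distributed symbols, a circularly symmetric Gaussian auxiliary PDF in $D = 2$ real dimensions, and a noise variance estimated from the data itself. The function name `estimate_mi_gaussian` and its arguments are hypothetical.

```python
import numpy as np

def estimate_mi_gaussian(tx_idx, rx, constellation):
    """Empirical MI in bits/symbol with a circular Gaussian auxiliary PDF.

    Assumes uniform inputs. tx_idx holds the indices of the transmitted
    symbols, rx the received complex samples after DSP.
    """
    constellation = np.asarray(constellation, dtype=complex)
    tx_idx = np.asarray(tx_idx)
    rx = np.asarray(rx, dtype=complex)
    m = constellation.size
    # Noise variance (sum over both real dimensions), estimated from the data
    sigma2 = np.mean(np.abs(rx - constellation[tx_idx]) ** 2)
    # Squared distances to every constellation point, shape (N, M)
    d2 = np.abs(rx[:, None] - constellation[None, :]) ** 2
    # log q(y|x') up to a constant that cancels in the ratio below
    log_q = -d2 / sigma2
    log_num = log_q[np.arange(rx.size), tx_idx]
    # Log-sum-exp over all candidate symbols for numerical stability
    mx = log_q.max(axis=1)
    log_den = mx + np.log(np.sum(np.exp(log_q - mx[:, None]), axis=1))
    return float(np.log2(m) + np.mean(log_num - log_den) / np.log(2))

# Illustration with synthetic QPSK data (not experimental data)
rng = np.random.default_rng(0)
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
idx = rng.integers(0, 4, size=5000)
noise = np.sqrt(0.01 / 2) * (rng.standard_normal(5000) + 1j * rng.standard_normal(5000))
print(estimate_mi_gaussian(idx, qpsk[idx] + noise, qpsk))
```

Because the estimate only requires the transmitted indices and the received samples, it can be evaluated on stored measurement data without rerunning the experiment.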
To be precise, whenever the estimated MI lies above the threshold $T_R$, successful transmission is possible, where successful is defined in the same way as for finding the threshold, i.e., with a post-FEC SER below $10^{-4}$. For example, consider the red horizontal line in Fig. 9 corresponding to $T_{0.9}$. We can see that with constellation $C_2$, we are just barely above the line for $P_{\mathrm{in}} \in \{-2\ \mathrm{dBm}, -1\ \mathrm{dBm}\}$, which means that decoding is also only barely possible. In contrast, with constellation $C_4$, we have a larger MI margin to the threshold and therefore reliable communication is possible over a wider range of $P_{\mathrm{in}}$.
In Fig. 10, we use the post-FEC SER results of Fig. 5 to estimate the post-FEC performance of the transmission system by interpolation. The interpolated curves are given by the solid (constellation $C_2$) and dash-dotted (constellation $C_4$) lines. Additionally, we carried out actual decoding using the LDPC codes introduced before. The post-FEC SER results after decoding are given by the solid markers in the figure. We can see that the estimates from interpolation match the actual decoding performance quite well, confirming the applicability of the proposed method.
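The interpolation step can be sketched as follows. This is a minimal illustration assuming that a reference curve of post-FEC SER versus MI is available from an AWGN simulation and that interpolating linearly in $\log_{10}(\mathrm{SER})$ is adequate; the interpolation domain is our assumption and is not specified in the text. The function name `predict_post_fec_ser` and the reference values are hypothetical.

```python
import numpy as np

def predict_post_fec_ser(mi_measured, mi_ref, ser_ref):
    """Predict the post-FEC SER from measured MI values.

    Interpolates log10(SER) linearly over the MI using a reference curve.
    Values outside the reference range are clamped to the end points
    (the default behaviour of np.interp).
    """
    mi_ref = np.asarray(mi_ref, dtype=float)
    log_ser_ref = np.log10(np.asarray(ser_ref, dtype=float))
    order = np.argsort(mi_ref)  # np.interp requires increasing x values
    log_ser = np.interp(mi_measured, mi_ref[order], log_ser_ref[order])
    return 10.0 ** log_ser

# Hypothetical AWGN reference curve (illustration only, not from Fig. 5)
mi_ref = [2.40, 2.50, 2.55, 2.60]
ser_ref = [1e-1, 1e-2, 1e-4, 1e-6]

print(predict_post_fec_ser([2.45, 2.575], mi_ref, ser_ref))
```

Working in the logarithmic domain keeps the steep waterfall region of the SER curve from being dominated by its largest values.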

V. UNIVERSALITY REVISITED
In the previous sections of this paper, we have seen that MI-based thresholds can be used to accurately predict the performance of different modulation formats with the same NB-FEC code, for which an MI threshold has been computed in an offline simulation. However, we want to emphasize that caution must be taken: this approach assumes that the code is universal (see also Sec. II-B). We know from [30] that practical codes with finite block lengths are not necessarily universal. In the previous examples, we have not experienced any issue with universality, as the only change we made was a change of the modulation format, while the underlying channel (AWGN or optical transmission, which can be modeled accurately as AWGN) remained fixed. In this section, we show by means of an example the impact of a more drastic change of the nonbinary channel.
We now modify the channel in the AWGN simulation by adding a hard decision to the output of the optical channel. We assume that the optical channel generates a hard-decision output $\hat{Y}$ based on the Euclidean distance decision metric, i.e., the output is the constellation symbol closest to the received value. Although the outputs of the channel are NB hard symbols, we can still carry out soft-decision decoding. In soft-decision decoding, the soft symbol demodulator calculates LLRs based on the channel statistics and the received values. Assume a memoryless optical channel and let $W_{j,k} := P_{\hat{Y}|X}(s_j|s_k)$ denote the channel transition probability of receiving symbol $s_j$ provided that symbol $s_k$ has been sent. We can interpret this channel as a nonbinary version of the classical binary symmetric channel (BSC), often also called discrete memoryless channel (DMC). We can then compute a set of NB LLRs from the transition probabilities $W_{j,k}$, where $\phi(i) = s_i$ is the symbol mapping function. We can then use these LLRs to feed a conventional soft-decision decoder. This situation may seem counterintuitive at first glance, as we first make a decision and then regenerate soft-decision LLRs to use in a soft-decision NB-FEC. However, such a situation may arise when designing NB-FEC schemes for updating legacy systems that include a hard decision on symbol level which cannot be changed. The MI for this scheme is computed from the transition probabilities $W_{j,k}$ and the distribution of the transmitted symbols. For illustration, we consider this scheme with the NB-LDPC codes specified in Tab. I and carry out a simulation over the AWGN channel with the four 8-QAM constellations shown in Fig. 2.
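The quantities involved in this hard-decision scheme can be sketched in a few lines. This is a minimal illustration under assumptions that are not stated in this section: the transmitted symbols are uniformly distributed, the transition probabilities are estimated by counting symbol pairs in a simulation, and the LLRs are formed relative to the reference symbol $s_0$. All function names are hypothetical.

```python
import numpy as np

def hard_decision(rx, constellation):
    """Index of the constellation point closest to each received sample."""
    return np.argmin(np.abs(rx[:, None] - constellation[None, :]), axis=1)

def transition_matrix(tx_idx, rx_idx, m):
    """Estimate W[j, k] = P(Yhat = s_j | X = s_k) by counting.

    Assumes that every symbol s_k was transmitted at least once.
    """
    counts = np.zeros((m, m))
    np.add.at(counts, (rx_idx, tx_idx), 1.0)
    return counts / counts.sum(axis=0, keepdims=True)

def dmc_mutual_information(w):
    """MI in bits/symbol of a DMC with uniform inputs and matrix w[j, k]."""
    m = w.shape[1]
    p_out = w.mean(axis=1)  # P(Yhat = s_j) under uniform inputs
    # Terms with w[j, k] = 0 contribute nothing; set their ratio to 1
    ratio = np.divide(w, p_out[:, None], out=np.ones_like(w), where=w > 0)
    return float(np.sum(w * np.log2(ratio)) / m)

def nb_llrs(w, j, eps=1e-12):
    """LLRs ln(W[j, i] / W[j, 0]) for the received hard symbol s_j.

    The reference symbol s_0 and the clipping constant eps are assumptions.
    """
    row = np.maximum(w[j, :], eps)
    return np.log(row / row[0])

# Binary special case: a BSC with crossover probability 0.1
w_bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
print(dmc_mutual_information(w_bsc), nb_llrs(w_bsc, 1))
```

The clipping in `nb_llrs` avoids infinite LLRs for transitions that were never observed in the simulation, which would otherwise prevent the decoder from correcting such events.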
Figures 11 and 12 show the post-FEC SER as a function of the pre-FEC SER and of the MI, respectively, after 15 LDPC decoding iterations with exactly the same decoder setup as used in Fig. 5. We can clearly see that the pre-FEC SER is again not a good performance indicator while the MI is. For comparison, we also plot in Fig. 12 the MI thresholds for the different codes from Tab. I. We can see that the thresholds are not as precise as previously but still reflect the actual decoding performance. We attribute this offset to the fact that the utilized LDPC codes are not exactly universal and that the length of the codes is relatively small, an effect that has also been observed in [30]. We have found that the prediction becomes more accurate again if the length of the codes is increased.
We hence conclude that the MI is still an accurate estimate of the NB-FEC decoding performance, even if we introduce drastic changes into the channel (e.g., a hard decision, going from a dispersion-uncompensated to a dispersion-compensated link, or even from coherent transmission to direct-detection systems). We can improve the accuracy if the channel that is used to compute the threshold is fairly close to the channel of the system.

VI. CONCLUSIONS
Different performance metrics for coded modulation based on capacity-approaching nonbinary codes were compared. It was shown in simulations and experiments that an accurate predictor of the performance of these codes is the mutual information, even under severe changes of the channel. Uncoded metrics such as pre-FEC BER and pre-FEC SER were shown to fail. The GMI also fails for nonbinary codes, but still remains a good performance indicator for BICM with binary soft-decision FEC. We have further discussed that it is necessary that the utilized codes are universal, which is, however, the case for most popular FEC schemes used in optical communications.

Fig. 1. System model of optical transmission based on NB-CM and the measurement of various system parameters.

Example 1: Consider the following toy example for $D = 1$ where $p_{Y|X}(y|x) = \mathcal{N}(x, \sigma_n^2)$, i.e., is Gaussian distributed with variance $\sigma_n^2$ and mean $x$, and where $q_{Y|X}(y|x) = \mathcal{N}(x, K)$, i.e., the receiver assumes a Gaussian distribution with different variance $K \neq \sigma_n^2$. In this case, we can show that $I(X; Y) = I(U; Z)$, as we can represent $p_{Y|X}(y|x) = a(x, z)b(y)$ [43, Sec. 1.10], [43, Lem. 4.7]. The random variable $Z$ is an $(M-1)$-dimensional vector with entries $Z_i$ and realizations

Fig. 3. Post-FEC SER as a function of three different performance metrics (pre-FEC SER, pre-FEC BER and GMI) for three NB-LDPC codes.

Fig. 4. MI (lines) and throughput (lines with markers) for the four 8-QAM constellations in Fig. 2 and the five NB-LDPC codes in Tab. I. The AWGN capacity is also shown for comparison (thick red line).

Fig. 7. Results after actual decoding with an NB-LDPC decoder with solid markers representing actual results after FEC decoding and lines representing interpolated post-FEC SER estimates taken from the estimated MI.

Fig. 11. Post-FEC SER as a function of the pre-FEC SER for the five LDPC codes of Tab. I using the four constellations of Fig. 2 after transmission over an AWGN channel. Legend: Const. $C_1$, Const. $C_2$, Const. $C_3$, Const. $C_4$.

Fig. 12. Axis label: MI.

TABLE I. CODE PARAMETERS AND MI THRESHOLDS $T_R$ FOR DIFFERENT CODE RATES $R$