Information Rates of Next-Generation Long-Haul Optical Fiber Systems Using Coded Modulation

A comprehensive study of the coded performance of long-haul spectrally-efficient WDM optical fiber transmission systems with different coded modulation decoding structures is presented. Achievable information rates are derived for three different square QAM formats and the optimal format is identified as a function of distance and specific decoder implementation. The four cases analyzed combine hard-decision (HD) or soft-decision (SD) decoding together with either a bit-wise or a symbol-wise demapper, the last two suitable for binary and nonbinary codes, respectively. The information rates achievable for each scheme are calculated based on the mismatched decoder principle. These quantities represent true indicators of the coded performance of the system for specific decoder implementations and when the modulation format and its input distribution are fixed. In combination with the structure of the decoder, two different receiver-side equalization strategies are also analyzed: electronic dispersion compensation and digital backpropagation. We show that, somewhat unexpectedly, schemes based on nonbinary HD codes can achieve information rates comparable to SD decoders and that, when SD is used, switching from a symbol-wise to a bit-wise decoder results in a negligible penalty. Conversely, from an information-theoretic standpoint, HD binary decoders are shown to be unsuitable for spectrally-efficient, long-haul systems.


I. INTRODUCTION
The demand for ever higher transmission rates in optical fiber transmission systems has led researchers to study the performance of transceivers based on sophisticated forward error correction (FEC) techniques. Next-generation long-haul transceivers will use powerful FEC and high-spectral-efficiency (SE) modulation formats, a combination known as coded modulation (CM). In order to provide reliable transmission, a FEC encoder maps blocks of information bits into longer blocks of coded bits that are sent through the channel at a nominal transmission rate. As a result, the information rate is, in general, lower than the nominal one by an amount that depends on the redundancy added by the FEC encoder, which in turn needs to be adjusted based on the quality of the channel.
A key performance parameter of such systems is then the maximum rate at which an optical communication system can be operated whilst maintaining reliable transmission of information.
To estimate this rate, a widely used approach in the optical communication literature is based on identifying a pre-FEC bit error rate (BER) threshold, for which a specific high-performance FEC code can guarantee an error-free performance after decoding. The code rate of such a coding scheme, multiplied by the raw transmission data rate, is used to identify an achievable information rate (AIR) for that specific system configuration. On the other hand, information theory, founded by Shannon in his seminal paper [1], shows that quantities such as the mutual information (MI) can precisely indicate the maximum information rate at which a code can ensure an arbitrarily small error probability [2], [3]. Moreover, several recent works have shown that both the MI and the generalized mutual information (GMI) [4], [5] are more reliable indicators than the pre-FEC BER of the performance of coded optical fiber systems, regardless of the specific channel used for transmission [6]-[12].
The channel MI (i.e., the MI including the channel memory) represents an upper limit on the AIRs for a given channel when a given modulation format is used and an optimum maximum likelihood (ML) decoder is used at the receiver. However, the implementation of such a decoder is prohibitive, both for complexity reasons and due to the lack of knowledge of the channel law. Instead of the optimum decoder, more pragmatic CM decoders are usually employed. Typical CM decoder implementations used in optical communications neglect the channel memory [9] and are, thus, suboptimal. Furthermore, their design involves two degrees of freedom. Each degree of freedom presents two options: hard-decision (HD) vs. soft-decision (SD) decoding and bit-wise (BW) vs. symbol-wise (SW) demapping, effectively producing four different design options. The channel MI is not in general an AIR for any of these four suboptimal schemes. Indeed, the adopted decoding strategy has a major impact on the AIRs, which can potentially be significantly lower than the channel MI. A common approach to calculate AIRs for specific decoder implementations is based on two steps: i) the memory of the optical fiber channel is neglected and the MI is calculated for an equivalent memoryless channel; ii) the mismatched decoder principle is used [13]-[16]. Each of these two methods results in a lower bound on the channel MI. In [17] the memoryless MI was studied for coherent optical fiber systems using ring constellations. In [6], [7], the same quantity was used in an experimental scenario as a system performance metric for an SD coded system. In [9] and [10, Fig. 6], it was shown that when BW decoders are used, the GMI is a better metric to predict AIRs than the MI. The GMI has also been used to evaluate the performance of experimental optical systems in [18]-[20]. The memoryless MI and the GMI were also shown to be good post-FEC BER predictors for SD-SW (nonbinary) and SD-BW decoders, in [8] and [10] respectively. Finally, a study comparing SD-SW and HD-BW AIRs for polarization multiplexed (PM) quadrature-amplitude modulation (QAM) formats (PM-16QAM and PM-64QAM) was presented in [21], where electronic dispersion compensation (EDC) or digital backpropagation (DBP) are used at the receiver for a given transmission distance.
In this work, we extend the results in [21] adding, for the first time, AIRs for HD-SW decoders to the picture. Furthermore, we present a comprehensive comparison of the AIRs of the optical fiber channel for different CM decoder implementations and for all transmission distances of interest for mid-range/long-haul terrestrial and transoceanic optical fiber links. The AIRs are also compared for different equalization techniques and over different PM-$M$QAM formats with nominal SE above 4 bits/sym per polarization such as PM-16QAM, PM-64QAM, and PM-256QAM. The results in this paper show the design trade-offs in coded optical fiber systems where, for a given distance requirement, there is a trade-off between transmission rates and transceiver complexity (modulation format, equalization, and decoding). To the best of our knowledge, this is the first time such an extensive study has been performed for optical fiber communication systems.
The paper is structured as follows: in Section II, the investigated system is first modeled and the different decoding strategies analyzed in this work are described; Section III discusses in a semi-tutorial style the information-theoretic quantities used to evaluate their performance and, as a reference, results are shown for the additive white Gaussian noise (AWGN) channel. In Section IV, the numerical setup is explained and AIR results for the optical fiber channel are shown; finally, in Section V, conclusions are drawn.

Fig. 2. Two different implementation alternatives for the CM encoder in Fig. 1.

II. SYSTEM MODEL
We consider the schematic diagram in Fig. 1, representing a generic multispan optical fiber communication system. Although in this work PM (4D) modulation formats are considered, we assume for simplicity that each polarization can be treated as an independent parallel channel. Under this assumption, and for the modulation formats studied in this paper (PM-16QAM, PM-64QAM, and PM-256QAM), the system under analysis can be reduced to a single-polarization (2D) one. At the transmitter, a CM encoder encodes a stream of $N_b$ information bits $B^{N_b} = [B_1, B_2, \ldots, B_{N_b}]$ into a sequence of $N_s$ symbols $X^{N_s} = [X_1, X_2, \ldots, X_{N_s}]$, each drawn from a set of $M$ complex values $\mathcal{S} = \{s_1, s_2, \ldots, s_M\}$, where $M$ is a power of 2. The rate at which this operation is performed (in bits per symbol) is therefore given by

$$R = \frac{N_b}{N_s}. \tag{1}$$

In our analysis, we will only consider the case where the symbols $X_n$ forming a codeword $X^{N_s}$ are independent, identically distributed (i.i.d.) random variables with equal probability $1/M$.
Although all CM encoders are inherently nonbinary encoders, the encoding process described above can be implemented in two different ways, as shown in Fig. 2. In the first implementation, shown in the top part of Fig. 2, the information bits are first encoded by a binary FEC encoder, and subsequently a memoryless mapper $\Phi$ is used to convert blocks of $\log_2 M$ bits into symbols of the constellation $\mathcal{S}$. This implementation is naturally associated with CM decoders based on a demapper and a binary FEC decoder. The second implementation is shown in the bottom part of Fig. 2, where bits are first mapped into a sequence of nonbinary information symbols, which are then mapped into sequences of nonbinary coded symbols by a nonbinary FEC encoder [8]. In this case, the decoding can be performed by a nonbinary FEC decoder.
In this paper, we do not consider cases where symbols are not uniformly distributed, i.e., when probabilistic shaping on $\mathcal{S}$ is performed [24]-[29]. Moreover, throughout this paper, we focus our attention on high SEs (>2 bits/sym/polarization), and thus the constellation $\mathcal{S}$ is assumed to be a square $M$QAM constellation where $M \in \{16, 64, 256\}$.
The symbols $X_n$ are mapped, one every $T_s$ seconds, onto a set of waveforms by a (real) pulse shaper $p(t)$, generating the complex signal

$$s(t) = \sum_{n=1}^{N_s} X_n\, p(t - nT_s). \tag{2}$$

The signal $s(t)$ propagates through $N_{\text{sp}}$ spans of optical fiber (see Fig. 1), optically amplified at the end of each span by an erbium-doped fiber amplifier (EDFA). At the end of the fiber link, the signal is detected by an optical receiver. As shown in Fig. 1, the first part of the receiver includes an equalizer and a matched filter (MF), which are assumed to be operating on the continuous-time received waveform $r(t)$. The equalizer performs a compensation of the most significant fiber channel impairments, either the linear ones only, as in the case of EDC, or both linear and nonlinear, as with DBP. The equalized (but noisy) waveform $y(t)$ represents the input of the detection stage and can therefore be considered as the output of the so-called waveform channel [30, Sec. 2.4]. Such a channel is formed by the cascade of the physical channel and the equalization block at the receiver, as shown in Fig. 1. The physical channel (i.e., fiber spans and amplifiers), also referred to as nonlinear Schrödinger channel in [31], is described by the nonlinear Schrödinger equation [32, Sec. 2.3].
The receiver estimates the transmitted bits based on the set of observations $Y^{N_s}$ that are extracted from the signal $y(t)$, using an MF matched to the transmitted pulse $p(t)$:

$$Y_n = \int_{-\infty}^{\infty} y(\tau)\, p(\tau - nT_s)\, d\tau, \quad n = 1, 2, \ldots, N_s. \tag{3}$$

As shown in [33], [34], (3) does not necessarily represent the optimum way to reduce this particular waveform channel to a discrete-time one. However, the focus of this work is on the performance of CM encoder and decoder blocks, operating on the input and output of the discrete-time channel, regardless of the suboptimality of the observations $Y^{N_s}$.
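As a rough illustration of the pulse-shaping and matched-filtering steps described above, the following minimal sketch builds $s(t)$ on a sampling grid and recovers the symbols with a discretized MF integral over a noiseless, distortion-free channel. The unit-energy rectangular pulse, the symbol period, the oversampling factor, and all function names are assumptions made for this example only; they are not the pulse or the parameters used in this paper.

```python
import math

def pulse(t, Ts):
    """Unit-energy rectangular pulse on [0, Ts): an illustrative stand-in for p(t)."""
    return 1.0 / math.sqrt(Ts) if 0.0 <= t < Ts else 0.0

def shape(symbols, Ts, dt, n_samples):
    """Samples of s(t) = sum_{n=1}^{Ns} X_n p(t - n*Ts) on the grid t = k*dt."""
    return [
        sum(x * pulse(k * dt - n * Ts, Ts) for n, x in enumerate(symbols, start=1))
        for k in range(n_samples)
    ]

def matched_filter(y, Ts, dt, n_symbols):
    """Y_n ~= sum_k y(k*dt) p(k*dt - n*Ts) dt, a Riemann-sum version of the MF integral."""
    return [
        sum(yk * pulse(k * dt - n * Ts, Ts) for k, yk in enumerate(y)) * dt
        for n in range(1, n_symbols + 1)
    ]

Ts, sps = 1.0, 8                      # symbol period and samples per symbol (assumed)
dt = Ts / sps
symbols = [1 + 1j, -1 + 3j, 3 - 1j]   # a few unnormalized 16QAM points
s = shape(symbols, Ts, dt, (len(symbols) + 2) * sps)
recovered = matched_filter(s, Ts, dt, len(symbols))
```

With an ideal channel, `recovered` coincides with `symbols`; in the system considered here, the samples would instead carry the residual noise and distortion left after equalization.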
In the following section, we will discuss AIRs of the four decoding strategies shown in Fig. 3, representing different implementations of the CM decoder. The importance of these structures lies in the fact that they cover all main options employing a memoryless demapper. Each BW configuration (see Figs. 3(b) and (d)) is characterized by a CM decoder formed by two blocks: a memoryless demapper and a binary FEC decoder. The SW strategies (see Figs. 3(a) and (c)) are instead characterized by the adoption of a nonbinary decoder operating directly on symbol-level metrics derived from the samples $Y_n$. Each of the HD schemes (see Figs. 3(c) and (d)) operates a symbol/bit-level decision before the FEC decoder, which as a result operates on discrete quantities (hard information). In the SD case (see Figs. 3(a) and (b)), the decoder instead produces codeword estimates based on BW or SW log-likelihood (LL) values, which are distributed on a continuous range of values (soft information).

A. Information-theoretic Preliminaries
Consider an information stable, discrete-time channel with memory [35], characterized by the sequence of probability density functions (PDFs)

$$p_{Y^N|X^N}(y^N|x^N), \quad N = 1, 2, \ldots \tag{4}$$

The maximum rate at which reliable transmission over such a channel is possible is defined by the capacity [35, eq. (1.2)]:

$$C = \lim_{N\to\infty} \sup_{p_{X^N}} \frac{1}{N} I(X^N; Y^N) \tag{5}$$

where $p_{X^N}$ is the joint PDF of the sequence $X^N$ under a given power constraint. When $p_{X^N}$ is fixed, the quantity

$$I(X^N; Y^N) = \mathbb{E}\left[\log_2 \frac{p_{Y^N|X^N}(Y^N|X^N)}{p_{Y^N}(Y^N)}\right] \tag{6}$$

in (5) is the MI between the two sequences of symbols $X^N$ and $Y^N$, and

$$I_{\text{mem}} = \frac{1}{N} I(X^N; Y^N) \tag{7}$$

is the average per-symbol MI rate [2], [16], which has the meaning of channel MI. For a fixed $N$, (7) represents the maximum AIR for the channel in (4), and can be achieved by a CM encoder generating codewords $X^{N_s}$ according to $p_{X^N}$, used along with an optimum decoder. Such a decoder uses the channel observations $y^{N_s}$ to produce codeword estimates $\hat{X}^{N_s}$ based on the rule

$$\hat{x}^{N_s} = \underset{x^{N_s}}{\operatorname{argmax}}\; p_{Y^{N_s}|X^{N_s}}(y^{N_s}|x^{N_s}) \tag{8}$$

where the maximization is over the codewords of the code and the codeword likelihood $p_{Y^{N_s}|X^{N_s}}$ is calculated based on the knowledge of the channel law (4). The expression of the channel law (4), for $N$ large enough to account for the channel memory, remains so far unknown for the optical fiber channel despite previous attempts to derive approximated [36], [37] or heuristic [38] analytical expressions. On the other hand, brute-force numerical approaches appear prohibitive. An immediate consequence is that the exact channel MI for a given modulation format cannot be calculated. The second consequence is that the optimum receiver potentially achieving a rate $R = I_{\text{mem}}$ cannot be designed. However, using the mismatched decoder approach, it is still possible to calculate nontrivial AIRs for the optical fiber channel in Fig. 1, when suboptimal but practically realizable CM encoders and decoders are used, such as the ones described in Section II (see Fig. 3).
The method of the mismatched decoder to calculate AIRs for specific decoder structures originates from the works in [13], later extended to channels with memory in [14] and recently applied to optical fiber systems in, e.g., [15], [16], [21]. This approach consists of replacing, in the calculation of the channel MI, the unknown channel law with an auxiliary one, obtaining a lower bound. Moreover, such a bound represents an AIR for a system using the optimum decoder for the auxiliary channel. The tightness of such a lower bound depends on how similar the auxiliary channel is to the actual one. On the other hand, no converse coding theorem is available for the bound obtained using a given auxiliary channel. In other words, even when a mismatched decoder is used, the estimated rate is not necessarily the maximum achievable rate. Counterexamples have been shown, e.g., in [39].
Nevertheless, the AIRs calculated via the mismatched decoder approach still represent an upper bound on the rates of most, if not all, coding schemes used in practice. Furthermore, they are a strong predictor of the post-FEC BER of such schemes, as shown in [6]-[8], [10].

B. AIRs for SD CM Decoders
Since each of the CM decoders presented in Section II neglects the memory of the channel in (4), a first decoding mismatch is introduced. In what follows, we will discuss this mismatch using the SD-SW case (see Fig. 3(a)) as a representative example of all other CM decoders.
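For the SD-SW case, the nonbinary decoder requires SW likelihoods $p_{Y_n|X_n}$, with $n = 1, 2, \ldots, N$. These $N$ PDFs can be derived for each $n$ by marginalizing the joint PDF in (4). For simplicity, however, practical implementations use a single PDF across the block of $N$ symbols. We choose the PDF in the middle of the observation block, i.e., at time instant $n = \hat{n} = \lceil N/2 \rceil$. The marginalization of (4) in this case gives

$$p_{Y_{\hat{n}}|X_{\hat{n}}}(y_{\hat{n}}|x_{\hat{n}}) = \int_{\mathbb{C}^{N-1}} p_{Y^N|X_{\hat{n}}}(y^N|x_{\hat{n}})\, d\tilde{y}^{N-1} \tag{9}$$

where $\mathbb{C}$ denotes the complex field, $\tilde{y}^{N-1} \triangleq [y_1, \ldots, y_{\hat{n}-1}, y_{\hat{n}+1}, \ldots, y_N]$, and the conditional PDF $p_{Y^N|X_{\hat{n}}}$ in (9) can be expressed, for i.i.d. equiprobable symbols, as

$$p_{Y^N|X_{\hat{n}}}(y^N|x_{\hat{n}}) = \frac{1}{M^{N-1}} \sum_{\tilde{x}^{N-1} \in \mathcal{S}^{N-1}} p_{Y^N|X^N}(y^N|x^N) \tag{10}$$

with $\tilde{x}^{N-1} \triangleq [x_1, \ldots, x_{\hat{n}-1}, x_{\hat{n}+1}, \ldots, x_N]$. The choice for the single PDF to be the one in the middle of the observation block is arbitrary. However, this choice is justified by the fact that $p_{Y_{\hat{n}}|X_{\hat{n}}}(y_{\hat{n}}|x_{\hat{n}})$ will be a good approximation of all other PDFs $p_{Y_n|X_n}(y_n|x_n)$ with $n = 1, 2, \ldots, N$ when $N$ is large.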
The demapper thus assumes a channel that is stationary across the block of $N$ symbols. This channel is fully determined by a PDF $p_{Y|X}(y|x)$ defined as

$$p_{Y_n|X_n}(y|x) = p_{Y|X}(y|x) \triangleq p_{Y_{\hat{n}}|X_{\hat{n}}}(y|x) \tag{11}$$

with $n = 1, 2, \ldots, N$. When i.i.d. symbols are transmitted, the MI for this auxiliary memoryless channel is given by

$$I_{\text{SD-SW}} = \frac{1}{M} \sum_{x \in \mathcal{S}} \int_{\mathbb{C}} p_{Y|X}(y|x) \log_2 \frac{p_{Y|X}(y|x)}{\frac{1}{M}\sum_{x' \in \mathcal{S}} p_{Y|X}(y|x')}\, dy. \tag{12}$$

The SD-SW MI in (12) is an AIR for the SD-SW decoder structure in Fig. 3(a), where the demapper computes LLs $\log p_{Y|X}(y|x)$, and the FEC decoder estimates each transmitted codeword using (8) with a codeword likelihood given by

$$\prod_{n=1}^{N_s} p_{Y|X}(y_n|x_n). \tag{13}$$

In most cases, the channel law $p_{Y^N|X^N}$ is unknown and therefore $p_{Y|X}(y|x)$ is not available in closed form to the receiver. Also, numerical estimations of $p_{Y|X}(y|x)$ are often prohibitive. As a result, practical implementations not only ignore the memory of the channel (first mismatch), but also make an a priori assumption on the PDF $p_{Y|X}(y|x)$. This assumption introduces a second mismatch, which we discuss in what follows.
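Most receivers assume a circularly symmetric Gaussian distribution for (11). In this case, an AIR is given by [21, eq. (2)]

$$\tilde{I}_{\text{SD-SW}} = \frac{1}{M} \sum_{x \in \mathcal{S}} \int_{\mathbb{C}} p_{Y|X}(y|x) \log_2 \frac{q_{Y|X}(y|x)}{\frac{1}{M}\sum_{x' \in \mathcal{S}} q_{Y|X}(y|x')}\, dy \tag{14}$$

where

$$q_{Y|X}(y|x) = \frac{1}{\pi\sigma^2} \exp\left(-\frac{|y - x|^2}{\sigma^2}\right) \tag{15}$$

represents the auxiliary Gaussian channel with complex noise variance $\sigma^2$, which accounts for the contributions of both amplified spontaneous emission (ASE) noise and nonlinear distortions. As shown in [41], [42], the marginal PDF for the optical fiber channel is in most practical cases well approximated by a circularly symmetric Gaussian distribution. (A deviation from a circularly symmetric Gaussian PDF can be observed, e.g., in the following cases: amplification schemes different from EDFA, such as Raman amplifiers [36], dispersion-managed links (see for instance [16]), and very high transmitted powers.) Therefore, as pointed out in [21], we generally have

$$\tilde{I}_{\text{SD-SW}} \approx I_{\text{SD-SW}}. \tag{16}$$

In this case, as we will discuss in Sec. IV, the AIRs of SD-SW decoders can be quite accurately estimated using the MI expression for the AWGN channel and the effective signal-to-noise ratio (SNR) at the MF output

$$\text{SNR} \triangleq \frac{\mathbb{E}[|X|^2]}{\sigma^2}. \tag{17}$$

In the SD-BW implementation (see Fig. 3(b)), for each received symbol $Y$ the demapper generates $\log_2 M$ BW LLs [10], [5, Ch. 3]. These LLs are usually obtained assuming no statistical dependence between bits belonging to the same transmitted symbol. When such LLs are calculated based on a memoryless channel law $p_{Y|X}(y|x)$, the relevant quantity for the coded performance is the GMI [5, eq. (4.54)], [10, eq. (24)]

$$I_{\text{SD-BW}} = \sum_{k=1}^{\log_2 M} I(B_k; Y) \tag{18}$$

where $B_k$ denotes the $k$-th bit of $X$ and $I(B_k; Y)$ denotes the MI between transmitted bits and received symbols. When the LLs are calculated using the auxiliary channel in (15) instead of the true channel, the GMI is lower-bounded by

$$\tilde{I}_{\text{SD-BW}} = \sum_{k=1}^{\log_2 M} \frac{1}{M} \sum_{b \in \{0,1\}} \sum_{i \in \mathcal{I}_k^b} \int_{\mathbb{C}} p_{Y|X}(y|s_i) \log_2 \frac{q_{Y|B_k}(y|b)}{\frac{1}{2}\sum_{b' \in \{0,1\}} q_{Y|B_k}(y|b')}\, dy \tag{19}$$

where $\mathcal{I}_k^b$ is the subset of indices of the constellation $\mathcal{S}$ having the $k$-th bit equal to $b \in \{0, 1\}$ and

$$q_{Y|B_k}(y|b) = \frac{2}{M} \sum_{j \in \mathcal{I}_k^b} q_{Y|X}(y|s_j). \tag{20}$$

Similarly to the SD-SW case, for the optical fiber channel in Fig. 1 we have $\tilde{I}_{\text{SD-BW}} \approx I_{\text{SD-BW}}$.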
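When the memoryless channel is exactly the circularly symmetric Gaussian one assumed by the demapper, the SD-SW MI and the SD-BW GMI discussed above can be estimated by Monte-Carlo averaging. The sketch below is only an illustration under stated assumptions: a unit-energy square $M$QAM constellation, a binary-reflected Gray labeling per quadrature (the labeling is an assumption of this example), and hypothetical function names and sample counts.

```python
import math
import random

def gray_qam(M):
    """Unit-energy square M-QAM points with Gray labels (assumed labeling)."""
    L = int(round(math.sqrt(M)))
    half = int(round(math.log2(L)))            # bits per quadrature
    scale = math.sqrt(2.0 * (L * L - 1) / 3.0)
    points, labels = [], []
    for a in range(L):
        for b in range(L):
            points.append(complex(2 * a - (L - 1), 2 * b - (L - 1)) / scale)
            labels.append(((a ^ (a >> 1)) << half) | (b ^ (b >> 1)))
    return points, labels

def log2_sum_exp(values):
    """log2(sum_j exp(v_j)), computed in a numerically stable way."""
    top = max(values)
    return (top + math.log(sum(math.exp(v - top) for v in values))) / math.log(2.0)

def sd_airs_awgn(M, snr_db, n_samples=4000, seed=1):
    """Monte-Carlo estimates of (SD-SW MI, SD-BW GMI) in bits/symbol on AWGN."""
    rng = random.Random(seed)
    points, labels = gray_qam(M)
    m = int(round(math.log2(M)))
    sigma2 = 10.0 ** (-snr_db / 10.0)          # SNR = E[|X|^2]/sigma^2, E[|X|^2] = 1
    std = math.sqrt(sigma2 / 2.0)              # noise std per real dimension
    mi_loss = gmi_loss = 0.0
    for _ in range(n_samples):
        i = rng.randrange(M)
        y = points[i] + complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
        # Natural-log Gaussian likelihoods of y given each point, up to a constant
        metric = [-abs(y - s) ** 2 / sigma2 for s in points]
        total = log2_sum_exp(metric)
        mi_loss += total - metric[i] / math.log(2.0)
        for k in range(m):
            bit = (labels[i] >> k) & 1
            subset = [metric[j] for j in range(M) if ((labels[j] >> k) & 1) == bit]
            gmi_loss += total - log2_sum_exp(subset)
    return m - mi_loss / n_samples, m - gmi_loss / n_samples
```

For instance, `sd_airs_awgn(16, 10.0)` returns one estimate pair for 16QAM at an SNR of 10 dB. For the optical fiber channel, the same averages would be taken over the simulated channel samples rather than over synthetic Gaussian noise.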

C. AIRs for HD CM Decoders
As illustrated in Figs. 3(c) and (d), the HD decoders are preceded by a threshold device casting the channel samples $Y^{N_s}$ into a discrete set of values. In the SW case (Fig. 3(c)), such a device provides a sequence of hard SW estimates $\hat{X}^{N_s}$ that are passed to a nonbinary decoder. The channel will in general show memory across multiple symbols $\hat{X}_n$. However, in analogy with (9), we can replace (21) with an equivalent memoryless channel defined by

$$P_{\hat{X}|X}(s_j|s_i) = p_{ij}, \quad i, j = 1, 2, \ldots, M \tag{22}$$

where the $p_{ij}$ are the SW crossover probabilities. Using the same argument on the channel memory used for the SD-SW case, the quantity

$$I_{\text{HD-SW}} = \log_2 M + \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{M} p_{ij} \log_2 \frac{p_{ij}}{\sum_{l=1}^{M} p_{lj}} \tag{23}$$

represents an AIR for the HD-SW CM decoder in Fig. 3(c).
When the HD decoder structure is preserved but a binary decoder is instead used (Fig. 3(d)), the threshold device needs to be followed by a symbol-to-bit demapper producing a sequence of pre-FEC bit estimates $\hat{B}^{N_b}$. Again, although the resulting binary channel might show memory, the HD FEC decoder typically neglects it and the most likely codeword is calculated based on each single detected bit. The marginal channel law $P_{\hat{B}|B}(\hat{b}|b)$ is in this case represented by a binary symmetric auxiliary channel

$$P_{\hat{B}|B}(\hat{b}|b) = \begin{cases} p, & \hat{b} \neq b \\ 1 - p, & \hat{b} = b \end{cases} \tag{24}$$

where $p$ corresponds to the average pre-FEC BER (BW crossover probability). The quantity

$$I_{\text{HD-BW}} = \log_2 M \left[1 + p \log_2 p + (1 - p) \log_2 (1 - p)\right] \tag{25}$$

then represents an AIR for an HD-BW CM decoder in Fig. 3(d).
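Both HD quantities reduce to closed-form expressions once the crossover probabilities are known: the MI of a discrete memoryless channel with equiprobable inputs for the SW case, and $\log_2 M$ times the binary symmetric channel capacity for the BW case. The following minimal sketch evaluates them; the function names are hypothetical and equiprobable symbols are assumed, as in the rest of this paper.

```python
import math

def hd_sw_air(p):
    """HD-SW AIR (bits/symbol) from p[i][j] = P(decide s_j | sent s_i), uniform inputs."""
    M = len(p)
    # Probability of each hard decision under equiprobable transmitted symbols
    q = [sum(p[i][j] for i in range(M)) / M for j in range(M)]
    air = 0.0
    for i in range(M):
        for j in range(M):
            if p[i][j] > 0.0:
                air += p[i][j] / M * math.log2(p[i][j] / q[j])
    return air

def binary_entropy(p):
    """Binary entropy function in bits, with the convention 0*log2(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def hd_bw_air(ber, M):
    """HD-BW AIR (bits/symbol): log2(M) uses of a BSC with crossover probability ber."""
    return math.log2(M) * (1.0 - binary_entropy(ber))
```

As a sanity check, an error-free crossover matrix gives `hd_sw_air` equal to $\log_2 M$, and for $M = 2$ the two functions coincide, since the symbol-wise channel is then itself a binary symmetric channel.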

D. Relationships Between AIRs
The relationships between the above discussed AIRs are summarized by means of the graph in Fig. 4. Nodes that are connected in the graph indicate the existence of an inequality between the quantities in each of the nodes. The direction of the arrows shows which quantity is upper-bounding the other.
For any given input distribution, the rate $I_{\text{mem}}$ upper-bounds all other quantities. In particular we have $I_{\text{mem}} \geq I_{\text{SD-SW}} \geq \tilde{I}_{\text{SD-SW}}$, where the first inequality can be proven using the chain rule of the MI (see [43], [17, Sec. IV], [3, Sec. 2.5.2]). The second inequality instead reflects the additional mismatch caused by a memoryless demapper based on (15) and not on (9). The proof of this inequality follows from the definitions (12) and (14) and is given in [14, Sec. VI]. Due to the assumption of independent bits within each transmitted symbol in the calculation of (18), it can also be shown that [5, Sec. 4.4] $I_{\text{SD-SW}} \geq I_{\text{SD-BW}} \geq \tilde{I}_{\text{SD-BW}}$. Again, the second inequality reflects the loss of information of a mismatched demapper calculating BW LLs based on (15) rather than on (9). Due to the data-processing inequality [3, Sec. 2.4] and the mismatch of the illustrated HD decoders to the potential channel memory, we have $I_{\text{SD-SW}} \geq I_{\text{HD-SW}}$. Finally, similarly to the SD case, we have $I_{\text{HD-SW}} \geq I_{\text{HD-BW}}$. In general, nothing can be said on the relationship between $I_{\text{SD-BW}}$ and $I_{\text{HD-SW}}$. Also, no systematic inequality holds between the mismatched versions of the SD AIRs ($\tilde{I}_{\text{SD-SW}}$, $\tilde{I}_{\text{SD-BW}}$) and the HD AIRs ($I_{\text{HD-SW}}$, $I_{\text{HD-BW}}$). However, as already discussed in Section III-B, for the optical fiber channel the mismatched AIRs are expected to be very close to the AIRs obtained with perfect knowledge of the channel marginal PDF in (9).
When the channel is indeed AWGN, clearly $\tilde{I}_{\text{SD-SW}} = I_{\text{SD-SW}}$ and $\tilde{I}_{\text{SD-BW}} = I_{\text{SD-BW}}$. In this case, as illustrated in Fig. 4, $I_{\text{SD-SW}}$ and $I_{\text{HD-SW}}$ are the maximum AIR for SD-SW and HD-SW decoders, respectively [1], since each demapper is matched to the channel. Conversely, for BW decoders, rates higher than $I_{\text{SD-BW}}$ and $I_{\text{HD-BW}}$ are still possible (see, e.g., [39]).
In order to better illustrate the relationships discussed above, the four AIRs in (12), (18), (23), and (25) were calculated for the AWGN channel. In Fig. 5, $I_{\text{SD-SW}}$, $I_{\text{SD-BW}}$, $I_{\text{HD-SW}}$, and $I_{\text{HD-BW}}$ are shown vs. the SNR in (17) for the three $M$QAM formats analyzed in this paper: 16QAM, 64QAM, and 256QAM. For 16QAM, the HD AIRs are below both of the SD AIRs. It should be noted that for SD decoders, a negligible penalty is incurred by using a BW structure. As the modulation order is increased, and for low enough SNR values, it can be observed that the HD-SW AIRs match or exceed the SD-BW AIRs. Also, in this regime, the performance of these two decoders is comparable to the SD-SW one. This behavior is clearer for a 256QAM modulation format, where a more significant penalty is incurred by using BW demapping in an SD CM decoder, whereas the HD-SW structure performs as well as the SD counterpart. When the modulation format cardinality increases, an HD-BW decoder incurs, in general, significant penalties in AIR. Finally, the inequalities in (27)-(30) can be seen to hold for all modulation formats shown, as expected.

A. Numerical Setup
In this section, numerical results based on split-step Fourier (SSF) simulations of optical fiber transmission are presented. As shown in Fig. 1, the simulated system consists of an optical fiber link comprising multiple standard single-mode fiber (SMF) spans (parameters shown in Table I), amplified, at the end of each span, by an EDFA which compensates for the span loss. At the transmitter, after the CM encoder, PM square $M$QAM formats (PM-16QAM, PM-64QAM, PM-256QAM) were modulated using a root raised cosine (RRC) filter. For each polarization of each WDM channel, independent sequences of $2^{18}$ symbols were transmitted. The fiber propagation was simulated by numerically solving the Manakov equation through the SSF method. In order to obtain ideal equalization performance, the sampling rate at which the equalizer was operated was the same as that of the fiber propagation simulation (512 GSa/s).
After the MF (see Fig. 1) and sampling at 1 Sa/sym, AIR calculations were performed based on the schemes shown in Fig. 3. In particular, we used (14)-(15), (19)-(20), (23), and (25) to evaluate $\tilde{I}_{\text{SD-SW}}$, $\tilde{I}_{\text{SD-BW}}$, $I_{\text{HD-SW}}$, and $I_{\text{HD-BW}}$, respectively. For the calculation of $\tilde{I}_{\text{SD-SW}}$ and $\tilde{I}_{\text{SD-BW}}$ in (14) and (19), Monte-Carlo integration was performed, using the $2^{18}$ channel samples (transmitted symbols) to estimate the variance $\sigma^2$ of $q_{Y|X}(y|x)$. However, we found that $\tilde{I}_{\text{SD-SW}} \approx I_{\text{SD-SW}}$ when, in $I_{\text{SD-SW}}$, $p_{Y|X}(y|x)$ is replaced by $q_{Y|X}(y|x)$, further confirming the Gaussianity of $p_{Y|X}(y|x)$. In order to calculate $I_{\text{HD-SW}}$ and $I_{\text{HD-BW}}$, a Monte-Carlo estimation of the probabilities $p_{ij}$ and $p$ was performed using the pairs of sequences $(X^{N_s}, \hat{X}^{N_s})$ and $(B^{N_b}, \hat{B}^{N_b})$, respectively.
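The estimation step described above can be sketched as follows. The code assumes that the received samples are already equalized and scaled to the constellation, so that the noise variance can be estimated as the mean squared deviation from the transmitted symbol, and that hard decisions are taken by minimum Euclidean distance. The function names, the toy QPSK alphabet, and the four sample values are illustrative assumptions, not data from the simulations in this paper.

```python
def hard_decision(y, points):
    """Index of the constellation point closest to y (minimum Euclidean distance)."""
    return min(range(len(points)), key=lambda j: abs(y - points[j]))

def estimate_statistics(tx_idx, rx, points, labels, bits_per_symbol):
    """Monte-Carlo estimates of sigma^2, the crossover matrix p_ij, and the pre-FEC BER."""
    n, M = len(tx_idx), len(points)
    sigma2 = sum(abs(y - points[i]) ** 2 for i, y in zip(tx_idx, rx)) / n
    counts = [[0] * M for _ in range(M)]
    bit_errors = 0
    for i, y in zip(tx_idx, rx):
        j = hard_decision(y, points)
        counts[i][j] += 1
        # Bits in which the labels of the sent and decided symbols differ
        bit_errors += bin(labels[i] ^ labels[j]).count("1")
    row_totals = [sum(row) for row in counts]
    p_ij = [
        [c / row_totals[i] if row_totals[i] else 0.0 for c in row]
        for i, row in enumerate(counts)
    ]
    ber = bit_errors / (n * bits_per_symbol)
    return sigma2, p_ij, ber

# Toy example (made-up samples): Gray-labeled QPSK, one symbol error out of four
points = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
labels = [0b00, 0b01, 0b11, 0b10]
tx_idx = [0, 1, 2, 3]
rx = [0.9 + 1.1j, -1.2 + 0.8j, 0.5 - 0.7j, 1 - 1j]
sigma2, p_ij, ber = estimate_statistics(tx_idx, rx, points, labels, 2)
```

With sequences as long as those used here, every row of the crossover matrix is populated; with very short sequences, rows for symbols that were never transmitted are left at zero by this sketch.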

B. Optical Fiber AIRs
In Figs. 6-8, three sets of results on AIRs for the optical fiber channel are shown: EDC, single-channel DBP, and full-field DBP, respectively. Each set shows the AIR vs. transmission distance for PM-16QAM, PM-64QAM, and PM-256QAM with the four CM decoder structures discussed in Section II. For each distance, equalization scheme, and CM decoder investigated, the transmitted power was optimized, resulting in different optimal powers. The investigated link distances span the typical distances of mid-range to long-haul terrestrial links (typically 1000-3000 km), long-haul submarine (3000-5000 km), and transoceanic links (6000-12000 km).
In the EDC case for PM-16QAM (Fig. 6(a)), SD decoders significantly outperform the HD ones, particularly for long distances. SD-BW decoders incur small penalties compared to the SD-SW implementation at all distances of interest. This can be explained by observing Fig. 5(a), where the performance of PM-16QAM differs for SD-SW and SD-BW decoders only for very small SNR values (≤2 dB). As shown in Fig. 6(b), for the PM-64QAM format, SD decoders show a significant advantage over their HD counterparts (see [21] for SD-SW vs. HD-BW) and again SD-BW decoders perform identically to the SD-SW ones at short distances. However, as the distance is increased, the AIRs of the HD-SW schemes match the SD-BW ones (see filled red circles in Fig. 6(b) and 6(c)), significantly outperforming the HD-BW rates. This trend is even more prominent for PM-256QAM (Fig. 6(c)). For this format, a crossing between the SD-BW and HD-SW AIRs can be observed at around 2300 km distance (filled red circles). More importantly, in the long distance regime, the HD-SW scheme matches the performance of the SD-SW one, with no significant penalty observed. Also, it can be noted that the HD-BW scheme shows a significant penalty (>3 bits/sym for long distances) compared to all other implementations.
In the case where single-channel DBP is applied (Fig. 7), rather small AIR gains can be noticed in general, as compared to the EDC case (Fig. 6). This can be attributed to the fact that the compensation of the nonlinearity generated by only one channel out of the five transmitted gives only a marginal improvement of the optimum SNR at each transmission distance. However, some differences in the performance can be noticed for higher order formats and long distances. Specifically, the distance at which the HD-SW transceiver matches the performance of the SD-BW ones for PM-64QAM is increased from 10000 km to 12000 km (filled red circles in Fig. 6(b) and Fig. 7(b)) and for PM-256QAM the crossing point between HD-SW and SD-BW is moved from 2300 km to 3000 km (filled red circles in Fig. 6(c) and Fig. 7(c)).
Finally, when full compensation of signal-signal nonlinear distortion is performed via full-field DBP (Fig. 8), a remarkable increase in the AIRs compared to the other equalization schemes can be observed for all decoding strategies and all modulation formats. Fig. 8(a) shows that, for PM-16QAM, the full nominal SE (8 bits/sym) can be achieved up to a distance of approximately 6000 km and by only using an HD-BW decoder (squares). This rate drops by only 0.5 bits/sym at 12000 km if SD decoders are used, and by an additional 0.5 bits/sym (to 7 bits/sym) when HD decoders are adopted. Fig. 8(a) also shows that when PM-16QAM and full-field DBP are used in conjunction, switching from a binary to a nonbinary scheme does not result in any significant AIR increase, as long as the FEC decoding strategy (HD or SD) is maintained. Higher rates can be achieved using PM-64QAM (Fig. 8(b)) and PM-256QAM (Fig. 8(c)) in conjunction with SD decoders. Again, binary and nonbinary SD schemes perform identically. For these higher order modulation formats, HD-BW decoders incur significant penalties compared to SD decoders. For PM-64QAM, this penalty becomes larger than 0.5 bits/sym for distances larger than 4000 km, whereas for PM-256QAM, it becomes larger than 0.5 bits/sym already for distances larger than 1500 km. At long distances, the penalty increases to up to 1.6 bits/sym for PM-64QAM and 2.5 bits/sym for PM-256QAM. An improvement can be obtained by using HD-SW decoders, particularly in the long-distance regime. For PM-64QAM, the AIR gap from SD decoders is reduced to 0.5 bits/sym at 12000 km. For PM-256QAM, HD-SW decoders in general largely outperform HD-BW decoders and show performance similar to SD decoders beyond distances of 7000 km, also outperforming SD-BW decoders beyond 8000 km.
In order to highlight the performance of each decoding structure vs. the transmission distance $L$, in Fig. 9 we show the modulation format optimized AIRs, defined as $\mathrm{AIR}^{*}(L) = \max_{M \in \{16, 64, 256\}} \mathrm{AIR}(L, M)$, i.e., the maximum over the three formats of the AIR at distance $L$, for EDC, single-channel DBP, and full-field DBP. We observe that the set of curves shown for each equalization scheme appears as a shifted version (across the distance axis) of the other ones. This behavior is another confirmation of the fact that dispersion-unmanaged and EDFA-amplified optical fiber systems can be described by an equivalent AWGN channel and their performance is strongly correlated to the effective SNR at the MF output. Since this SNR includes nonlinear effects as an equivalent noise source, it is improved by nonlinear compensation schemes. In the EDC case (Fig. 9(a)), except for short distances (≤1000 km), HD-SW decoders have comparable performance to SD-BW and SD-SW schemes. The optimal format for both SW strategies (SD and HD) is PM-256QAM (green) at all distances, whereas for the BW schemes, PM-256QAM performs worse both for short and middle distances, where PM-64QAM (blue) is preferable, as well as in the long/ultra-long haul region, where PM-16QAM (red) is optimal. Very similar behavior is observed for single-channel DBP in Fig. 9(b), where the optimality of PM-64QAM for BW receivers is extended to longer distances with respect to their EDC counterparts.
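The format optimization amounts to taking, at each distance, the largest AIR among the available formats for a fixed decoder and equalizer. A minimal sketch of this selection is given below; the function name is hypothetical and the AIR values are made-up placeholders chosen only to exercise the code, not results from this paper.

```python
def format_optimized_air(air_by_format):
    """For each distance, return (best format, its AIR) over the formats provided."""
    # Only distances available for every format are compared
    distances = sorted(set.intersection(*(set(c) for c in air_by_format.values())))
    best = {}
    for dist in distances:
        fmt = max(air_by_format, key=lambda f: air_by_format[f][dist])
        best[dist] = (fmt, air_by_format[fmt][dist])
    return best

# Placeholder AIRs in bits/sym vs. distance in km (illustrative values only)
air_by_format = {
    "PM-16QAM": {1000: 7.9, 6000: 6.9},
    "PM-64QAM": {1000: 10.5, 6000: 6.8},
    "PM-256QAM": {1000: 10.1, 6000: 6.2},
}
best = format_optimized_air(air_by_format)
```

Applying the function separately to the AIR curves of each decoder structure and each equalization scheme yields one format-optimized curve per combination.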
Finally, for full-field DBP (Fig. 9(c)), rates of up to 12 bits/sym can be targeted up to 5000 km, and for all decoding strategies, the optimal modulation format is PM-256QAM up to 4000 km. Also, in the ultra-long haul regime, rates above 8 bits/sym can be achieved by using PM-64QAM in conjunction with SD-BW systems without significant loss in performance compared to SD-SW or HD-SW with PM-256QAM. Overall, Fig. 9 also shows that HD-BW decoders perform significantly worse than all other schemes, confirming the results in [21]. Nevertheless, they can be considered as a valid low-complexity alternative for short distances or when high SNRs are available at the receiver.

V. CONCLUSIONS
The MI is a useful measure of performance of a coded system and represents an upper bound on the AIRs when a given modulation format is used and optimum ML decoding is performed at the receiver. Conversely, the AIRs of pragmatic transceiver schemes are dictated by the specific implementation of the CM decoder. In this work, we presented a detailed numerical study of the AIR performance for high-SE long-haul optical communication systems when these pragmatic decoders and equalization schemes, such as EDC and DBP, are employed.
The results in this paper lead to interesting conclusions on the performance of coded optical fiber communication systems. For example, when the equalizer enables high SNR values (through the use of full-field DBP), an SD decoder is not the only alternative to achieve high rates at long distances. On the contrary, HD nonbinary FEC schemes can, in principle, achieve almost the same rates across all distances of interest. For SNR values in the low to medium range (EDC or single-channel DBP), SD decoders outperform HD ones up to medium SE formats (PM-64QAM). However, for high-SE formats (PM-256QAM), the HD-SW CM decoder can outperform the SD-BW decoder. In the SD case, BW decoders do not incur significant penalties as compared to their SW counterparts, suggesting that there is no need to employ nonbinary FEC schemes. Finally, HD-BW transceivers are never desirable for high-SE systems. Nevertheless, they can represent the implementation of choice for either short-distance systems or ultra long-haul low-SE systems whenever high order modulation formats cannot be used.

Fig. 3. The four CM decoder implementations analyzed in this work.

Fig. 4. Graph showing relationships between the information-theoretic quantities presented in this paper. Lines between nodes indicate an inequality, where the arrows point towards the upper bound. Dotted arrows indicate inequalities which become equalities for the AWGN channel.

Fig. 5. AIRs vs. SNR for different modulation formats for the AWGN channel.