Digital signal processing for fiber nonlinearities [Invited]

This paper reviews digital signal processing techniques that compensate, mitigate, and exploit fiber nonlinearities in coherent optical fiber transmission systems.


Introduction
Intra-channel and inter-channel fiber nonlinearities are major impairments in coherent transmission systems that limit the achievable transmission distance [1]. Consequently, digital signal processing techniques for compensating or mitigating the effects of fiber nonlinearities and for exploiting fiber nonlinearities have been investigated. Key distinguishing features of these techniques are their complexities and their capabilities to deal with intra-channel and/or inter-channel nonlinearities. An important challenge is to achieve useful improvements in system performance with acceptable levels of computational and implementation complexity.
In broad terms, the techniques for reducing the impact of fiber nonlinearities on system performance include those that compensate the nonlinearity-induced signal distortion and those that mitigate the distortion by making the signal propagation more tolerant to fiber nonlinearities. They include perturbation solutions to the coupled nonlinear Schrödinger equation (CNLSE), single-channel and multi-channel digital backpropagation, Volterra series nonlinear equalizers, pulse shaping, and advanced modulation formats. Furthermore, a fundamentally different approach exploits fiber nonlinearity by encoding information in the nonlinear Fourier spectrum, thereby raising the prospect of replacing conventional dense wavelength division multiplexing with nonlinear frequency division multiplexing. In this paper, digital signal processing techniques for contending with fiber nonlinearities are reviewed with specific examples illustrating the diversity of techniques that have been explored.

Perturbation based pre-compensation
The perturbation-based pre-compensation technique is based on approximate time-domain solutions to the CNLSE that express the impact of fiber nonlinearities on a propagating signal as a first-order perturbation term [1,2]. This approach has been shown to be effective for both pre-compensation [3,4] and post-compensation [5,6] of intra-channel fiber nonlinearities.
Assuming that the transmitted optical pulses have a Gaussian shape, analytical expressions in terms of the exponential integral function exist for the perturbation expansion coefficients [1,4]. Extensions of the original approach include an additive-multiplicative model [7], a power weighted model [8-10], and its application to Nyquist pulse shapes [11,12] and to multi-subcarrier signals, which also serve to mitigate the performance implications of fiber nonlinearities [13,14].
The perturbation-based technique can be used to pre-compensate accumulated intra-channel fiber nonlinearities with only one computation step for the entire link and can be implemented using one sample per symbol [1,4]. However, calculation of the nonlinear perturbation involves single and double summations that are functions of the transmitted symbol sequence and the perturbation expansion coefficients $C_{m,n}$, where $m$ and $n$ denote symbol indices relative to the current symbol. Advances aimed at reducing the computational and implementation complexity of this pre-compensation technique include aggressive quantization of the expansion coefficients [15], and the use of symmetric electronic dispersion compensation (SEDC) and root-raised-cosine (RRC) pulse shaping [16]. The quantization of the expansion coefficients has also been considered in the context of simultaneous optimization of the intervals and levels using a minimum mean square error criterion [17] and a decision directed least mean square algorithm [18]. With SEDC, two simplifications result: 1) all the real parts of the coefficients, $\mathrm{Re}[C_{m,n}]$, are zero and 2) all the imaginary parts of the coefficients, $\mathrm{Im}[C_{m,n}]$, are calculated based on half of the link length, $L/2$. This reduces the dispersion induced pulse spreading and hence the required number of terms in the truncated summations. An RRC pulse shape also reduces the dispersion induced pulse spreading and thus the number of terms in the truncated summations.
The perturbation-based pre-compensation of a signal includes intra-channel self-phase modulation (iSPM), intra-channel cross-phase modulation (iXPM), and intra-channel four-wave mixing (iFWM). With SEDC, the optical field for the current symbol (at time 0) of the x-polarization signal after nonlinear pre-compensation is given by Eqs. (1)-(4). The corresponding equations for the y-polarization signal are obtained by exchanging the subscripts x and y in Eqs. (1)-(4). The nonlinear perturbation coefficients $C_{m,n}$ depend on the pulse shape, fiber properties, and fiber length $L$ [1,4,16]. $P$ is the transmitted optical power, $A_{n,x/y}$ is the sequence of complex transmitted symbols for the x- and y-polarization signals with zero dispersion, $E$ denotes expectation, and $j = \sqrt{-1}$. Equation (3) represents the phase perturbation due to iSPM and iXPM while Eq. (4) represents the iFWM. It is important to note that for a dual polarization signal there are cross-polarization contributions in Eqs. (3) and (4): the perturbation for the x-polarization signal depends on the transmitted symbol sequences for both the x- and y-polarization signals. The complexity of the algorithm is primarily determined by the second terms in Eq. (3) for iXPM and Eq. (4) for iFWM (and the corresponding equations for the y-polarization signal). In practice, the summations are truncated by retaining only the terms for which $C_{m,n}$ exceeds a specified criterion.
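The structure of the truncated summations can be illustrated with a minimal Python sketch. It does not reproduce Eqs. (1)-(4); instead it assumes the generic first-order triplet form commonly used in the perturbation literature, $\Delta A_{0,x} = P^{3/2} \sum_{m,n} C_{m,n} A_{m,x} (A_{n,x} A^*_{m+n,x} + A_{n,y} A^*_{m+n,y})$. The function names, the dictionary layout of the coefficients, and the default threshold are hypothetical choices made for illustration.

```python
import numpy as np

def truncate_coeffs(C, threshold_db=-35.0):
    """Keep only (m, n) terms whose magnitude relative to C[(0, 0)] exceeds threshold_db."""
    c00 = abs(C[(0, 0)])
    return {mn: c for mn, c in C.items()
            if c != 0 and 20 * np.log10(abs(c) / c00) > threshold_db}

def perturbation_x(Ax, Ay, i, C, P):
    """First-order perturbation for symbol i of the x-polarization.

    Assumed (generic) triplet form, not the exact equations of the paper:
    sum over (m, n) of C[m, n] * A_x[i+m] * (A_x[i+n] conj(A_x[i+m+n])
                                            + A_y[i+n] conj(A_y[i+m+n])).
    """
    delta = 0j
    for (m, n), c in C.items():
        im, in_, imn = i + m, i + n, i + m + n
        # Skip triplets that reach outside the available symbol sequence.
        if min(im, in_, imn) < 0 or max(im, in_, imn) >= len(Ax):
            continue
        delta += c * Ax[im] * (Ax[in_] * np.conj(Ax[imn]) + Ay[in_] * np.conj(Ay[imn]))
    return P ** 1.5 * delta
```

The cost per symbol is proportional to the number of retained $(m, n)$ pairs, which is why reducing the dispersion induced pulse spreading (and thus the number of significant coefficients) lowers the complexity.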
The $C_{m,n}$ coefficients are fixed for a given transmission spectrum and fiber length. For an RRC pulse shape with a roll-off factor of 0.1 and matched filtering, the coefficients are calculated numerically, as an analytical solution is not known. In the expression for the coefficients, $\gamma$ is the fiber nonlinear coefficient, $0 < k \le 1$ is an optimization factor that may be used to yield the best compensation [11,18], $L_{\mathrm{span}}$ is the span length, $f_{\mathrm{pd}}(z)$ is the power distribution profile along the link, $T$ is the symbol period, $T_m = mT$, $u_0(0, t)$ is the pulse shape with zero accumulated dispersion ($z = 0$), and $u_0(z, t)$ is the dispersed pulse shape corresponding to a fiber length $z$, which is calculated according to Eq. (7). In Eq. (7), $\mathcal{F}$ denotes the Fourier transform, $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, $f$ is frequency, and $\beta_2$ is the first order group velocity dispersion coefficient [1]. For a fiber length of 3600 km, with the RRC pulse shape and SEDC, $\mathrm{Im}[C_{m,n}(L/2)]$ is plotted in Fig. 1. The bandwidth of an RRC pulse shape with a roll-off factor of 0.1 yields a small dispersion induced pulse spreading and hence a reduction in the number of terms in truncated approximations to Eqs. (3) and (4) compared to a Gaussian pulse or an RRC pulse with a larger roll-off factor.
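The dispersed pulse shape can be computed numerically with a pair of FFTs. The sketch below assumes the common all-pass dispersion transfer function $\exp(j 2\pi^2 \beta_2 f^2 z)$; the sign of the exponent depends on the Fourier transform convention and is an assumption here, as are the function names and the Gaussian test pulse.

```python
import numpy as np

def dispersed_pulse(u0, dt, beta2, z):
    """Return u0 after accumulating chromatic dispersion over a fiber length z.

    u0: samples of the undispersed pulse, dt: sample spacing in s,
    beta2: dispersion coefficient in s^2/m, z: fiber length in m.
    """
    f = np.fft.fftfreq(len(u0), d=dt)                         # frequency grid in Hz
    transfer = np.exp(1j * 2 * np.pi**2 * beta2 * f**2 * z)   # assumed sign convention
    return np.fft.ifft(np.fft.fft(u0) * transfer)

def rms_width(u, dt):
    """RMS width of the intensity profile |u|^2."""
    t = (np.arange(len(u)) - len(u) // 2) * dt
    power = np.abs(u) ** 2
    mean = np.sum(t * power) / np.sum(power)
    return np.sqrt(np.sum((t - mean) ** 2 * power) / np.sum(power))

# Illustrative values only: 10 ps Gaussian pulse, 20 km of fiber.
dt = 1e-12
t = (np.arange(1024) - 512) * dt
u0 = np.exp(-t**2 / (2 * (10e-12) ** 2))
u_z = dispersed_pulse(u0, dt, beta2=-21.7e-27, z=20e3)
```

Because the transfer function has unit magnitude, the pulse energy is unchanged while its temporal width grows with $z$; the growing overlap between neighbouring pulses is what increases the number of significant $C_{m,n}$ terms.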
For a single 128 Gbit/s polarization-multiplexed (PM) 16QAM signal and transmission over 3600 km of standard single mode fiber with EDFA amplification, the dependence of the bit error ratio (BER) on launch power is shown in Fig. 2(a) for linear post-compensation for dispersion (LC), symmetric linear pre- and post-compensation for dispersion (LC-SEDC), and RRC-SEDC nonlinear pre-compensation. The roll-off factor for the RRC pulse shape was 0.1 and the number of terms in the truncated summations for the RRC-SEDC algorithm was based on $20\log_{10}|C_{m,n}/C_{0,0}| > -35$ dB. The dependence of the BER at optimum launch power on fiber length for the three algorithms is shown in Fig. 2(b). For a forward error correction coding BER threshold of 0.02, transmission over 4200 km of fiber was achieved with RRC-SEDC nonlinear pre-compensation, an increase of 900 km relative to LC and LC-SEDC.
The perturbation-based technique can be used to pre-compensate accumulated intra-channel fiber nonlinearities based on one sample per symbol and one computation step for the entire link. Advances that further reduce the computational and implementation complexity without sacrificing performance would be beneficial. The potential improvements in system performance offered by the technique need to be explored in the context of optical superchannels and flexible-grid networks, including the possibility of extending the algorithm to account for inter-subchannel nonlinearities.

Wideband digital backpropagation performance
Digital backpropagation (DBP) is arguably the most popular digital signal processing (DSP) technique to compensate for nonlinear optical fiber transmission impairments [19-21]. The effectiveness of the algorithm lies in its ability to fully undo deterministic signal-signal nonlinear interference (NLI) effects. Despite its theoretical benefits, many factors can limit the performance of this algorithm, such as: NLI arising from the interaction between the signal and amplified spontaneous emission (ASE) noise [22], polarization-mode dispersion [23-25], DSP complexity at the receiver [24,26], and limited nonlinearity compensation (NLC) bandwidth. In particular, using analytical tools it has been shown that in fully-loaded wavelength division multiplexing (WDM) systems, DBP gains are severely reduced when DBP is applied over NLC bandwidths that are relatively small compared to the overall transmitted optical bandwidth [27]. If confirmed, this would represent a major setback for the effectiveness of multi-channel DBP, as further increasing the NLC bandwidth does not currently appear to be a viable option. On the other hand, very few numerical results have been produced to test the accuracy of the available analytical models in predicting the performance of DBP for large NLC bandwidths.
In this section, the analytical tools provided in [28,29] are validated via numerical results based on the split-step Fourier method (SSFM) in a wideband transmission scenario using multichannel DBP.Then, closed-form expressions are used to describe the behaviour of the signal-to-noise ratio (SNR) gains achievable through DBP.

Validation of analytical tools for DBP performance estimation
The effect of DBP when applied over a bandwidth $B_{\mathrm{NLC}}$, less than or equal to the transmitted bandwidth $B$, can be predicted by resorting to a perturbation analysis [30, Sec. II]. To first order, the DBP contribution can be considered as a subtraction of a fraction of the received NLI power. This fraction is equal to the NLI power that would be generated in the forward propagation by the signal within the bandwidth $B_{\mathrm{NLC}}$ if it were transmitted alone.
The receiver SNR after DBP is applied can therefore be written as in Eq. (8), where $P$ is the transmitted power per channel, $N_s$ is the number of fiber spans, $P_{\mathrm{ASE}}$ is the ASE noise power over the channel bandwidth, $\eta(B, N_s)$ is the signal-signal NLI factor over a bandwidth $B$ and $N_s$ spans, $\eta_{sn}$ is the signal-ASE NLI factor over one span, $B$ is the total transmitted bandwidth, $B_{\mathrm{NLC}}$ is the NLC bandwidth, $\zeta = \sum_{k=1}^{N_s} k^{1+\epsilon}$ is the signal-ASE NLI accumulation factor, and $\epsilon$ is the NLI coherence factor.
In the denominator of Eq. (8), three terms can be distinguished (from left to right): the total accumulated ASE noise power, the residual signal-signal NLI power after DBP is applied, and the signal-ASE NLI power. As discussed in [31,32], DBP does not modify the signal-ASE NLI power generated in the forward direction. In fact, DBP undoes the signal-ASE NLI originating from the first spans in the forward direction, but replaces it with the one generated by the ASE noise in the last spans in the backward direction.
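The three-term structure of the denominator can be sketched in Python as follows. The sketch assumes the usual power scaling of each term (ASE independent of $P$, signal-signal NLI proportional to $P^3$, signal-ASE NLI proportional to $P^2 P_{\mathrm{ASE}}$). The numerical prefactor of the signal-ASE term is not recoverable from the text, so it is exposed as a hypothetical parameter `c_sn`; all function and argument names are illustrative.

```python
def zeta_factor(n_spans, eps):
    """Signal-ASE NLI accumulation factor: sum over k = 1..n_spans of k**(1 + eps)."""
    return sum(k ** (1 + eps) for k in range(1, n_spans + 1))

def snr_after_dbp(P, n_spans, p_ase, eta_full, eta_nlc, eta_sn, zeta, c_sn=1.0):
    """Receiver SNR with DBP applied over the NLC bandwidth (assumed model form).

    eta_full: signal-signal NLI factor over the full bandwidth B and n_spans spans
    eta_nlc:  the same factor for the signal within B_NLC transmitted alone
    c_sn:     placeholder prefactor of the signal-ASE term (assumption)
    """
    ase = n_spans * p_ase                                  # accumulated ASE noise
    residual_signal_signal = (eta_full - eta_nlc) * P**3   # NLI left after DBP
    signal_ase = c_sn * eta_sn * zeta * p_ase * P**2       # not removed by DBP
    return P / (ase + residual_signal_signal + signal_ase)
```

Setting `eta_nlc = 0` gives the EDC case, while `eta_nlc = eta_full` gives full-field DBP, for which only the ASE and signal-ASE terms remain.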
The $\eta$ factor and its dependency on system parameters, such as $B$ and $N_s$, vary based on the specific model adopted. For instance, the GN-model [33] offers a simple closed-form expression for $\eta(B, N_s)$, although with a certain degree of inaccuracy due to its inability to account for certain features of the transmitted signal, such as the modulation format. More recent models [28,30,34] have instead captured the NLI dependence on the modulation format and thus have been shown to be more accurate in the estimation of the NLI power. However, this generally comes at the cost of a higher complexity of the analytical expressions. Recently, in [29], an approximate closed-form expression was proposed for the model in [28], an extension of the GN-model, hence called the enhanced GN-model (EGN). This expression for the analytical estimation of the NLI is used here. The comparison between analytical and numerical results based on the SSFM is performed for a wideband transmission system, whose parameter values are shown in Table 1. The transmission of 31×32 Gbaud PM-16QAM channels with 33 GHz spacing ($B \approx 1$ THz) is simulated using an adaptive logarithmic step-size SSFM [35]. The transmission link consists of standard single-mode fiber with EDFA amplification. At the receiver, DBP is performed ideally, using the same step-size distribution used in the forward propagation. Ideal polarization demultiplexing is then applied and no carrier phase estimation is used as laser phase noise is neglected.

Table 1. Parameters of the system used for the numerical study of DBP performance.
Simulation bandwidth: 2.04 THz
Numerical method: Adaptive log-step size SSFM [35]
In Fig. 3, the dependence of the SNR on the transmitted power is shown when either electronic dispersion compensation (EDC) or DBP over different NLC bandwidths is performed at the receiver. It can be observed that the agreement between the analytical expressions and the SSFM simulations is within 0.2 dB for all cases shown. We attribute this residual gap partly to the fact that the closed-form expression for $\eta(B, N_s)$ strictly holds only for a perfectly rectangular channel spectrum (roll-off factor of 0), whereas the roll-off factor here is set to 0.03. This result confirms the validity of Eq. (8), where $\eta(B, N_s)$ is obtained from the closed-form expression proposed in [29].

DBP SNR gains
In the previous subsection, the use of closed-form expressions to fully describe DBP performance in a wideband transmission scenario was justified. In this subsection, Eq. (8) is used to describe the analytical behaviour of the DBP SNR gain. For small enough NLC bandwidths, the condition in Eq. (9) can be assumed to hold, and thus the signal-ASE NLI can be neglected in the denominator of Eq. (8). The region where Eq. (9) holds depends on the specific transmission distance and transmitted power. By setting the derivative of Eq. (8) with respect to the transmitted power to zero, the optimum SNR can be found for all NLC bandwidths (including the EDC case). The DBP gain compared to the EDC case (at their respective optimum launch powers) is then given in closed form by Eq. (10). In the regime opposite to the one indicated by Eq. (9), i.e., in a close neighbourhood of the full-field NLC bandwidth, the DBP gain can be approximated by Eq. (11). This approximation holds when Eq. (9) can be considered true in the EDC case, which implies small enough $\eta_{sn}$, $P_{\mathrm{ASE}}$ and $N_s$; this is, however, the case for typical transmission scenarios. Two additional assumptions are made in the derivation of Eq. (11): (i) the dependence of $\eta$ on the number of spans $N_s$ is assumed for simplicity to be the one predicted by the GN-model, and (ii) $\eta_{sn} = 3\eta(B, 1)$, which rigorously holds only when the WDM signal spectrum is flat and its bandwidth $B$ is equal to the ASE noise bandwidth. The validity of Eq. (11) will be shown in the following.
Eq. (11) shows that the full-field DBP gain is weakly dependent on the ASE noise ($P_{\mathrm{ASE}}^{-1/3}$) and transmitted bandwidth ($\eta^{-1/6}$), whereas it is more strongly dependent on the transmission distance ($N_s^{-1/2}$). The two asymptotes in Eqs. (10) and (11) are illustrated in Fig. 4(a), where $G_{\mathrm{DBP}}$ is shown as a function of the NLI reduction factor for different transmission distances. The NLI reduction factor, $\rho$, is defined in Eq. (12) and signifies the reduction of signal-signal NLI due to DBP. For small values of NLI reduction, i.e., where signal-signal NLI is dominant compared to signal-ASE NLI, Eq. (10) indicates that the DBP SNR gain increases with a slope of 0.33 dB/dB, i.e., 1 dB higher gain for every 3 dB of suppressed signal-signal NLI. Due to the larger amount of signal-ASE NLI in long-distance transmissions, the gain starts to saturate at smaller values of $\rho$. For higher values of $\rho$, the gain approaches the full-field gain predicted by Eq. (11).
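The 0.33 dB/dB slope follows from a short calculation, sketched here under the assumption that the signal-ASE term is negligible, so that the SNR has the form $P/(a + bP^3)$. The shorthand $a = N_s P_{\mathrm{ASE}}$ and $b = \eta(B, N_s) - \eta(B_{\mathrm{NLC}}, N_s)$ is introduced only for this sketch, and the form used for $\rho$ (the ratio of signal-signal NLI factors without and with DBP) is an assumption consistent with its description as the reduction of signal-signal NLI.

```latex
\begin{align}
\mathrm{SNR}(P) &= \frac{P}{a + bP^{3}}, \\
\frac{d\,\mathrm{SNR}}{dP} = 0
  \;&\Rightarrow\; a = 2bP_{\mathrm{opt}}^{3},
  \qquad P_{\mathrm{opt}} = \left(\frac{a}{2b}\right)^{1/3}, \\
\mathrm{SNR}_{\mathrm{opt}} &= \frac{P_{\mathrm{opt}}}{\tfrac{3}{2}a}
  = \frac{2}{3a}\left(\frac{a}{2b}\right)^{1/3} \;\propto\; b^{-1/3}, \\
G_{\mathrm{DBP}} &= \frac{\mathrm{SNR}_{\mathrm{opt}}^{\mathrm{DBP}}}{\mathrm{SNR}_{\mathrm{opt}}^{\mathrm{EDC}}}
  = \left(\frac{\eta(B, N_s)}{\eta(B, N_s) - \eta(B_{\mathrm{NLC}}, N_s)}\right)^{1/3}
  = \rho^{1/3}.
\end{align}
```

In decibels this reads $G_{\mathrm{DBP}}\,[\mathrm{dB}] = \rho\,[\mathrm{dB}]/3$, i.e., 1 dB of gain for every 3 dB of suppressed signal-signal NLI.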
Finally, using the closed-form expressions in [29], the DBP gain can be expressed in terms of the NLC bandwidth $B_{\mathrm{NLC}}$. This relationship is illustrated in Fig. 4(b), where $G_{\mathrm{DBP}}$ is shown as a function of $B_{\mathrm{NLC}}$ normalized with respect to the transmitted bandwidth $B = 1.023$ THz (see parameters in Table 1), and for different transmission distances. DBP gains are similar (within 0.5 dB difference) for all distances when DBP is applied up to approximately 60% of $B$. For small $B_{\mathrm{NLC}}$ relative to $B$, the SNR gain is observed to increase slowly. For instance, in order to achieve 1 dB gain, DBP needs to be applied over approximately 10% of the transmitted bandwidth (≈100 GHz), whereas to attain a 3 dB gain, a $B_{\mathrm{NLC}}$ between 57% (≈580 GHz) and 63% (≈650 GHz) of $B$ is required, depending on the transmission distance. A rapid gain increase can instead be obtained when the full-field $B_{\mathrm{NLC}}$ is approached, particularly for shorter transmission distances. Indeed, in this case, the small amount of residual signal-ASE NLI causes the gain to increase abruptly as the signal-signal NLI is fully cancelled. Higher amounts of signal-ASE NLI instead result in a more gradual increase.
In summary, we have shown, by comparison with SSFM results, that currently available closed-form expressions can accurately predict the receiver SNR of transmission systems employing multichannel DBP to compensate for both intra- and inter-channel NLI. Closed-form relationships between DBP gain and the main system parameters allow quick and intuitive insight into the performance of this algorithm. For NLC bandwidths up to 60% of $B$, the relationship between DBP gain and NLI reduction (in dB) is linear through a factor of 1/3. In this region, SNR gains are between 1 and 3 dB. Beyond this region, and as $B_{\mathrm{NLC}}$ approaches the full-field bandwidth $B$, the DBP gain experiences a rapid increase which is dependent on the amount of signal-ASE NLI.

Volterra based nonlinear compensation
The Volterra series is a well-known numerical tool for the modelling and compensation of nonlinear dynamic phenomena [36]. It is based on a polynomial expansion, truncated to nth order, including memory effects through a series of convolution integrals. The Volterra series was first proposed for the modelling of optical fiber transmission systems in [37]. It was applied to solve the NLSE in the frequency domain, enabling the extraction of a set of nth order nonlinear transfer functions for a single-mode optical fiber, the so-called Volterra series transfer function (VSTF). The same analytical formulation was also independently developed in [38] in the context of OFDM transmission.
By inverting the 3rd order nonlinear transfer function, an inverse VSTF (IVSTF) was first applied for the compensation of fiber nonlinearities in single-polarization optical transmission [39,40]. It was shown that, when applied at a low sampling rate (2 samples per symbol), a 3rd order truncated IVSTF could provide higher performance than split-step-based DBP due to the avoidance of recursive time/frequency transitions [39]. In its polarization multiplexed form, the frequency-domain nonlinear compensated optical field for the x-polarization signal, $\tilde{A}_x^{\mathrm{NL}}$, is given by Eq. (13), where $\tilde{A}_x$ is the frequency-domain received signal in the x-polarization, $\gamma$ is the nonlinear coefficient, $L$ is the IVSTF step-size (a multiple of the span length, $L_s$), $0 < \xi \le 1$ is a free optimization parameter, $N$ is the fast Fourier transform (FFT) block-size, and $\omega_n$ is the angular frequency at index $n$ in the FFT block. The multi-span linear kernel, $K_1$, accounts for attenuation and chromatic dispersion as given in Eq. (14), where $\alpha$ and $\beta_2$ are the attenuation and group velocity dispersion coefficients, respectively. $\beta_2$ is evaluated at the central wavelength of the back-propagated channel. Finally, the multi-span 3rd order nonlinear kernel, $K_3$, is given by Eq. (15), where $F(\omega_n, \omega_k, \omega_m)$ is the multi-span phased-array factor [38] accounting for the coherent accumulation of nonlinearities between fiber spans, given in Eq. (16).
The nonlinear equalized optical field, $\tilde{A}_x^{\mathrm{NL}}$, is finally summed with the chromatic dispersion equalization (CDE) signal, yielding the output optical field after each IVSTF step as given in Eq. (17). Note that the equalization of the y-polarization signal is simply obtained by exchanging the subscripts x and y in Eqs. (13) and (17). The major challenge associated with the numerical implementation of the IVSTF lies in the $O(N^2)$ dependence of the total number of operations per equalized sample, arising from the double summation in Eq. (13). This may limit the use of large step-sizes, since the minimum required FFT block length, $N$, grows with the accumulated chromatic dispersion. To tackle this issue, several approaches have been proposed. In [41], a simplified IVSTF implementation model with $O(\log(N))$ complexity was proposed, resorting to parallel nonlinear equalization branches, each of which includes cascaded linear and nonlinear operations in a similar fashion to the SSFM. This approach exploits the linkage between the VSTF and the regular perturbation method [42], employing a frequency-flat approximation to enable time-domain processing of nonlinearities. However, this approximation may affect the performance of the algorithm, which in [41] was shown to underperform relative to single-step per span SSFM-based DBP. Alternatively, in [43], a factorization procedure has been applied to the 3rd order kernel, yielding an n-step serial model, similarly enabling a reduction of the complexity down to $O(\log(N))$, but also suffering from a performance penalty relative to the full IVSTF model.
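The origin of the $O(N^2)$ cost per equalized sample can be made concrete with a naive Python sketch of a third-order frequency-domain Volterra term. This is not Eq. (13) itself: the kernel is passed in as a hypothetical callable `kernel(n, k, m)` standing in for $K_3$, the triplet ordering (fields at indices $k$, $m$ conjugated, and $n+m-k$) is an assumption based on the index $\omega_{n+m-k}$ used in this section, and scalar prefactors and polarization coupling are omitted.

```python
import numpy as np

def third_order_term(A, kernel):
    """Naive third-order Volterra term for a single-polarization spectrum A.

    For each output bin n, a double loop over (k, m) accumulates
    kernel(n, k, m) * A[k] * conj(A[m]) * A[n + m - k],
    i.e. N^2 products per output sample and N^3 per FFT block.
    """
    N = len(A)
    out = np.zeros(N, dtype=complex)
    for n in range(N):
        acc = 0j
        for k in range(N):
            for m in range(N):
                idx = n + m - k
                if 0 <= idx < N:          # ignore triplets outside the block
                    acc += kernel(n, k, m) * A[k] * np.conj(A[m]) * A[idx]
        out[n] = acc
    return out
```

With a frequency-flat kernel (a constant), the double sum collapses to a time-domain product of the form $|a(t)|^2 a(t)$, which is the intuition behind the reduced-complexity time-domain approximations mentioned above.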
Penalty-free approaches have also been proposed, such as the use of symmetric electronic dispersion compensation to reduce the amount of accumulated dispersion to be inverted by the IVSTF [44] and the use of a cascaded IVSTF structure [45], where the position of the linear kernel, K 1 , is changed in order to relax the FFT block length requirements for the evaluation of K 3 .
Another way of reducing the computational effort of the IVSTF is through the inspection and selective pruning of the $K_3$ coefficients, whose distribution of real and imaginary parts is illustrated in Fig. 5 for an exemplary standard single mode fiber span. For ease of visualization, all coefficients are normalized with respect to the absolute maximum value of the real component. Regular coefficient patterns and column/diagonal symmetries can be clearly observed. Depending on the combination of angular frequencies, different nonlinear phenomena can be identified and categorized as:
• iSPM: when the three optical field components coincide in frequency, i.e., for $\omega_m = \omega_k = \omega_n$;
• iXPM: when the conjugated optical field component coincides in frequency with only one other component, i.e., for $\omega_m = \omega_k \neq \omega_n$ or $\omega_n = \omega_k \neq \omega_m$;
• degenerate iFWM: when the two non-conjugated optical field components coincide in frequency, i.e., for $\omega_k = \omega_{n+m-k}$;
• iFWM: for all other possible combinations of $\omega_m$, $\omega_k$ and $\omega_n$.
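The categorization above maps directly onto index comparisons. The following Python sketch classifies a frequency triplet by its FFT indices, assuming (as the index $\omega_{n+m-k}$ suggests) that the three interacting components sit at indices $k$, $m$ (conjugated) and $n+m-k$; the function name is hypothetical.

```python
def classify_triplet(n, k, m):
    """Classify the nonlinear interaction for output index n and indices (k, m)."""
    if m == k == n:
        return "iSPM"                 # all three components coincide
    if (m == k and k != n) or (n == k and k != m):
        return "iXPM"                 # conjugated component matches exactly one other
    if k == n + m - k:
        return "degenerate iFWM"      # the two non-conjugated components coincide
    return "iFWM"                     # all remaining combinations
```

For example, `classify_triplet(0, 1, 2)` corresponds to non-conjugated components at indices 1 and 1 with the conjugated component at index 2, and is therefore a degenerate iFWM term.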
As can be easily perceived from the inspection of Eq. (15), all iSPM and iXPM occurrences take the same real-valued coefficient, which corresponds to the maximum relative contribution in the $K_3$ kernel (unitary values in Fig. 5). Based on this inspection of the 3rd order kernel, a simplified Volterra series nonlinear equalizer (VSNE) has been proposed in [46], where the full $K_3$ matrix is gradually reconstructed as a series of one-dimensional parallel frequency-domain filters, building up from the iSPM+iXPM components and accounting for the symmetries in $K_3$. An exact full reconstruction of the $K_3$ kernel was shown to yield a reduction of the computational complexity by a factor of ∼3 without any performance penalty [46]. Further simplification can be achieved by exploiting the iXPM-like behavior of the coefficients in the vicinity of the true iXPM components, as can be seen in Fig. 5. Therefore, within a region of validity, all coefficients can be forced to the iXPM value incurring only a small error, with a significant reduction in the implementation complexity by avoiding the double summation in Eq. (13). This frequency-flat approximation differs from other similar assumptions in the literature [41], since it is associated with an incomplete kernel reconstruction process that departs from the true iXPM component and stops at an optimum number of additional coefficients [46]. Therefore, there is a tradeoff between the error generated by the frequency-flat approximation and the error due to an incomplete kernel representation. Building upon this simplified VSNE, equivalent time-domain realizations have also been derived in [48] and experimentally demonstrated in [49], yielding SSFM-like structures with parallel nonlinear compensation branches [50], similar to [41]. The IVSTF and its simplified versions proposed in [46] have been experimentally demonstrated in [47], for the nonlinear compensation of a 10×124.8 Gbit/s PM-64QAM optical system. The signal was transmitted over pure silica core fiber with an effective area of 150 µm², span length of 54.44 km, attenuation of 0.161 dB/km and dispersion parameter of 20.7 ps/nm/km. The results depicted in Fig. 6 show an improvement of ∼25% in the maximum reach (from ∼1200 km to ∼1500 km) at a BER of $2.7 \times 10^{-2}$, provided by nonlinear compensation with the 3rd order IVSTF. A single step IVSTF (step-size $L$ equal to the full transmission length) was sufficient to achieve the maximum equalization performance. In turn, the frequency-flat simplified VSNE was found to require a total of 4 steps to enable the same maximum reach. Nevertheless, despite the increased processing latency due to 4 cascaded steps, the simplified VSNE was found to reduce the total computational effort by more than 3 orders of magnitude relative to the full matrix-based IVSTF.
Recent advances in IVSTF-based nonlinear compensation have demonstrated similar equalization performance to the widely used SSFM-based DBP, with comparable or even lower computational effort. The full potential of Volterra-based nonlinear compensation is, however, still far from being achieved. Additional research efforts are required to tackle key implementation aspects such as fast and adaptive coefficient estimation [51] and expansion of the algorithms to account for inter-channel nonlinear compensation in the context of optical superchannels.

Advanced modulation for nonlinear transmission
The effect of advanced modulation formats on the performance of optical fiber transmission systems can be studied by estimating the achievable information rate (AIR). The AIR provides an upper bound on the maximum data rate that can be transmitted through a fiber, while also setting a lower bound on the total fiber channel capacity. The AIR is calculated from the mutual information (MI) between the channel input sequence $X_1^K$ and the channel output sequence $Y_1^K$, expressed in terms of the entropy function $H$. The AIR is usually expressed in bits/symbol. The modulation alphabet $\mathcal{X}$ has an effect on the AIR both through the entropy $H(X_1^K)$ and the conditional entropy $H(X_1^K|Y_1^K)$. While the former sets an upper bound on the AIR and the spectral efficiency, the latter is a metric of the quality of the received signal, and is usually implicitly used as a design metric. For example, constellation alphabets which reduce nonlinear interference noise (NLIN) increase the signal-to-noise-plus-interference ratio, also referred to as the effective signal-to-noise ratio (SNR) in this section. NLIN comprises the signal-signal, signal-ASE and ASE-ASE nonlinear interference effects. A higher effective SNR usually leads to reduced uncertainty $H(X_1^K|Y_1^K)$. On the other hand, such constellations can lead to reduced entropy $H(X_1^K)$ due to constraints in their construction, leading to conflicting requirements in the design. It is noted that the output sequence $y_1^K$ consists of the samples right before demapping to bits, and the received effective SNR thus includes all the penalties from the non-ideal DSP chain (e.g., analog-to-digital conversion, filtering, equalization, phase noise recovery, etc.).
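The interplay between the entropy and the conditional entropy can be illustrated with a toy Python sketch that evaluates $I(X;Y) = H(X) - H(X|Y)$ from a joint probability table. It assumes a memoryless channel with a discrete output and a single symbol ($K = 1$), which is far simpler than the sequence-wise AIR estimation used for fiber channels; the function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability table (zero entries are ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) for a joint PMF with rows indexed by x, columns by y."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    h_x_given_y = entropy(joint) - entropy(p_y)   # H(X|Y) = H(X,Y) - H(Y)
    return entropy(p_x) - h_x_given_y

# Binary input, uniform PMF, 10% symbol flips (illustrative numbers only).
joint_bsc = np.array([[0.45, 0.05],
                      [0.05, 0.45]])
mi_bsc = mutual_information(joint_bsc)
```

In this toy case $H(X) = 1$ bit caps the MI, and the channel-induced uncertainty $H(X|Y)$ determines how far below that cap the achievable rate falls.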
Constellation design in general includes both the positions of the points in the I/Q plane and their probabilities.The former is referred to as geometric shaping and the latter as probabilistic shaping.

Geometric shaping
One of the first papers on geometric shaping for optical fiber communications was [52]. The main idea was to restrict high-energy symbols in the constellation, thus lowering the peak-to-average power ratio and mitigating the nonlinear effects. To that end, ring constellations were studied and optimized for fiber transmission.
A similar approach to constellation design was studied and demonstrated in [53]. Iterative methods were used for optimizing the radii and the number of symbols on each ring with the constraint of 256 symbols in total. An example of the designed polar modulation format is given in Fig. 7(b), together with the reference 256QAM format in Fig. 7(a). The received constellation diagrams are for a linear AWGN channel with SNR = 25 dB and input constellations $\mathcal{X}$ scaled to unit power. The energy for the polar modulation format is more concentrated towards the origin, thereby allowing for shaping gains over the uniform QAM format in terms of MI for a linear channel. Furthermore, the peak-to-average power ratio is reduced compared to QAM, thus resulting in lower NLIN power. Single channel experimental results were demonstrated for 256 polar modulation [53] with more than 1 dB gain over 256QAM for a 400 km, 28 Gbaud link.
Several other works study geometric signal shaping by imposing constraints on the allowed multi-dimensional sequences, where the considered dimensions are state of polarization and time slots. Lattices were studied in [54] for multi-dimensional constellation design. An optimized minimum Euclidean distance can be (asymptotically) achieved with such constructions, which allows for reduced symbol error rate on a linear AWGN channel. However, a performance penalty was observed in the presence of nonlinearities [54]. Furthermore, bit-to-symbol mapping is non-trivial for such constellations.
Polarization balanced multi-dimensional signaling was considered in [55]. Polarization balancing is achieved by constraining the multi-dimensional symbols such that the multi-dimensional energy is constant. Similar to the idea of ring constellations, where the high-energy signals are avoided, such multi-dimensional constellations reduce the NLIN power. The 256 polar modulation from [53] does not change the entropy $H(X)$ with respect to 256QAM due to the preserved cardinality of the constellation. In contrast, due to the constellation restriction, this entropy and thereby the spectral efficiency is reduced with multi-dimensional signaling as in [55]. Taking this reduction into account, around 1 dB of net system margin was achieved with an 8D QPSK constellation with respect to the standard BPSK constellation at the same spectral efficiency of 2 bits per time slot in a fully-loaded WDM system with a modulation rate of 35 Gbaud per channel and optical dispersion compensation.
The theoretical gains of such systems were analyzed in [56], where the constellation was restricted to a multi-dimensional ball, for which the mass is concentrated on a multi-dimensional sphere when the number of constellation symbols is large. It was shown that the gains potentially exceed the ultimate shaping gain on an AWGN channel of 1.53 dB. Operating such systems at high spectral efficiency is non-trivial due to the complexity of the DSP at the receiver side. Optimal detection generally requires that each possible input combination of symbols is evaluated, which generally results in an exponential increase in complexity both with the dimensionality (time slots) and the spectral efficiency (cardinality) of the base modulation format (restricted to QPSK in [55]).
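The 1.53 dB figure quoted for the AWGN channel is the standard ultimate shaping gain $\pi e / 6$ expressed in decibels. A short Python check is given below; the function name is a hypothetical choice.

```python
import math

def ultimate_shaping_gain_db():
    """Ultimate shaping gain on the AWGN channel, 10*log10(pi*e/6), in dB."""
    return 10 * math.log10(math.pi * math.e / 6)

gain_db = ultimate_shaping_gain_db()   # approximately 1.53 dB
```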

Probabilistic shaping
As mentioned, probabilistic shaping attempts to increase the MI by optimizing the probability mass function (PMF) $p_X(X)$ of the input symbols. This directly results in reduced entropy H(X) and thus reduced maximum spectral efficiency of the format. However, near-capacity-achieving systems operate in a region for which the AIR is not limited by the entropy as much as by the effective SNR at the receiver, thus benefiting from a non-uniform PMF. Probabilistic shaping was performed in [57] by the method of trellis shaping, and near-capacity performance was reported in a simulation. Probabilistic shaping in a 4D space (I/Q dimensions of 16/64QAM in each polarization) was considered in [58], where the 4D PMF was such that, similar to the geometric shaping approach, the points with smaller multi-dimensional amplitude appear more often. Gains of a few hundred kilometers in transmission distance can be achieved with such schemes.
Optimization of the PMF was performed in [59], where the PMF was taken from the Maxwell-Boltzmann (MB) family, for which $p_X(X = x) \propto \exp(\lambda |x|^2)$, i.e., the PMF is also amplitude driven. By carefully optimizing the scaling parameter λ, the PMF can be matched to the channel conditions (the effective SNR). An example of such a PMF for λ = −0.4 and a 256QAM constellation is given in Fig. 7(c). Since low-energy points appear more often, the constellation is scaled, and for unit power and the same SNR as the uniform PMF from Fig. 7(a), the Euclidean distance is increased, resulting in decreased uncertainty $H(X_1^K | Y_1^K)$ and increased MI. Gains of up to 400 km in transmission distance were achieved in [59] in a simulation. Experimental demonstration for a selection of MB PMFs was carried out in [60] in combination with a low-density parity check convolutional code. The same gains were experimentally confirmed for a variety of AIRs, which were achieved by rate-matching the independent identically distributed input binary data to the specific MB PMF. Most recently, a system was demonstrated for a transoceanic distance with a record high capacity [61]. The simplicity of the rate matcher, together with its transparency to the FEC, makes it attractive for optical fiber communications. An iterative approach to probabilistic shaping was taken in [62], where the PMF was not restricted to the MB family. The PMF was optimized by a modified Blahut-Arimoto algorithm, and it was shown that probabilistic shaping outperforms the geometric shaping scheme from [52]. In order to achieve the non-uniform PMF of the output, a many-to-one bit-to-symbol labeling was proposed in combination with a convolutional turbo code. It was shown in [63] that this optimization slightly outperforms the MB family, which for two constellation symbols $x_i$ and $x_j$ has the restriction of $p(x_i) > p(x_j)$ for $|x_i| < |x_j|$. However, similar experimental gains were achieved as in [60], suggesting that the specific PMF shape is non-consequential in practice under the constraint of independent symbols in each time slot. The performance of the system is given for 256QAM and 1024QAM in Fig. 8 for a 5×10 Gbaud WDM system and distances between 800 and 1700 km at the optimal launch power. The received effective SNR is given in Fig. 8(a). Since the peak-to-average power ratio of the shaped system, particularly for the 1024QAM constellation, is increased, the NLIN is enhanced, resulting in slightly decreased effective SNR. However, the AIR with 1024QAM is still superior to the other formats (see Fig. 8(b)) by ≈ 0.2 bits/symbol, which translates to 300 km (3 spans) gain at 1200 km (≈ 25% reach increase).
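The trade-off between entropy and Euclidean distance described above for the MB family can be illustrated numerically. The sketch below is not the optimization procedure of [59]; it only evaluates the MB PMF on a square 256QAM grid for a few values of λ. The helper names are hypothetical, and the convention of normalizing the grid to unit average power under the uniform PMF before applying λ is an assumption, so the value λ = −0.4 here need not reproduce Fig. 7(c).

```python
import numpy as np


def square_qam(order: int) -> np.ndarray:
    """Square QAM grid, normalized to unit average power under a uniform PMF."""
    side = int(round(np.sqrt(order)))
    levels = np.arange(-(side - 1), side, 2, dtype=float)
    grid_i, grid_q = np.meshgrid(levels, levels)
    points = (grid_i + 1j * grid_q).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))


def maxwell_boltzmann_pmf(points: np.ndarray, lam: float) -> np.ndarray:
    """PMF proportional to exp(lam * |x|^2); lam < 0 favours low-energy points."""
    weights = np.exp(lam * np.abs(points) ** 2)
    return weights / weights.sum()


def entropy_bits(pmf: np.ndarray) -> float:
    """Entropy H(X) in bits per symbol."""
    nonzero = pmf[pmf > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))


def min_distance_at_unit_power(points: np.ndarray, pmf: np.ndarray) -> float:
    """Nearest-neighbour spacing after rescaling to unit average power under pmf."""
    avg_power = np.sum(pmf * np.abs(points) ** 2)
    scaled = points / np.sqrt(avg_power)
    # points[0] and points[1] are horizontal neighbours on the grid
    return float(np.abs(scaled[1] - scaled[0]))


points = square_qam(256)
for lam in (0.0, -0.4, -2.0):  # illustrative values only
    pmf = maxwell_boltzmann_pmf(points, lam)
    print(f"lambda={lam:+.1f}  H(X)={entropy_bits(pmf):.3f} bits  "
          f"d_min={min_distance_at_unit_power(points, pmf):.4f}")
```

As λ becomes more negative, the entropy falls below the 8 bits of uniform 256QAM while the symbol spacing at unit average power grows, which is the mechanism behind the increased MI at a fixed SNR.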
It is noted that advanced constellations, such as the ones described here, require non-standard equalization and/or phase noise recovery. It was demonstrated in [61,63] that pilot symbols can be used at a rate of 1-2% for both purposes. This technique also improves the tolerance to phase slips, allows for adaptive equalization, and can potentially be used for frequency and clock recovery. However, improving the DSP performance both in terms of effective received SNR and reduced pilot rate is of interest in practice.
Most of the constellations considered in this section (with the exception of the multi-dimensional QPSK [55]) operate on a memoryless basis, that is, $p(x_1^K) = \prod_k p(x_k)$. Similar gains of about 2-4 fiber spans are achieved in all the above references under this assumption. In order to improve the gains, PMFs with memory are required. Optimizing such PMFs is not trivial due to the increased dimensionality, and furthermore, optimal processing at the receiver becomes exponentially complex (as mentioned previously) for high spectral efficiency systems. Such multi-dimensional PMFs with jointly optimized geometric and probabilistic constellation properties, and with practical receiver processing, are of interest.

Encoding in the nonlinear Fourier spectrum
For their discovery in the 1970s of the mathematical framework underlying the nonlinear Fourier transform, C. S. Gardner, J. M. Greene, M. D. Kruskal and R. M. Miura received the prestigious 2006 Leroy P. Steele Prize for a Seminal Contribution to Research, awarded by the American Mathematical Society. In describing this work in [64], the author wrote that "nonlinearity has undergone a revolution: from a nuisance to be eliminated, to a new tool to be exploited." This section describes how this tool may be exploited by encoding information in the nonlinear Fourier spectrum (also often called the inverse scattering transform or IST) of a signal transmitted over an optical fiber. Pulse propagation over an optical link of standard single-mode fiber with ideal distributed Raman amplification is well modelled using the generalized NLSE [65]. In normalized form (see [66]), with time t and distance z along the fiber expressed in dimensionless "soliton units", this equation is given as Eq. (19), where $j = \sqrt{-1}$, s ∈ {±1}, q(t, z) is the complex envelope of the signal, and n(t, z) is noise, usually modelled as a white Gaussian random process. The first term on the right-hand side expresses the effect on the transmitted waveform of chromatic dispersion, and the second term expresses the effect of Kerr nonlinearity. The equation does not include a loss term, as all losses are assumed to be ideally compensated by Raman amplification. When s = −1, this equation models signal propagation in the so-called "focusing" regime corresponding to anomalous dispersion (which supports the propagation of soliton pulses), while taking s = +1 gives propagation in the "defocusing" regime corresponding to normal dispersion. In the absence of noise, i.e., with n(t, z) = 0, Eq. (19) is referred to simply as the NLSE (without the word "generalized").
In their landmark paper [69], Zakharov and Shabat discovered a Lax pair (L, M) for the NLSE, thereby establishing its integrability. Fixing z and writing q(t) for q(t, z), the nonlinear Fourier transform (NFT) of the signal q(t) is defined in terms of the Zakharov-Shabat system, Eq. (20), where $\lambda \in \mathbb{C}$ is a spectral parameter (an eigenvalue of the L operator) and $v(t, \lambda)$ is a corresponding 2 × 1 eigenfunction. Let $u(t, \lambda) = [u_1(t, \lambda), u_2(t, \lambda)]^T$ denote the solution of Eq. (20) under the boundary condition $v(t, \lambda) \to [1\ 0]^T e^{-j\lambda t}$ as $t \to -\infty$. Define the spectral coefficients $a(\lambda)$ and $b(\lambda)$ as $a(\lambda) = \lim_{t \to \infty} u_1(t, \lambda) e^{j\lambda t}$ and $b(\lambda) = \lim_{t \to \infty} u_2(t, \lambda) e^{-j\lambda t}$, and let $a'(\lambda) = \frac{d}{d\lambda} a(\lambda)$. Finally, denote the upper-half complex plane (i.e., the set of complex numbers with positive imaginary part) as $\mathbb{C}^+$, and let $D = \{\lambda \in \mathbb{C}^+ : a(\lambda) = 0\}$. Since $a(\lambda)$ is analytic in $\mathbb{C}^+$, the set D consists of isolated points [66]; furthermore D is finite when q has finite energy. The NFT of q(t) is the function $Q : \mathbb{R} \cup D \to \mathbb{C}$ defined by $Q(\lambda) = b(\lambda)/a(\lambda)$ for $\lambda \in \mathbb{R}$ and $Q(\lambda) = b(\lambda)/a'(\lambda)$ for $\lambda \in D$. Thus, unlike the ordinary Fourier transform, the NFT spectrum generally consists of two components: the continuous spectrum supported on $\mathbb{R}$ and the discrete spectrum supported on D. When D is empty, the discrete spectral function is absent. In the defocusing regime (when s = +1), D is necessarily empty. For small signal amplitudes, the continuous spectrum coincides with the ordinary Fourier transform of q(t), and D is empty. When present, the discrete spectrum corresponds to the so-called solitonic components of q(t). A nonzero signal with a zero continuous spectrum and a discrete spectrum supported on N points is referred to as an N-soliton. As noted in [66], the NFT shares many of the properties of the ordinary Fourier transform, including a generalized Parseval identity. The solitonic signal components influence energy only via the location of the imaginary part of λ ∈ D, with larger imaginary part in direct proportion to larger energy. In effect, the NFT is a reformulation of the so-called "scattering data" associated with the IST.
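The spectral coefficients can be estimated numerically by propagating the boundary condition through a sampled signal. The sketch below is not one of the fast algorithms of [82-84]; it is a simple piecewise-constant scheme under several assumptions that are not stated in the text above: the focusing case (s = −1), the Zakharov-Shabat system written as $v_t = \begin{bmatrix} -j\lambda & q \\ -q^* & j\lambda \end{bmatrix} v$ in the convention of [66], and the coefficients taken as $a(\lambda) = \lim_{t\to\infty} u_1 e^{j\lambda t}$ and $b(\lambda) = \lim_{t\to\infty} u_2 e^{-j\lambda t}$. The function name, grid size, and test pulse are hypothetical choices.

```python
import numpy as np


def spectral_coefficients(q, t, lam):
    """Estimate a(lam) and b(lam) from samples q on the uniform grid t.

    Assumes the focusing-case system v_t = [[-j*lam, q], [-conj(q), j*lam]] v
    and treats q as constant over each step centred on a grid point.
    """
    dt = t[1] - t[0]
    t_start, t_end = t[0] - dt / 2, t[-1] + dt / 2
    # Boundary condition v -> [1, 0]^T exp(-j*lam*t) at the left edge.
    v = np.array([np.exp(-1j * lam * t_start), 0.0], dtype=complex)
    for qk in q:
        k = np.sqrt(complex(lam) ** 2 + abs(qk) ** 2)
        P = np.array([[-1j * lam, qk], [-np.conj(qk), 1j * lam]], dtype=complex)
        # Closed-form exp(P*dt), valid because P @ P = -k**2 * I.
        sin_over_k = dt if k == 0 else np.sin(k * dt) / k
        v = (np.cos(k * dt) * np.eye(2) + sin_over_k * P) @ v
    a = v[0] * np.exp(1j * lam * t_end)
    b = v[1] * np.exp(-1j * lam * t_end)
    return a, b


# Test pulse: q(t) = sech(t), a single soliton in this normalization.
T, N = 20.0, 4000
dt = 2 * T / N
t = -T + dt * (np.arange(N) + 0.5)
q = 1.0 / np.cosh(t)
a, b = spectral_coefficients(q, t, lam=0.5)
print("a =", a, " b =", b, " |a|^2 + |b|^2 =", abs(a) ** 2 + abs(b) ** 2)
```

For this pulse the analytical result under the stated convention is $a(\lambda) = (\lambda - j/2)/(\lambda + j/2)$ and $b(\lambda) = 0$, so the continuous spectrum vanishes and the single zero of $a(\lambda)$ in the upper half-plane, at $\lambda = j/2$, carries the whole signal.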
Restoring the z-dependence, Q(λ, z) denotes the nonlinear Fourier transform of the signal q(t, z). The signal q(t, 0) is applied at the channel input. Under mild assumptions (that q(t, 0) is absolutely integrable and decays to zero as |t| → ∞), an extremely simple relationship exists between Q(λ, 0) and Q(λ, z) at any point z, namely $Q(\lambda, z) = e^{4js\lambda^2 z} Q(\lambda, 0)$. In other words, the NFT of the signal q(t, z) observed at distance z is obtained by multiplying the NFT of the input signal q(t, 0) by a nonlinear frequency response $H(\lambda, z) = \exp(4js\lambda^2 z)$. The analogy with linear time-invariant systems is immediate: the NFT plays the same role for systems defined by the NLSE that the ordinary Fourier transform plays for linear time-invariant systems. Note that multiplication by H(λ, z) preserves energy, since for real-valued λ, H(λ, z) corresponds to an all-pass filter that preserves the energy of the continuous spectral component, while for λ ∈ D, multiplication by H(λ, z) does not influence the location of Im(λ), which is all that determines the energy of the solitonic component. Energy-preservation is to be expected, since the NLSE models an ideal lossless (and noiseless) system.
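The channel action in the nonlinear spectral domain is simple enough to sketch directly. The example below applies $H(\lambda, z) = \exp(4js\lambda^2 z)$ to an arbitrary continuous spectrum and then removes it with a single complex multiplication per spectral sample. The function name, the random test spectrum, and the numerical values of λ and z (in normalized units) are hypothetical; computing the NFT and its inverse is outside the scope of this sketch.

```python
import numpy as np


def nft_channel_response(lam, z, s=-1):
    """Nonlinear frequency response H(lam, z) = exp(4j * s * lam**2 * z)."""
    lam = np.asarray(lam, dtype=complex)
    return np.exp(4j * s * lam ** 2 * z)


# Continuous spectrum: real lam, so H acts as an all-pass filter.
lam_real = np.linspace(-3.0, 3.0, 7)
z = 0.8  # normalized distance, illustrative value
H = nft_channel_response(lam_real, z)
print("max | |H| - 1 | on the real axis:", np.max(np.abs(np.abs(H) - 1.0)))

# Propagation and one-tap equalization of an arbitrary input spectrum.
rng = np.random.default_rng(0)
Q_in = rng.normal(size=lam_real.size) + 1j * rng.normal(size=lam_real.size)
Q_out = H * Q_in             # noiseless propagation over distance z
Q_equalized = Q_out / H      # single multiplication per spectral sample
print("equalization error:", np.max(np.abs(Q_equalized - Q_in)))

# Discrete spectrum: the eigenvalue stays where it is; only the associated
# spectral amplitude is multiplied by H evaluated at that eigenvalue.
eigenvalue = 0.3 + 0.5j
print("|H| at the eigenvalue:", abs(nft_channel_response(eigenvalue, z)))
```

On the real axis the response has unit modulus, so the energy of the continuous component is unchanged. At a complex eigenvalue the modulus of H generally differs from one, but this rescales only the spectral amplitude and leaves Im(λ), and hence the solitonic energy, untouched.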
An immediate application is an information transmission strategy that is the nonlinear analog of orthogonal frequency-division multiplexing (OFDM), termed nonlinear FDM (or NFDM), that encodes information in the nonlinear spectrum of the signal [68,70]. Indeed, the idea of encoding information in just the discrete spectrum was first proposed in [88], with recent generalizations given in [73,89]. A number of recent papers [71-74] have studied various aspects of NFT-based transmission strategies in both the focusing and defocusing cases. Experimental demonstrations of NFDM schemes and conventional transmission schemes using NFT-based signal detection are described in [75-81]. Numerical methods focused on fast algorithms are described in [82-84].
Of course, actual channels are noisy, and therefore are described by the generalized NLSE, Eq. (19). The addition of noise as a forcing term corrupts integrability and the elegant NFT approach does not apply directly. In practice, however, the noise is small, and so can be treated as a perturbation. Depending on the approach taken, various noise models result [68,85,86]. Bounds on the "per-soliton" capacity, which include the effects of noise, are provided in [87].
Recent results use numerical methods to estimate the spectral efficiencies that can be achieved using the NFT approach [94-96]. In particular, [94] estimates achievable spectral efficiencies of approximately 10.7 bits per symbol in a 500 GHz bandwidth over a transmission distance of 2000 km in the focusing case (s = −1), while [95,96] estimate achievable rates in excess of 10.5 bits per complex degree-of-freedom at the same distance in both the defocusing case (s = +1) and the focusing case. In all three papers, the transmitted information is encoded in the continuous spectrum, fiber parameters are set to practically relevant values, and the transmission power is set to a large value, where the impact of nonlinearity would seriously degrade the performance of conventional transmission techniques. Provided that information is encoded only in the continuous spectrum, there is little difference, from the NFT perspective, between the defocusing and focusing cases, though the latter case does support soliton transmission as well.
Some papers on NFT-based information transmission have incorporated other channel models. It has been shown that the requirement of ideal distributed Raman amplification can be relaxed for modulation of the continuous spectrum [90,91]. A "lossless path-averaged" (LPA) NLSE was used to deal with lumped amplification from EDFAs as well as non-flat Raman gain profiles. Another recent paper has extended eigenvalue modulation to the polarization multiplexed case [92].

Fig. 1. Example of normalized $\mathrm{Im}[C_{m,n}(L/2)]$ coefficients for 3600 km of standard single-mode fiber with RRC pulse shaping and SEDC.

Fig. 2. (a) Dependence of the BER on the optical launch power for a single 128 Gbit/s PM-16QAM signal and a fiber length of 3600 km. (b) Dependence of the BER at optimum launch power on the fiber length. LC: linear post-compensation for dispersion; LC-SEDC: symmetric linear pre- and post-compensation for dispersion; RRC-SEDC: symmetric linear pre- and post-compensation for dispersion, root-raised-cosine pulse shaping, and perturbation-based pre-compensation. Experimental results originally published in [16].

Fig. 4. DBP gain as a function of (a) NLI reduction and (b) normalized NLC bandwidth for

Fig. 5. Normalized (a) real and (b) imaginary components of the 3rd order IVSTF kernel coefficients at three distinct angular frequencies inside a 256-sample FFT block ($\omega_n = 1$, $\omega_n = 128$ and $\omega_n = 256$). Vertical and horizontal axes correspond to the k and m indices in Eq. (13), respectively. The represented IVSTF inverts a single standard single-mode fiber span, with signal transmission at 32 Gbaud and a sampling rate of 64 GSa/s.

Fig. 6. BER performance and maximum signal reach of 124.8 Gbit/s PM-64QAM enabled by CDE and IVSTF. (a) BER versus number of spans for different channel launch powers; (b) Maximum reach versus launch power. Experimental results originally published in [47].

Fig. 7. Constellation diagrams for an AWGN channel with SNR = 25 dB, with constellations normalized to unit power. (a) Standard 256QAM, (b) geometrically shaped 256 polar, (c) probabilistically shaped 256QAM with Maxwell-Boltzmann distribution. Different probability mass functions (PMFs) result in different scaling, and thereby different Euclidean distance. However, non-uniform PMFs result in reduced entropy H(X) and thus reduced maximum spectral efficiency.

Fig. 8. Performance of probabilistically optimized QAM.(a) Even though 1024QAM with probabilistic shaping results in increased nonlinear distortion and thus reduced effective received SNR, (b) it achieves ≈ 0.2 bits/symbol of gain, or equivalently ≈ 300 km.