Ripple Distribution for Nonlinear Fibre-Optic Channels

Since Shannon proved that the Gaussian distribution is optimal for a linear channel with additive white Gaussian noise and calculated the corresponding channel capacity, it has remained the most widely applied distribution in optical communications, while the capacity result is celebrated as the seminal linear Shannon limit. Yet, when applied to nonlinear channels (e.g. fiber optics), it has been shown to be non-optimal, yielding the same result as uncoded transmission in the highly nonlinear regime. This has led to the notion of a nonlinear Shannon limit, which predicts vanishing capacity at high nonlinearity. However, recent findings indicate that non-Gaussian distributions may lead to improved capacity estimates, prompting an exciting search for novel methods in nonlinear optical communications. Here, for the first time, we show that it is possible to transmit information above the existing limits by using a novel probabilistic shaping of the input signal, which we call the ripple distribution.

Fiber-optic transmission is the backbone of communication systems, yet its capacity (the maximum error-free transmission rate) remains unknown due to intricate nonlinear memory effects. As Shannon found the optimal distribution for a linear additive white Gaussian noise channel to be Gaussian [1], it is widely applied in nonlinear fiber-optic systems [2][3][4]. Since then, a number of coding schemes and modulation formats [5] have been proposed, creating a family of probabilistic shaping methods based on the Gaussian distribution, with a maximum increase of the order of 0.2 bits compared to uncoded transmission [6][7][8].
However, the Gaussian shaping assumption provides data rate estimates below the nonlinear threshold (also referred to as the "nonlinear Shannon limit") [9][10][11][12][13], which are also overly pessimistic, as has recently been pointed out in [14]. This is because the derived models average the signal dynamics and lose information about inter-symbol interference effects. Other widely used practical approaches, the so-called "perturbative models with deterministic nonlinearity" [15][16][17][18], obtain a first-order perturbative solution of the nonlinear Schrödinger equation by taking into account only the signal-signal interactions in fiber transmission. This type of nonlinear signal distortion is deterministic and can, in principle, be compensated with some elaborate technical effort [10]. Thus, the principal challenge is to provide an accurate analysis of the signal-noise interactions with signal-dependent statistics.
Here, for the first time, we achieve data rates above the conventional limits by introducing a novel type of input signal distribution: the ripple distribution. Moreover, the results demonstrate that monotonically increasing transmission rates can be achieved even in the highly nonlinear regime. These findings are in sharp contrast to previous estimates based on a Gaussian distribution of the input signal and break the notion of "capacity vanishing to zero at high signal power", thus establishing a new direction for coding and signal channel distortion compensation algorithms.

* Electronic address: m.sorokina@aston.ac.uk
Finite Memory Discrete-Time Channel Model. A typical communication system is presented in Fig. 1a). At the transmitter, the message (uncoded bits) is modulated to a discrete-time set of constellation symbols (here, 64-QAM, plotted in Fig. 1b) and, after pulse shaping, is mapped to a continuous-time signal (Fig. 1c), which is subsequently launched into a multi-span optical fiber link. The propagation of the continuous-time signal E(t, z) in the optical fiber (Fig. 1d) is governed by the well-known nonlinear Schrödinger equation (NLSE):

∂E/∂z = −(α/2) E − i(β_2/2) ∂²E/∂t² + iγ|E|²E + η(t, z),   (1)

where the deterministic distortions (Fig. 1e) are introduced by the fiber loss α, the second-order dispersion parameter β_2, and the Kerr nonlinearity characterized by the coefficient γ. The stochastic distortions are described by the zero-mean additive white Gaussian noise (AWGN) η(t, z) with correlation ⟨η(z, t) η*(z′, t′)⟩ = D L δ(z − z′)δ(t − t′), with D and L being the noise spectral density and the transmission length, respectively.
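The continuous-time propagation of Eq. 1 is typically integrated numerically. As an illustrative sketch (not the discrete-time channel model derived below), a symmetrized split-step Fourier integrator for the NLSE with loss, dispersion, Kerr nonlinearity and distributed AWGN might look as follows; the function name, sign conventions and the noise-loading scaling are assumptions for illustration only:

```python
import numpy as np

def split_step_nlse(E0, dt, dz, n_steps, alpha, beta2, gamma, D=0.0, rng=None):
    """Symmetrized split-step Fourier sketch of the NLSE (Eq. 1).

    E0      : complex field samples
    dt, dz  : time step [s] and spatial step [m]
    alpha   : loss [1/m], beta2 : dispersion [s^2/m], gamma : nonlinearity [1/W/m]
    D       : toy distributed-noise spectral density (scaling illustrative)
    """
    rng = rng or np.random.default_rng(0)
    E = E0.astype(complex).copy()
    n = E.size
    w = 2 * np.pi * np.fft.fftfreq(n, d=dt)                      # angular frequencies
    half_lin = np.exp((-alpha / 2 + 1j * beta2 / 2 * w**2) * dz / 2)  # half-step linear operator
    for _ in range(n_steps):
        E = np.fft.ifft(half_lin * np.fft.fft(E))                # half linear step
        E *= np.exp(1j * gamma * np.abs(E)**2 * dz)              # full nonlinear step
        E = np.fft.ifft(half_lin * np.fft.fft(E))                # half linear step
        if D > 0:                                                # additive Gaussian noise
            s = np.sqrt(D * dz / dt / 2)
            E += s * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return E
```

With γ = 0, α = 0 and D = 0 the scheme reduces to pure dispersion, which is unitary, so the total power ∑|E|² is conserved; this is a useful sanity check of the implementation.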
A discrete-time channel model is crucial for information-theoretic analysis, as it enables optimization of the mutual information functional I(X, Y) to derive the optimal signal distribution P(X) (Fig. 1f) and to calculate the maximum reliable transmission rate, i.e. the channel capacity C = max I(X, Y). The transition from the continuous-time model, given by Eq. 1, to a discrete-time model is not straightforward, since it requires an expansion over a complete orthogonal set of basis functions {f_k(t)}. This is equivalent to matched-filter demodulation at the receiver for generating the observable discrete-time variables {Y_k} (Fig. 1g). At the transmitter, the signal is expanded over the carrier pulses,

E(t, 0) = √P Σ_k X_k f(t − kT),

where X_k are the complex modulated symbols, f(t) is the time-varying pulse waveform and T is the symbol period. At the receiver, the signal undergoes matched filtering, dispersion compensation and sampling at t = kT: Y_k(ξ) = P^(−1/2) ∫ dt D[E(t, ξ)] f(t − kT) (with the dimensionless coordinate ξ = z/L_d, where L_d denotes the dispersion length). This allows the following discrete-time representation of the NLSE:

∂Y_k/∂ξ = iε Ψ_s(ξ) Σ_{m,n} C_mn(ξ) Y_{k+m} Y_{k+n} Y*_{k+m+n} + η_k(ξ),   (2)

where the AWGN noise term η_k is characterized by the correlation ⟨η_k(ξ) η*_m(ξ′)⟩ = ρ² Ψ_n(ξ) δ_km δ(ξ − ξ′). Here Ψ_n(ξ) = ⌊ξ L_d/L_s⌋ e^(−α mod(ξ, L_s/L_d)) is the noise power profile, with ⌊x⌋ denoting the floor function over the variable x; Ψ_s(ξ) = e^(−α mod(ξ, L_s/L_d)) represents the signal power profile and L_s is the span length. The coupling coefficients C_mn define the memory behavior (within a memory window M) of the transmission channel and depend on its physical properties and on the signal pulse shape:

C_mn(ξ) = i ∫ dt f*_ξ(t) f_ξ(t − mT) f_ξ(t − nT) f*_ξ(t − (m + n)T),   (3)

where f_ξ(t) denotes the carrier pulse dispersed up to the distance ξ. The proposed channel model generalizes a number of previous results. Namely, the class of infinite-memory Gaussian noise models reported in [2][3][4][19] can be recovered from Eq. 5 after averaging the nonlinear interference term |C_mn|²S³, whereas the finite-memory model of [14] can be recovered by averaging the coupling matrix |C_mn|² while keeping the information about the interfering symbols.
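The transmitter expansion and the receiver matched filter can be illustrated with a toy linear (ε → 0, noiseless) example. Sinc pulses are used here purely because they are orthonormal under shifts by T; the paper itself uses Gaussian pulses, and all names below are illustrative:

```python
import numpy as np

def modulate(symbols, os):
    """Map discrete symbols X_k to a sampled waveform sum_k X_k f(t - kT),
    with t in units of T and os samples per symbol; f(t) = sinc(t/T)."""
    n = len(symbols)
    t = np.arange(n * os) / os
    sig = np.zeros(n * os, dtype=complex)
    for k, x in enumerate(symbols):
        sig += x * np.sinc(t - k)   # np.sinc is sin(pi x)/(pi x): orthonormal under unit shifts
    return sig

def matched_filter(sig, n_sym, os):
    """Project the waveform back onto the carrier pulses: Y_k = <E, f(. - kT)>."""
    t = np.arange(len(sig)) / os
    return np.array([np.sum(sig * np.sinc(t - k)) / os for k in range(n_sym)])
```

Away from the edges of the finite simulation window, Y_k ≈ X_k: in the linear, noiseless limit the matched filter recovers the discrete symbols, which is exactly the reduction to an ideal discrete-time channel described above.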
Thus, this is the first finite-memory discrete-time model that: a) captures pulse-shape and format dependence; b) includes signal-signal and signal-noise interactions; and c) enables the incorporation of higher-order terms and the derivation of conditional pdfs, which are essential for accurate capacity estimation.
We also employ multiple-scale analysis over the two small parameters that characterize the main signal degradation effects: a) nonlinearity, ε = L_d/L_NL (with dispersion length L_d = T_0²/β_2 and nonlinearity length L_NL = 1/(γP)), and b) noise, ρ = N/P = 1/√SNR (the reciprocal square root of the signal-to-noise ratio (SNR) in the corresponding linear system), with noise power N = DB and B denoting the signal bandwidth. In the leading order we have a linear channel with an AWGN noise term ζ_k characterized by the correlation ⟨ζ_k ζ*_m⟩ = δ_km. The discrete-time perturbative multivariate channel model has the tensor form

Y = X + iε C̄[X, X, X*] + M ζ + L ζ*,   (4)

which can be rewritten as follows:

Y_k = X_k + iε Σ_{m,n} C̄_mn X_{k+m} X_{k+n} X*_{k+m+n} + Σ_m M_km ζ_m + Σ_m L_km ζ*_m.   (5)

Here, the first term describes the deterministic Kerr-effect-induced inter-symbol interference on the transmitted symbol X_k in the k-th time slot. It can be calculated by solving the deterministic part of Eq. 2 and can be compensated, in principle, at the transmitter or the receiver [10]. By expanding over the small parameter ε we receive the first-order distortion

Y_k^(1) = iε Σ_{m,n} C̄_mn X_{k+m} X_{k+n} X*_{k+m+n}.

The coupling matrix C̄_mn = ∫ dz Ψ_s(z) C_mn(z) governs the signal-signal interactions, particularly those responsible for the non-circular distribution of the distortion; this is achieved by also taking the pulse-shape impact into account. Its elements represent the weights of the interference between symbols in different time slots. To demonstrate the effect, we considered single-channel transmission in a dispersion-unmanaged fiber link. For simplicity, we used Gaussian pulses of 10 ps full width at half maximum and a baud rate of 28 GBaud. The link parameters were α = 0.2 dB/km, β_2 = −20 ps²/km, γ = 1.3 (W km)⁻¹ and L_s = 100 km. The multi-span modeling in the first-order approximation has been verified numerically and experimentally [20][21][22]. Fig. 2 shows that the strength of the interference between neighbouring symbols decays exponentially with their distance (denoted by m, n), whereas the slope is defined by the parameters of the transmission system.
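The decay of the coupling coefficients with symbol distance can be probed numerically. The kernel below assumes the first-order overlap-integral form of C_mn(z) used in time-domain perturbation models (an assumed form, with illustrative normalization), together with the closed-form dispersed Gaussian pulse and the link parameters quoted above:

```python
import numpy as np

def dispersed_gaussian(t, z, T0, beta2):
    """Closed-form Gaussian pulse after accumulating dispersion over z (ps/km units)."""
    q = T0**2 + 1j * beta2 * z                  # complex width parameter, ps^2
    return np.sqrt(T0**2 / q) * np.exp(-t**2 / (2 * q))

def coupling(m, n, z, T, T0, beta2):
    """Assumed overlap kernel: C_mn(z) ~ int f*(t) f(t-mT) f(t-nT) f*(t-(m+n)T) dt."""
    t = np.arange(-3000.0, 3000.0, 1.0)         # time grid, ps (dt = 1 ps)
    f = lambda s: dispersed_gaussian(t - s * T, z, T0, beta2)
    integrand = np.conj(f(0)) * f(m) * f(n) * np.conj(f(m + n))
    return np.sum(integrand) * 1.0              # Riemann sum with dt = 1 ps

# parameters from the text: 28 GBaud, 10 ps FWHM Gaussian pulses, beta2 = -20 ps^2/km
T = 1e3 / 28                                    # symbol period, ps
T0 = 10 / (2 * np.sqrt(np.log(2)))              # FWHM -> 1/e half-width, ps
beta2, z = -20.0, 100.0                         # ps^2/km, km

mags = [abs(coupling(m, 0, z, T, T0, beta2)) for m in (0, 4, 8, 16)]
# |C_m0| shrinks steadily as the symbol separation m grows
```

The monotone decay of |C_m0| with m mirrors the exponential fall-off of inter-symbol interference reported in Fig. 2; longer distances (larger z) broaden the dispersed pulse and flatten the decay, pulling more symbols into the memory window.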
At small transmission distances (Fig. 2a) we observe interference between the closest neighbours, which causes phase distortion to dominate (see the constellation diagram below). As the distance increases, more symbols interact and we observe more circular clouds (Fig. 2b,c). The coupling matrix accurately identifies the interacting symbols and the non-uniform strength of their interference. This is an extremely important characteristic: by preserving all relevant information about the signal interaction, it allows complete compensation at the transceiver. The latter cannot be achieved with a conventional approach based on averaging the signal statistics [2][3][4][19].
Signal-Signal Interactions. Firstly, in Fig. 3a), we calculate the achievable data rates for a nonlinear transmission system of 1000 km length (without any type of nonlinear compensation) using the existing GN model [4] and compare them with our approach, based on calculating the variance of the nonlinear distortions in Eq. 5, N_{S-S} = ⟨|X − X e^(−iφ_nl)|²⟩ = 2S³ Σ_{m,n≠0} |C̄_mn|² (the restriction on the summation reflects compensation of the stationary phase shift φ_nl = 2⟨|X|²⟩ Σ_m |C̄_m0|²). We can see that our model converges to the Gaussian noise model under the infinite-memory approximation.
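The variance expression N_{S-S} = 2S³ Σ_{m,n≠0} |C_mn|² can be sanity-checked by Monte Carlo on a toy symmetric coupling matrix (all numerical values below are illustrative stand-ins, not the fiber-derived coefficients); for i.i.d. circularly symmetric Gaussian symbols the sample variance of the first-order distortion should approach the closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
S = 1.0                                         # symbol power
win = [m for m in range(-3, 4) if m != 0]       # memory window with m, n != 0
C = {(m, n): np.exp(-abs(m * n) / 2) for m in win for n in win}  # toy symmetric coupling

# closed form quoted in the text: N_{S-S} = 2 S^3 sum_{m,n != 0} |C_mn|^2
analytic = 2 * S**3 * sum(abs(c)**2 for c in C.values())

trials = 20000
acc = 0.0
for _ in range(trials):
    Xv = (rng.standard_normal(13) + 1j * rng.standard_normal(13)) * np.sqrt(S / 2)
    X = lambda k: Xv[k + 6]                     # i.i.d. CN(0, S) symbols, k = -6..6
    D = sum(c * X(m) * X(n) * np.conj(X(m + n)) for (m, n), c in C.items())
    acc += abs(D)**2
mc = acc / trials                               # Monte-Carlo estimate of the variance
```

The agreement between `mc` and `analytic` reflects the standard Gaussian moment pairings: for a symmetric coupling matrix and independent circular Gaussian symbols, each (m, n) term contributes 2S³|C_mn|² to the distortion variance.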
The deterministic distortions can be compensated with traditional digital back-propagation or pre-distortion methods or, alternatively, with the perturbative approach of our channel model. The perturbation over the small parameter ε defines the deterministic signal distortion Y^(N_o) of order N_o by the recurrence relation

Y_k^(N_o) = iε Σ_{m,n} C̄_mn Σ_{a+b+c=N_o−1} Y^(a)_{k+m} Y^(b)_{k+n} Y^(c)*_{k+m+n},   Y_k^(0) = X_k.

The discrete-time character of the approach makes it possible to compensate the nonlinear interference at the receiver without increasing the signal bandwidth. This is in contrast to traditional pre-distortion techniques, which are based on continuous-time waveform processing. After the deterministic nonlinear distortions have been removed, signal-noise interference becomes the main limitation on increasing the transmitted information rates.
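A sketch of first-order perturbative compensation, under the assumed per-symbol model Y_k = X_k + iε Σ C_mn X_{k+m} X_{k+n} X*_{k+m+n}: evaluating the same perturbation on the received symbols and subtracting it leaves a residual of order ε². The coupling matrix here is a toy stand-in, not the fiber-derived one:

```python
import numpy as np

def kerr_first_order(X, C, eps):
    """Assumed first-order model: Y_k = X_k + i*eps*sum_{m,l} C[m,l] X_{k+m} X_{k+l} X*_{k+m+l}."""
    n = len(X)
    M = C.shape[0] // 2                      # memory half-window
    Y = X.astype(complex).copy()
    for k in range(n):
        acc = 0j
        for m in range(-M, M + 1):
            for l in range(-M, M + 1):
                a, b, c = k + m, k + l, k + m + l
                if 0 <= a < n and 0 <= b < n and 0 <= c < n:
                    acc += C[m + M, l + M] * X[a] * X[b] * np.conj(X[c])
        Y[k] += 1j * eps * acc
    return Y

def compensate(Y, C, eps):
    """First-order back-off at the receiver: X_hat = Y - i*eps*P(Y) = 2Y - (Y + i*eps*P(Y));
    the residual error is O(eps^2)."""
    return 2 * Y - kerr_first_order(Y, C, eps)
```

Because the compensation reuses the same discrete-time operator on the sampled symbols, no oversampled waveform is needed, which is the bandwidth advantage claimed above.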
Our proposed approach represents the first accurate discrete-time channel model with memory that can uniquely capture such signal-noise beating effects.
Signal-Noise Interactions. The next two terms in Eq. 5 determine the stochastic effects. If M_km = ρδ_km and L_km = 0 we recover a linear AWGN channel; in the next order over the parameter ε, the signal-noise mixing effects are taken into account:

M_km = ρδ_km + ρε Σ_n K_{n,m−k} (X_{k+n} X*_{m+n} + X_{m+n} X*_{k+n}).   (8)

Moreover, the matrix model of Eq. 4 represents a general form that can easily be expanded to cover multi-wavelength operation. In that case, Eq. 5 can be rewritten as follows:

Y_{k,κ} = X_{k,κ} + iε Σ_{m,n} Σ_{λ,μ} C̄^{λμ}_{mn} X_{k+m,λ} X_{k+n,μ} X*_{k+m+n,λ+μ−κ} + Σ_m M_km ζ_{m,κ} + Σ_m L_km ζ*_{m,κ};   (9)

here Greek and Latin letters denote frequency and time indices, respectively.
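The structure of the signal-noise coupling can be made concrete by building M explicitly for a short symbol block. Both the convention (Y_k receives Σ_m M_km ζ_m) and the toy kernel K below are illustrative assumptions:

```python
import numpy as np

def noise_mixing_matrix(X, K, rho, eps):
    """Assumed form from the text:
    M[k,m] = rho*delta_{k,m} + rho*eps*sum_n K[n, m-k] (X_{k+n} X*_{m+n} + X_{m+n} X*_{k+n})."""
    d = len(X)
    N = K.shape[0] // 2                          # half-window for n and for m-k
    M = rho * np.eye(d, dtype=complex)
    for k in range(d):
        for m in range(d):
            if not -N <= m - k <= N:
                continue                         # outside the memory window
            acc = 0j
            for n in range(-N, N + 1):
                a, b = k + n, m + n
                if 0 <= a < d and 0 <= b < d:
                    acc += K[n + N, (m - k) + N] * (X[a] * np.conj(X[b]) + X[b] * np.conj(X[a]))
            M[k, m] += rho * eps * acc
    return M
```

At ε = 0 the matrix collapses to ρI (the linear AWGN channel); the O(ε) correction is signal-dependent, which is precisely why the resulting noise statistics are non-stationary and non-circular.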

Capacity Lower Bounds and Conditional pdf.
To derive the capacity it is necessary to optimize the mutual information functional

I(X, Y) = ∫ dX dY P(Y|X) P(X) log [P(Y|X)/P(Y)],

with the power constraint ∫ dx ‖x‖² P(x) = dS. To optimize the mutual information over the input pdf, one must calculate the multivariate conditional pdf that takes into account the memory effects defined by the channel model of Eq. 4. Next, we derive the conditional pdf for the transmitted symbols. The channel model of Eq. 4 represents a mixing of signal and noise components as a result of the inter-symbol interference. A linear combination of independent and identically distributed univariate normal vectors can be represented by a complex normal distribution [24]:

P(Y|X) = π^(−d) (|Γ||P|)^(−1/2) exp{−(1/2) (z − μ)^H Ω^(−1) (z − μ)},   (11)

where z = (Y, Y*)^T is the augmented vector, μ = (E(Y), E(Y)*)^T, Ω = [Γ, Υ; Υ^H, Γ*] is the augmented covariance matrix and P = Γ* − Υ^H Γ^(−1) Υ. Notation: * denotes the complex conjugate, T transposition, and H Hermitian conjugation (transposition and complex conjugation). The covariance matrix Γ (Hermitian and non-negative definite) and the relation matrix Υ (symmetric) are given by

Γ = E[(Y − E(Y))(Y − E(Y))^H],   Υ = E[(Y − E(Y))(Y − E(Y))^T].

This is the first result presenting a conditional pdf derived from a discrete-time model with memory that is accurately defined by the nonlinear properties of the fiber-optic channel. The approach captures any non-circular behaviour of the signal distortions and contains precise information about the inter-symbol and signal-noise interference effects. In the particular case Υ = 0 and E(Y) = 0 we obtain a circularly symmetric complex Gaussian distribution.

Ripple distribution. Now let us consider a set of α = 1..q Gaussian distributions with uniform phase, each of which is localized around a different power level ρ_α² and has a different variance S_α and weight p_α, so that in polar coordinates it is represented as

P(x_k) = Σ_{α=1}^{q} [p_α/(π S_α)] exp[−(|x_k|² + ρ_α²)/S_α] I_0(2ρ_α|x_k|/S_α),   (14)

where I_0 is the zeroth-order modified Bessel function of the first kind; here the Latin alphabet is used to denote the time index, and the Greek, the coding level.
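The ripple pdf of Eq. 14 is straightforward to sample: pick a level α with probability p_α, draw a uniform phase for the ring centre of radius ρ_α, and add circular Gaussian jitter of variance S_α. The sketch below uses arbitrary illustrative level parameters:

```python
import numpy as np

def sample_ripple(n, p, rho, S, rng=None):
    """Draw n symbols from the ripple distribution: a mixture of q Gaussian
    'rings', where level alpha is picked with weight p[alpha] and is centred
    at power rho[alpha]^2 with uniform phase and complex variance S[alpha]."""
    rng = rng or np.random.default_rng(0)
    p, rho, S = map(np.asarray, (p, rho, S))
    levels = rng.choice(len(p), size=n, p=p)            # pick a coding level per symbol
    phase = rng.uniform(0, 2 * np.pi, n)                # uniform phase of the ring centre
    centre = rho[levels] * np.exp(1j * phase)
    jitter = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) * np.sqrt(S[levels] / 2)
    return centre + jitter
```

The empirical mean power converges to Σ_α p_α (ρ_α² + S_α), which is how the power constraint on the ripple parameters is enforced in practice.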
In the simplest case of q = 1 we recover the conventional result,

P(x_k) = (πS)^(−1) e^(−|x_k|²/S),

which is the Gaussian pdf with zero mean and variance S. The resulting lower bound is plotted in Fig. 3, where C_nl = 6S²N Σ_{m,n≠0} |K_mn|², and is in agreement with the asymptotic closed-form expression of [23].
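In the linear (ε → 0) limit, the q = 1 Gaussian input recovers the Shannon AWGN result log₂(1 + S/N). As a self-contained sanity check of the mutual-information machinery (for the linear channel only, not the nonlinear bound), the rate can be estimated by Monte Carlo from the pdf ratio in the functional above:

```python
import numpy as np

def mi_gaussian_awgn_mc(S, N, n=200000, rng=None):
    """Monte-Carlo estimate of I(X;Y) for X ~ CN(0,S), Y = X + W with W ~ CN(0,N):
    the sample average of log2 p(y|x) - log2 p(y); closed form is log2(1 + S/N)."""
    rng = rng or np.random.default_rng(0)
    x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) * np.sqrt(S / 2)
    w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) * np.sqrt(N / 2)
    y = x + w
    log_pyx = -np.log(np.pi * N) - np.abs(y - x)**2 / N          # ln of CN(0,N) pdf at y-x
    log_py = -np.log(np.pi * (S + N)) - np.abs(y)**2 / (S + N)   # Y ~ CN(0, S+N)
    return np.mean(log_pyx - log_py) / np.log(2)
```

The same estimator structure (averaging log P(Y|X) − log P(Y) over channel samples) carries over once P(Y|X) is replaced by the memory-aware conditional pdf of Eq. 11.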
In the case of q = 2 and for S > C_nl^(−1/2), we consider a set of two Gaussian distributions (see Fig. 3b). One is centered around zero power and has a fixed variance equal to the maximum-peak-power value of the single-level lower bound, S_1 = C_nl^(−1/2); the second has the same variance and a vanishingly small probability δ → 0 and is centered around a distant power level ρ² ≫ S_1. Thus, we consider p_{α=1,2} = {1 − δ, δ}, S_{α=1,2} = {C_nl^(−1/2), C_nl^(−1/2)}, and ρ_{α=1,2} = {0, ρ}. Consequently, after optimization of the free parameter δ, the corresponding lower bound on capacity for ρ² ≫ S_1 takes an analytical form in which the parameter ρ is found from the power requirement 2π C_nl^(1/2) ρ⁴ e^(−ρ² C_nl^(1/2)) = S − S_1. This result shows that, with the considered suboptimal input pdf, we obtain a monotonically increasing lower capacity bound that asymptotically approaches a plateau. Therefore, signal-noise effects do not cause the capacity to decrease with increasing signal power. A similar result was obtained for uncompensated signal-signal distortions without taking signal-noise interference into account [14]. Further optimization of the pdf can only improve this bound.

Fig. 3: achievable rates for the GN model and the proposed model (green), with the deterministic distortions compensated and the signal-noise (S-N) interference taken into account. The previous lower bound (denoted I0) decreases to zero, whereas the ripple distribution as the input pdf (Eq. 14) achieves higher, monotonically increasing bounds, denoted I1 and I2 for an increasing number of ripples, as shown in panels b) and c) respectively (the Gaussian pdf is shown by red dashed lines for comparison).
To prove this, we consider the ripple distribution with a larger number of levels q and with adjustable variances and centers for each constituent distribution. We have found that numerical optimization results in a monotonically increasing numerical bound I2.
In conclusion, we derived a general finite-memory multivariate channel model describing nonlinear inter-symbol interference effects in fiber-optic communication channels. The model predicts an exponential decay of the interference with the inter-symbol distance. For the first time, we demonstrated a lower bound on the channel capacity that is a monotonically increasing function of signal power. We found an input-signal ripple distribution that enables this monotonically increasing lower bound on capacity. This provides an information-theoretic tool for estimating the capacity limits of fiber channels, as well as for the practical design of efficient compensation algorithms and coding schemes tailored to the nonlinearity. This work has been supported by the EPSRC project UNLOC EP/J017582/1.