Channel model and the achievable information rates of the optical nonlinear frequency division-multiplexed systems employing continuous b-modulation

: Following the rise in interest in transmission systems employing the nonlinear Fourier transform (NFT) for the nonlinearity mitigation, we present the theoretical analysis of the achievable information rates in these systems, addressing the case of continuous b -modulated systems. Using adiabatic perturbation theory and the asymptotic analysis by means of Riemann-Hilbert problem, we obtain a remarkably simple input-output relation for arbitrary b-modulated transmission. Based on this model, we estimated the spectral efficiency for various single polarization (scaled and unscaled) b-modulated systems and observed an excellent agreement between our theory and the numerical results in the regime when the inline amplifier noise is the dominant source of spectral distortion.


Introduction
The growing demand for the IP traffic volume, escalating due to emerging number of versatile novel on-line services, imposes progressively higher requirements on the throughput of contemporary core fiber communication systems. Due to the ever-increasing traffic demand, we are quickly approaching the limits of available fiber capacity provided by the current (mostly linear) transmission technologies [1][2][3]. However the nonlinear transmission effects arising largely due to the Kerr nonlinear media response in silica fibers are responsible for the capacity degradation at high signal-to-noise (SNR) levels [2][3][4][5], thus making it impossible to gain more line capacity by simply raising the signal power. To reduce the negative fiber nonlinearity impact, there have been proposed a plethora of nonlinearity mitigation techniques [6]. Among the existing methods, the nonlinear Fourier transform (NFT) [7,8] has recently attracted a lot of attention, since within this essentially nonlinear technique, the modulation and transmission effectively takes place inside the nonlinear Fourier (NF) domain, where the nonlinear cross-talk degrading the performance of conventional WDM systems is virtually absent. Such a transmission method employing the NF modes instead of linear counterparts is known as nonlinear frequency division multiplexing (NFDM). The idea of using the NF spectrum components for modulation and transmission was first proposed in [9], where only discrete nonlinear modes were used. Later on the use of continuous spectrum was suggested in [7,10] and a hybrid scheme using both discrete and continuous spectrum was developed in [11,12]. NFT has also been successfully generalized for two polarizations [13][14][15][16][17][18].
While having the potential to overcome the nonlinearity induced limits, the NFT-based methods still suffer from several limitations. For example, the coupling of NF modes still occurs in the real systems due to the deviation of the true transmission channel from the idealised model (i.e. from the unperturbed nonlinear Schrödinger equation (NLSE) containing only the effects of the second order dispersion and Kerr nonlinearity). The effects emerging in the multi-user NFDM environment compared to the WDM are analyzed in [19,20]. Inline amplifier noise is also not accounted for by the idealized model and the interaction of the signal evolving inside the NF domain with noise is an area that is still poorly studied despite some recent progress reported in [21][22][23][24]. In particular, there is lack of analytical estimates for the achievable information rates for the NFT-based systems despite some results for discrete [25,26] and continuous spectrum [21,23,27]. The main difficulty here is that even in the large SNR regime when one can use perturbation theory, the resulting channel law is strongly input-dependent, which makes capacity estimates notoriously difficult. However, better and more accurate analytical derivations for the channel law are useful as they could be used for estimates for the achievable information rates (AIR) with the mismatched decoding [28][29][30].
Recently, a specific type of the NFT operating with the continuous spectrum has come to vogue: namely, the so-called b-modulation [31]. In this scheme the data is encoded on the so-called second Jost coefficient b(ξ) [32,33]. The original suggestion [16,17,31,34] favoured b-modulation over the original r-modulation [10] as it allows, in principle, the generation of signals having a compact support, which helps improve spectral efficiency. At the same time, unlike the r-modulated system, the b-modulated system suffers from the so-called "energy barrier" [34] as the value of |b| is upper-bounded by unity. The possible ways of circumventing this limit is using smartly tailored nonlinear signal carriers [34]. Another proposition is to apply the nonlinear spectrum scaling which squeezes the absolute value of b coefficient into the allowed interval at the expense of losing the time-limitedness of the pulse [16,17].
In this paper we report two new results: i) the analytical model (channel-law) for b-modulated nonlinear spectra and ii) the capacity lower bound for these systems. We study both scaled and unscaled b-modulation systems. Some preliminary results of this paper regarding the channel model were reported in [35].

Nonlinear Schrödinger equation (NLSE)
The principal master model for the slow-varying envelope of electrical field, U(z, t), propagating down a single-mode optical fiber under the assumption of ideal distributed amplification (i.e. assuming the flat zero-level gain-loss profile) is written as a noise-perturbed NLSE (see [5,36]): with Z being the distance along the fiber, T is the time in the frame co-moving with the velocity of the envelope U. We restrict ourselves to the case of a single polarization only and the parameter β 2 is the characteristic of chromatic dispersion (β 2 <0 for the anomalous dispersion case considered in our study), γ is the nonlinear Kerr coefficient. Model (1) represents lossless NLSE perturbed by the zero-mean additive white Gaussian noise (AWGN) term N(z, t) characterized by the power spectral density (PSD) per unit of propagation distance, N ASE , and the autocorrelation function [5,36]: where the overbar stands for the complex conjugate, E[. . .] is the expectation value, δ(. . .) is the Dirac delta-function, and L is the signal propagation distance. The model above is exact in the case of ideal distributed Raman amplification (IDRA) and neglecting other effects such as the third order dispersion, Raman frequency shift and so on. For the IDRA, we have: N ASE /L = h ν s K T α, where α ≈ 0.2 dB/km is the fiber loss coefficient at the carrier wavelength λ s = 1.55 µm, K T ≈ 1 is the temperature-dependent factor that characterizes the Raman pump providing the gain, ν s = 193.55 THz is the carrier frequency of the signal.
However, to be more realistic and to describe most transmission experiments, we shall assume the model (1), and (2) to be a path-averaged limit of the lumped erbium doped fiber amplification (EDFA) transmission [37] (see also Ref. [38] for the study of applicable limits of path-averaged model in continuous NFDM systems). In this case the optical pulse is propagated through lossy spans each of length L a where at the end of each span the signal is regenerated via EDFA gain G = exp(α L a ). The resulting path-averaged model is the same as in the distributed Raman scheme but with the renormalized nonlinear coefficient γ eff = γ(G − 1)/(G ln G) [37,39], and the total accumulated noise ASE is then given by [5]: N ASE = (G − 1) N a h ν s n sp , where N a is the total number of spans (L = N a L a ), and n sp <1 is the spontaneous emission factor commonly evaluated via amplifier noise figure: [40]. In this case the resulting averaged model (1) represents an approximation valid when the amplifier span is much less than the typical scale of variation of the pulse. Note that although the analytical NFT analysis below assumes the distributed model, the numerical verification of the results was obtained by simulating the lumped amplification link, taking into account periodic power variations and noise injection from span to span. Now let us introduce the normalized units, with The resulting normalized NLSE becomes: The noise term n(z, t) has been normalized in accordance to (3), so that it is delta-correlated with the intensity: 2D = N ASE Z s /(P s T s L) [21].

NFT operations
The NFT decomposes the signal q(z, t) at a fixed location z = z 0 and returns the corresponding NF spectrum. The reciprocal operation, the inverse NFT (INFT), returns the signal q(z 0 , t) taking the full NFT spectrum as an input. In what follows, we shall consider the case of the signal vanishing at infinity and anomalous dispersion fiber.
Direct NFT operation: The direct NFT is computed from specific (auxiliary) solutions v 1,2 (t, ζ) = v 1,2 (t, ζ; z 0 ) (in the future we shall omit the z 0 notation for brevity) to the linear Zakharov-Shabat problem (ZSP), which we write down as [33]: for different values of the complex spectral parameter ζ = ξ + iη. The overbar in (5) and below stands for the complex conjugate. Under the assumption that ∫ ∞ −∞ |q(t)| dt<∞, specific solutions (the so-called Jost functions) ϕ 1,2 (t, ξ) and ψ 1,2 (t, ξ), for real ξ, to the ZSP can be obtained from the boundary conditions: Furthermore, we will also need the expression for the conjugated pairs:φ 1 =φ 2 andφ 2 = −φ 1 as well asψ 1 =ψ 2 andψ 2 = −ψ 1 . These also solve ZSP as expressed in (5). In practical realization of the transmission schemes, the pulse q(t) is truncated to have a finite duration, i.e. we operate in the so-called burst mode [41], and thus one sets the initial conditions at the trailing and leading ends of the finite-extent pulse. The functions a(ξ) and b(ξ) forming the core of the NFT signal decomposition, also known as the Jost scattering coefficients are defined through the Jost solution as: These scattering coefficients satisfy: |a(ξ)| 2 + |b(ξ)| 2 = 1. The NF spectrum of the signal q(t) consists of two parts. The first (continuous) part is given by the right reflection coefficient (RC), The second part of the NF spectrum consists of the discrete NF spectrum responsible for the soliton component of the pulse. However, we do not consider the solitonic degrees of freedom in our study. The z-dependence of the RC and scattering coefficients is as follows: INFT operation. The INFT maps the scattering data onto the field q(t). This is achieved via the solution of the Gelfand-Levitan-Marchanko equations (GLME) [33,42], or, alternatively, by means of the Riemann-Hilbert problem [43]. However, we do not present here the explicit expressions as these are not necessary for our further exposition (safe for long distance asymptotes in the Appendix).

r and b-modulation
In all general NFT-based schemes similarly to the conventional modulated systems, one starts with a sequence of symbols c k chosen from a modulation constellation of cardinality M. These symbols, however, are used to modulate the nonlinear spectrum (at the transmitter, i.e. at z = 0): where ψ(ξ) is the chosen nonlinear carrier spectral shape, ∆ξ is the nonlinear carrier spacing, and S is the effective parameter that defines an average power per carrier (the constellation points are supposed to have standard dimensionless spacing). Often one defines a time scale T 0 = π T s /∆ξ in the real word units, which can be loosely defined as NFDM symbol duration [16,17]. This definition must, however, be handled with caution as it is known [10] that the nonlinear spectral spacing defines the signal support in the time domain only in the low power limit when the nonlinear spectrum becomes (up to a factor of 2 frequency scaling and complex conjugation) identical to its linear counterpart [42]. As for the carrier shapes, the most popular is the sinc-shape chosen in analogy to linear orthogonal frequency division multiplexing (OFDM) and providing a good localization of resulting profiles in time domain, but other carrier forms can also be employed, see below. Therefore, we shall carry out most of our theoretical analysis for the general carrier shape and will only fix it in the numerical simulations when the specific choice is required. Note that in all implementations of the NFT, the carrier is usually oversampled both in the nonlinear and time domain. The idea of the NFT-based methods is to take the spectral shape (10) (amplitude-scaled, if desired, by a local transformation) and use it either as the reflection coefficient r(ξ) or as a b-coefficient. The former technique is commonly known as the r-modulation (or the nonlinear inverse synthesis [10]), and the latter technique as the b-modulation. Note that r, a and b coefficients are not independent. They are related to each other by the following relations (assuming continuous spectrum only) [7,32]: where H {. . .} denotes the Hilbert transform. It follows from the first expression in (11) that the absolute value of the b-coefficient is bounded by unity: |b(ξ)|<1. Therefore, the original unscaled b-modulation suggested in [44], while providing an attractive property of localized signal in the time domain (and hence high spectral efficiency at the transmitter), suffered from the "energy barrier" issue -a limitation on the input power S in (10) imposed by the above constraint [44,45]. This limitation was partially circumvented in [34] by using cleverly tailored flat-top carrier shapes and optimized constellations. The authors in [16,17] have chosen another path by imposing no limit on the initial power factor S in (10) but, instead, providing a nonlinear squeezing transformation: |b| = f (|u|) and in the latter case the exact localization property in the time domain was lost, but instead one earns the freedom in signal power up-tuning. It was argued in [16,17] that most of the spectral efficiency in the NFT communication is lost due to the need to insert large guard bands between the bursts [8] and not due to the initial pulse shape within the burst so such a loss of compactness can be tolerated. The specific choice of the exponential function to the best of our knowledge is not dictated by any particular performance gains, but was chosen in [16,17] to provide convenient expression for the total energy of the NFT burst, E b , via the nonlinear version of the Parseval identity [7,8]: the exponential scaling yields the expression for E b , which is formally identical to the energy of a pulse sequence with the linear spectrum given by Eq. (10). In Table 1 below we summarise some popular choices of scaling used in recent literature. Note that there is still no agreed standard notation in the NFT area and some authors, e.g., [7], use q instead of r to denote the continuous spectrum. Table 1. Some of the scaling functions used in NFT-based transmission systems, with u(ξ) being the information-bearing modulated function of type (10).

Modulation
Scaling Ref [10,21,22,24,46] r(ξ) = (e |u(ξ )/2| 2 − 1) 1/2 e i∠u(ξ ) [19,20] b [16,17] The setup of a typical NFDM system using either r or b modulation is shown in Fig. 1. Our data are encoded into the spectral waveform according to Eq. (10), after which an amplitude scaling transform f is applied and the received spectral shape is associated either with r or b component of the nonlinear spectrum. After that, we apply precompensation to the initial encoded spectral shape inside the NF domain, i.e. we unroll the half of accumulated phase in the nonlinear domain (9). This is done to reduce the required guard band of each burst and, thus, increase the spectral efficiency [47]. After the precompensation, the INFT is performed (we use the fast NFT implementation from [48]) and the received time domain signal after inserting the corresponding guard band, is launched into a single mode optical fiber modelled by Eq. (4). For the simulations, we used a standard split-step propagation algorithm and the fiber parameters β 2 = −21 ps 2 /km, γ = 1.27, W −1 km −1 , α = 0.2 dB/km. The noise was added in a lumped fashion as described in Section 2.1. At the receiver the above operations are performed in the reversed order, with the postcompensation equalizing the remaining half of the propagation effects, and the received spectral sequence u r (ξ) is demodulated according to the chosen carrier shape.
The presence of the noise breaks the exact integrability of the NLSE and makes the full equalization of the nonlinear spectrum impossible. In addition, the numerical errors of both forward and inverse NFT as well as suboptimal processing window and inter-burst interference, all make an independent contribution to the NF spectrum distortion [24]. There are currently plenty of numerical studies of the performance of NFT systems for both r and b modulation case including the achievable information rates (AIR) for these channels [16], but the analytical models describing the properties of the amplifier noise in the NS domain are quite few and far between, see Refs. [21,23,27] for the examples related to r-modulation. It was, however, recently reported [16] that the b-modulated systems demonstrate higher tolerance to noise as well as higher achievable spectral efficiency compared to r-modulation. It is, therefore, highly desirable to develop an analytical channel model for the b-modulated NFDM systems that could serve two goals: i) provide a possible explanation for the bit error rate (BER) improvement, and ii) give a channel law that could be used to provide capacity estimates via AIR in such systems. In the following section, we build such a model using the perturbation theory for b-coefficient and the asymptotic expression for the Jost functions ϕ 1,2 and ψ 1,2 taken at large enough propagation distances.

Analytical model for b-modulated NFT channel
In this section, we examine noise characteristics for the b-modulated transmission system. We build on our previous approach for r-modulation [21], where the perturbation theory of Kaup and Newell [49,50] was applied to the amplifier noise in Eq. (4) in conjunction with the long-distance asymptotes for the Jost functions [51]. Here, our goal will be similar but we reformulate the approach for the b-modulation. Define by b in (ξ) the modulated sequence (10) after possible prescaling of the amplitude but prior-to precompensation. The pre-compensated b-modulated input is given by specifies the amount of precompensation: when s = 0, no precompensation is applied and all compensation is performed at the receiver, while s = 1 implies full precompensation. In our simulation we shall use the most common choice, and split the precompensation half-way by setting s = 1/2, in order to optimize spectral efficiency. After the postcompensation the output b-coefficient is given , from which u out can be restored via the inverse amplitude scaling, if needed.
First, we will demonstrate that in the small-noise limit, the statistics of the output b-coefficient b out conditioned on the input b in are Gaussian but with input dependent covariance. Next, we will show that at long enough distances, one can get relatively simple analytical expressions for these correlation functions.
According to the adiabatic perturbation theory [49,50], the perturbed z-evolution of the a and b coefficient is described by the following set of equations: where I [u, v] is the projection of the perturbation (in our case of the AWGN noise n(z, t)) on the corresponding unperturbed squared Jost functions of the unperturbed system: The system (12) is incomplete unless one specifies the z-dependence of Jost functions. Since our treatment is a perturbative one, it is sufficient to evaluate ψ(ζ, t; z) and ϕ(ζ, t; z) in the zeroth order. The important consequence is that since the perturbation terms in the r.h.s. of Eqs. (12) are linear with respect to noise term, the effective noise in NF domain is Gaussian albeit its correlation properties depend on the input data and their unperturbed evolution.
Since in this work we concentrate on b-modulation, the second equation from Eqs. (12) allows us to define the noise term in the NFT domain N (ξ, b in (ξ)) according to the following expression: In the first order of the perturbation theory, given the input b(ξ, 0), the noise term N is zero mean complex Gaussian and is completely characterised by its (conditional) covariance and pseudocovariance: depending on the normalized distance l = L/Z s and noise intensity D.
The functions A and B are built on the time integral of the products of the Jost functions of the unperturbed system: The Jost functions entering the expressions above generally are not available in the closed form. However, when the propagation distances are large, one can obtain them in an asymptotic form that was first given in [51] and later formalized rigorously by using Riemann-Hilbert technique [52,53]. We note here that the usage of asymptotic expressions for Jost functions ϕ i , ψ i , and so on, in Eqs. (16) and, subsequently, in (14) and (15), is based i) on the assumption that they are reached with the sufficient accuracy at the distances much smaller than the propagation distance L, and this is proven via numerical simulations, and ii) on the cancellations of fast integrand oscillations in integrals entering Eqs. (14) and (15), such that we can keep only the leading term in the resulting approximate expressions. The details of the Jost asymptotics derivation are somewhat technical and can be found in the Appendix. Briefly, finding the Jost functions asymtotics relies on the fact that the off-diagonal elements of jump matrix for the Riemann-Hilbert problem are fast oscillating in the large distance limit, allowing one to evaluate the Jost functions using the nonlinear steepest descent method. The technique can be reckoned as the nonlinear analogue to the evaluation of contour integrals containing oscillatory integrands by the conventional (linear) steepest descent method. A more physically intuitive reasoning given in [51] (see also [54]) is that any solution of the NLSE model evolves asymptotically in a quasilinear way and is centered at the resonant line t = −2zξ. This observation, in turn, can be used to obtain a simplified asymptotic solution to the direct ZSP.
Regardless of the method chosen for finding the Jost function asymptotics, the end result of the calculation is remarkably simple. It turns out that in the large L limit the local covariance is diagonal and symmetric: In the above, the functions B 1,2 (ξ) represent the power spectral density (PSD) and pseudodensity normalized to power spectra density of the linear problem (expressed in normalized units) n ASE = 2πD l [5], while recalling that in the linear limit, the linear angular frequency is equal to twice the nonlinear case, so that ω = 2ξ and |a(ξ)| → 1 [42]. Note that the n ASE scales linearly with the normalized propagation distance, l, i.e. the noise accumulates along the link.
The expressions in Eqs. (17) are the main analytical result of the paper and according to it, the ASE-induced contribution to the signal distortion diminishes as the input approaches the energy barrier: |b(ξ)| → 1. This is in stark contrast with the r-modulated systems, where it was shown [21,22] that as the energy of the burst grows so does the PSD of the noise in the NFT domain. However, as we shall presently see, simply pushing the b-modulated signal to the energy barrier does not necessarily improve performance insofar as, apart from the ASE, there are other factors contributing to the nonlinear spectrum distortion. This is studied in the numerical simulations described in the next section.

Numerical verification of the properties of the nonlinear noise
Throughout the paper we will use the (approximate) burst duration at the transmitter as the time normalization parameter: T S = T 0 . Note that T 0 differs from the actual time duration of the pulse because of i) precompensation and ii) the nonlinear nature of the INFT, which makes the duration of any time domain pulse generated by inverse synthesis from a given nonlinear spectrum difficult to control [10]. Only in the limit of the linear system without precompensation does T 0 actually represent the input burst duration.
We start with the validation of the theoretical predictions (17). To this end, we consider an NFDM input sequence (10) with N sc = 128, 16-QAM modulated subcarriers. We have considered both scaled and un-scaled b-modulation setups and no precompensation was employed for the noise study, i.e. s = 0.
First, we used Eq. (13) to extract the current realization of the NFT noise. Then we computed the average PSDB 1 and the absolute value of its average pseudo-covariance counterpart, |B 2 |, by averaging: i) over 600 realization of the input symbols and inline noise and ii) over the nonlinear spectrum. The procedure is similar to that used in [24] for the r-modulation. The results obtained by such a procedure are presented in Figs. 2, 3, panes (a) and (b). The input power was P in = −6.8 dBm, corresponding to S = 0.1, in the scaled case, and P in = −5.55 dBm (corresponding to S = 0.0544) in the unscaled case. According to the theoretical predictions (17), the covariance matrix is diagonal while the pseudocovariance should vanish. And, indeed, one can see that the covariance matrix averaged over input symbols exhibits strong diagonal correlations, while the pseudo-covariance is at least an order of magnitude smaller as predicted by our theory. Therefore, in the following, we will ignore the pseudocovariance and assume the noise in the nonlinear domain to be symmetric.  was averaged over the input symbols and the nonlinear bandwidth. The covariance values in these figures are normalized to the linear value B 1,lin = n ASE = 2π D l. The results were compared to the theoretical prediction (18), where the averaging was performed analytically for the scaled and the unscaled form of b-modulation, in which we used different randomly generated input sequences (10). For the unscaled RRC-shaped carriers, the averaging can be carried out explicitly using the fact that different subcarriers do not overlap in the nonlinear domain. For the scaled case, the averaging amounts to averaging the exponential exp(−|u in (ξ)| 2 ), which cannot be carried out explicitly. However, for a large number of subcarriers, the central limit theorem applied to i.i.d. input symbols c k allows us to substitute u in (ξ) with an effective zero-mean complex symmetric Gaussian process, where the variance can be easily obtained. Combining both results, we get the theoretical prediction: where σ 2 k = E[|c k | 2 ] = 2(M − 1)/3 for M-QAM. Note that for the unscaled b-modulation, we have a natural upper bound for the power, S max , corresponding to the requirement |b in (ξ)|<1. We also notice here that we have a small mismatch between the theory and numerical results in Figs. 2(c) and 3(c), especially visible for the former. This is because the exact theoretical averaging over the input symbols in Eq. (18) can be obtained only in the limit of infinite number of subcarriers when the central limit theorem applies. Note, however, that even for the exponentially scaled case of Figs. 2(c), where the central limit theorem is less applicable, the maximum discrepancy with the compensated PSD is about 5%, while for the unscaled case of Fig. 3(c), they are almost indistinguishable.
The dimensionless parameter S is related to the real-world unit power via the nonlinear Parseval identity for the burst energy [7,55]: where the burst duration is estimated by the quasilinear dispersive spread of the pulse: where B is the nonlinear bandwidth of a single subcarrier and ε ∼ 1 is carrierdependent form factor usually determined empirically by considering a noise-free transmission. As some of us have shown in the previous publication [24], only part of the signal distortion in (13) is due to the ASE noise. Another contribution comes from the processing noise, which arises due to the finite accuracy of the forward/inverse NFT routines and is expected to become more and more prominent as the energy of the burst grows (i.e. |b in | → 1). And, indeed, as seen in Figs. 2, 3(c), (d), when the input power increases, the PSD of the noise initially goes down as predicted by the model (17) but then begins to rise. To separate the ASE contribution from the processing noise we followed the procedure of Ref. [24] and subtracted the results of separate runs obtained with the ASE noise switched off from the full PSD. The original, uncompensated results are labelled as "Full model" in the graphs while the PSD graphs obtained after the subtraction of the processing noise are labeled as "Compensated". As one can readily notice, this compensated model corresponding to the isolated contribution of the ASE noise only is fitted extremely well by our theory. The remaining processing noise that becomes relevant at high input powers near the energy barrier, currently lacks an analytical description. While it will be beneficial to include its contribution (if only phenomenologically) into the model (17), we leave this to further studies. Moreover, the properties of the processing noise depend on the particular numerical (I)NFT method used, and so it required a specific discussion in the context of the features and accuracy of particular NFT routines.

Input-output channel model
Let us study the implications on the derived approximate analytical model (17). The input-output relation (13) (after pre-and post-compensation) can be written in the following form: where we have introduced the signal-independent linear noise N(ξ), and E in the quasilinear limit |b in | → 0, one recovers an AWGN channel in the (linear) spectral domain.
In the case of unscaled transmission, the mapping b(ξ) = u(ξ) and Eq. (20) already defines a nonlinear spectrum waveform channel. For the scaled transmission, we shall assume an arbitrary real, differentiable, invertible scaling function f (x) such as 0 ≤ f (x) ≤ 1. In Refs. [16,17], the choice was f (x) = (1 − exp(−x 2 )) 1/2 , but we shall keep the scaling function general for now. To get an analytical expressions for the symbol statistics, we will need to assume that |b in | ≫ |N (ξ)|, so that the second term in (20) can be treated perturbatively. For the first order in the perturbation term, the amplitude and phase can be expressed as: Combining the two results together to the same first order accuracy, one can obtain a linearized input-output relation: The linearization procedure used to obtain the input-output relation (22) imposes more severe restrictions than the original pertubative model. Indeed, one can see that if we choose larger input power (controlled by the scaling parameter S in expression (10)), this will make |u in | tend to +∞ and, thus, 1 − f 2 (|u in |) will tend to zero. So the phase fluctuations in the relations (21) will be squeezed to zero. The same cannot be said, however, of the amplitude fluctuations because of the derivative term in the denominator. Case in point is the exponential squeezing factor of Refs. [16,17], for which the pre-factor in the amplitude relation for |u| is given by (exp(x 2 ) − 1) 1/2 /x, x = |u in |, and is exponentially diverging. This is no surprise since any monotonic upper bounded function converges to an asymptote (which is 1 in the b-modulated case). Therefore, even small changes in the values of function |b| = f (|u|), as prescribed by our theory, Eq. (20), will induce large changes in the argument |u|, which makes the simplified model (22) inapplicable. It is, however, very difficult (if possible at all) to obtain the auxiliary channel law analytically without resorting to linearization. We will, therefore, use model (22) bearing in mind that it is strictly applicable if the input b-coefficient is not too close to unity (which renders linearisation inapplicable) or zero (which violates the validity of the perturbation theory result (25)). The unscaled modulation scheme, which modulates directly the b-coefficient, is free from the former limitation: it follows from Eq. (25) that it makes sense to operate as close to the "energy barrier" |b in | = 1 as possible inasmuch as the noise term gets effectively reduced. This fact was noticed in our preliminary report [35]. Now let us apply the channel model obtained in this section to the capacity estimates for the b-modulated NFDM systems.

Achievable information rates
Finding Shannon capacity of the NFDM-modulated optical communication channel (or of the general nonlinear fiber channel for that matter) remains an unsolved problem. The capacity is defined as a maximum of mutual information (MI) defined below, over the probability distribution of the vector of symbols forming the NFDM burst subject to (average) input power constraint, see [5]. Since the exact solution for capacity is only available for very simple channels (e.g. for the AWGN channel), then one can only work with various lower bounds obtained by evaluating the MI either analytically or numerically for a given (suboptimal) input distribution P x (x). For the r-modulated system, the first capacity estimates were obtained in Ref. [21] for continuous input alphabet representing complex values of the initial reflection coefficient: However, the genuine channel law for the NFDM systems is still unknown, and the models like (25), (28) developed here for b-modulated systems, or those derived in Ref. [21,23] for r-modulated systems, still render at best just some approximations of the genuine channel. Therefore, an alternative approach has recently gained popularity: the technique relying on the AIR and the concept of mismatched decoding [28][29][30].
Let us begin with some definitions. At the transmitter, the bit stream of k bit forming a message is mapped onto a complex vector of N sc symbols: ⃗ X = {c 1 , . . . , c N sc }, where each symbol c k is drawn from a modulation constellation χ M of cardinality M. These symbols are then scaled by the power-tuning factor S 1/2 and mapped onto the nonlinear spectrum via relation (10) and, possibly, further scaling transformations. Our channel is defined as discrete input -complex output (see e.g. [5]), where the output to the input sequence ⃗ X is the continuous complex vector ⃗ Y ∈ C N sc . For a general communication channel with memory, the MI is defined as [5,29]: where P( ⃗ Y | ⃗ X) is the channel law with memory, P( ⃗ Y) is the distribution of the output complex vector induced by the discrete input distribution, and the averaging is performed over the joint input-output distribution P( ⃗ X, ⃗ Y). The rigorous definition of MI implies taking the limit N sc → ∞, but in practical settings the number of subcarriers is always limited. The capacity of the channel, C, is defined as supremum of the MI over all input distributions P( ⃗ X) subject to the fixed average input burst power, P in . The AIR is providing a lower bound for the channel capacity by lower-bounding the MI of a given channel for a given input. Specifically, we shall assume that our symbols, c k , are i.i.d. uniformly drawn from the constellation χ M . As for the channel law, we substitute the (unknown) genuine channel law P( ⃗ Y | ⃗ X) and the induced output PDF P( ⃗ Y) with the auxiliary ones,P( ⃗ Y | ⃗ X) andP( ⃗ Y) based on our approximate analytical model, Eqs. (25) and (28). The averaging in (30), nonetheless, is still performed over the genuine channel by averaging over the large number of the Monte-Carlo runs of different input bursts for a given input power and collecting the resulting vector of complex outputs, ⃗ Y. The procedure described defines the AIR, and it can be proven that it provides a lower bound for the MI of a genuine channel and, hence, for the capacity [28].

Scaled b-modulated transmission
The auxiliary channel law,P( ⃗ Y | ⃗ X), depends on the particular type of the b-modulated scheme. We begin with the scaled version and the (approximate) input-output relation (28) for the complex spectral wave-form. We shall assume as in the previous section, the nonlinear version of the OFDM scheme where the nonlinear spectral carriers in Eq. (10) are sincs. In the linear OFDM, the demodulation of the sampled time signal at the receiver is performed by means of a discrete Fourier transform. Although through the numerical NFT algorithm at the receiver one already obtains the nonlinear spectral form u out the removal of guard band and oversampling necessitates the additional linear DFT transform pair (see [16]). Mathematically, this is equivalent to the "matched filtering" albeit in the nonlinear spectral domain, so that the output complex amplitudes forming the output vector ⃗ Y = {Y 1 , . . . , Y N sc } are obtained via the orthogonal projection: with t 0 = T 0 /T s = 1 being the useful symbol duration in the normalized units. Then, from (10) and (28) it follows that our communication channel conditioned on the input, is input-dependent Gaussian with memory. Moreover, the noise is no longer circularly symmetric and is characterised by both covariance and pseudo-covariance matrices, Σ k k ′ andΣ k k ′ , correspondingly, conditioned on the input ⃗ X (below CN {︃ . . .

}︃
stands for the multivariate complex normal distribution): From the above, one can see that as the distance between the symbols k and k ′ grows, the memory fades due to the diminishing overlap between the sincs. To make the results more tractable, we shall neglect this overlap altogether here and assume the contribution of the diagonal Furthermore, in the same approximation we can substitute |u in (ξ)| ≈ √ S |X k | |sinc(ξt 0 /π − k)| in all the integrals for the covariance and pseudo-covariance and obtain the memoryless channel with the variance and pseudovariance of each transmitted symbol X k depending on this symbol alone: Note that for a given power value S these values can be precomputed for all constellation points X k ∈ χ M and then used in a look-up fashion in the numerical simulations.
In the same approximation for the i.i.d. input ⃗ X, the AIR (30) simplifies to where the averaging is performed over multiple generated input-output pairs {X k , Y k } obtained by simulating the real channel. The above results are approximate inasmuch as they neglect the channel memory due to the small overlap between the distant sinc carriers. A better channel matching (and hence tighter capacity bounds) can be obtained by considering the full model (32), but this makes the estimation task more challenging numerically and so we leave it to further studies.

Unscaled b-modulated channel
In this subsection we provide the results for the unscaled b-modulation transmission considered in [31,55]. In this scheme b in (ξ) = u in (ξ) given by (10) with the spectral carrier shape ψ ξ taken to be a spectrally flat with well defined bandwidth B = ∆ξ and possibly a roll-off factor. We are assuming that the carriers are real functions normalized to have unit peak, and the matched filtering sampling corresponds to the output sequence: where A ψ ∼ B is the effective area under the carrier pulse. Note that unlike the case of OFDMrelated sinc-shaped carriers considered in the previous subsection, the individual subcarriers here are not orthogonal as such and, therefore, the effective demodulation is achieved by demanding the negligible overlap between them. The orthogonal multiplexing with overlapping subcarriers in the unscaled b-modulation is problematic because the power scaling factor S must be adjusted for each random burst according to the requirement that sup ξ |b in (ξ)|<1, which is difficult to enforce [34,55]. But for the non-overlapping spectrally flat subcarriers of unit height, one must simply ensure that the power-control parameter satisfies: S<S max = 1/max k |X k | 2 , i.e. it is upper-bounded by the reciprocal maximum energy of a given constellation. As for the spectral shape of each non-overlapping carrier, one of the simplest choices that we adopt here is to use that of the root raised cosine (RRC) filter: with the roll-off factor 0<β<1 and the effective ξ-bandwidth is given by: The area under the squared subcarrier is: A ψ = π/t 0 . The RRC subcarriers, however, have a non-localized linear inverse FT and, therefore, relatively large tails in the time domain. We note that most of the results in this subsection are quite general as long as the subcarriers are non-overlapping. Regardless of the carrier shape used it follows that our auxiliary communication channel conditioned on the input is again input-dependent Gaussian. Nonetheless, unlike the case of nonlinear OFDM subcarriers considered in the previous subsection, the channel now is memoryless because different subcarriers in (10) do not overlap. The input-output relation for this channel is quite simple: This corresponds to the circularly symmetric complex Gaussian process with the mean X k and the following variance: In the case of RRC carriers, the integration in (29) can be carried out analytically yielding The achievable information rate can now be estimated from (26) setting the pseudo-covariance to zero.

Simulation results
In this section we analyze the performance of both scaled and unscaled b-modulated systems and compare the results of the simulations with the predictions of the theory given above. In all the simulations the 12 spans of 80 km were considered and our main quantity of interest was the spectral efficiency (SE) defined as the AIR per unit bandwidth per unit time. In our simulations, as it is common for almost all current NFT-based systems, the transmission is performed in a burst mode with the width of the burst, T b , usually estimated via the linear spread of the pulse. The total linear bandwidth W = N sc /T 0 was fixed to be 56 GHz for the scaled b-modulation and to the half of this amount for the unscaled one. The number of subcarriers was a free parameter that was optimized to achieve optimal performance as was done, e.g., in [16]. The fiber channel parameters, amplification scheme, the noise figure are the same as detailed in Section 2.1. The same two types of subcarriers as in the previous section were used (sincs for scaled, RRCs for unscaled b-modulation). To achieve maximal spectral efficiency half-way pre-compensation, s = 1/2, was used [47]. We used 32-QAM constellations throughout and the signal was demodulated according to the prescribed rules (24) and (27) respectively. Also a constant phase offset of M-QAM constellation was compensated using the first burst in the simulation as a pilot.
Following Ref. [16], we introduce an SE loss parameter η, which determines how "wasteful" the guard band is with respect to the information content of the pulse. Assuming that a single burst carries N sc modulated symbols each containing log 2 M bits over the total linear bandwidth W, the connection between the SE and the AIR (26) is: where η = (T b /T 0 )(1 + β), and the roll-off factor β should be put to zero for the scaled sinc-carrier modulation. For the unscaled RRC transmission, we used a range of values between β = 0.1 and β = 0.15. In the last equality in (30), we have made one simplifying assumption: namely, we assume that the linear bandwidth W does not change significantly during the propagation and can be approximated by its nonlinear counterpart (i.e. W = N sc /T 0 for the scaled b-modulation based on sincs or W = N sc (1 + β)/T 0 for the unscaled RRC). This is strictly true only for moderate input powers. However, we have checked that in all cases for the optimal launch powers, the average energy contained outside the linear bandwidth did not exceed 0.17% for the scaled case and 0.02% for the un-scaled one.
The expression in (30) has a few interesting consequences. Firstly, the AIR is upper bounded by the entropy of the discrete input, i.e. log 2 M and hence the SE cannot exceed log 2 M/η bits/s/Hz. Since the numerator is bounded, increasing the guard band η eventually degrades the SE. On the other hand, as η decreases one starts to clip the tails of the burst, which introduces the additional numerical error and diminishes AIR. It is theoretically possible to have a non-zero limit of SE as η → 0 -the informational content diminishes to zero but it is spread over a vanishing window -so the "efficiency" remains finite. This result is clearly meaningless since no reliable communication is possible in this limit. We therefore, need to define the area of the parameters where the expression in (30) still makes sense. For this expression, we chose the value of η, where the corresponding BER can be made lower than the 7% hard decision FEC threshold: BER FEC = 4.5 × 10 −3 [56]. Note that this criterion is somewhat arbitrary and we did not implement the actual FEC in our setup, but used this value as the benchmark of the tolerable quality for the transmission. In our simulations the BER was estimated using standard techniques of error vector magnitude evaluation [57]. It was subsequently translated into the Q-factor using the standard relation: Q = 20 log ]︃ . We also estimated BER via direct error counting and found a very good agreement below FEC threshold (low Q-factor, high BER) while above the threshold the BER was too low and the direct counting data were too noisy to be used. The Q-factors for both scaled and unscaled b-modulated systems are shown in Figs. 4(a) and 5(a) respectively. Quite unsurprisingly, the larger values of η parameter translate into the larger guard bands T b and, as a result, in the larger Q-factor values. The HD-FEC threshold is shown as solid horizontal line there. The minimum value of η above HD-FEC was found to be 1.3 in the scaled case and 1.61 in the un-scaled b-modulation. The latter value is higher, which is non-surprising as the RRC spectra after the INFT render a poorly localized time-domain shapes and suffer from numerical error resulting from the burst truncation. The number of subcarriers N sc for both schemes varied between N sc = 210 and N sc = 1024, and the reported curves represent the results optimised over N sc . The maximum achieved Baud rates N sc /T b were 43 Gbaud (η = 1.3, N sc = 700) for the scaled b-modulation and 20 Gbaud (η = 1.61, N sc = 322) for the unscaled case, corresponding to the bit rates of 215 Gb/s and 100 Gb/s, respectively. To demonstrate that the performance deterioration at high input powers is due to the processing noise discussed in Section 4 and not ASE, we have also included the results of the noise-free runs with the same parameters. One can see from Figs. 4(a), 5(a) that this is indeed the case as both noise-free and noisy curves virtually coalesce in this regime (especially for the scaled modulation). Next, we picked three particular values of parameter η and determined the SE values for these three specific cases. The results are given in Figs. 4(b), 5(b). Two sets of curves are given: the first one is model-based using (26) with the symbol-variances and pseudo-variances given by the theoretical analysis from the previous section. Superimposed on these are the data-based results that fit these pseudo-covariances directly from the data. As one can see, in both cases the data-based SE provides a tighter bound, which is not surprising given that our theory does not take into account the processing noise for which no theoretical model exists to date. For this reason we are not providing the noise-free estimates for the SE (as we did for the Q-factor), as the processing noise takes over at the higher values of the input power, and the performance there is starting to degrade noticeably.
Most importantly, our theory works very well in the ASE-dominated regime up to the optimal value: the relative SE difference between the theory and the data-driven estimates does not exceed 2%. The maximum found value of SE was 3.8 bits/s/Hz for the scaled transmission and 3.04 bits/s/Hz for the unscaled transmission. The former value is very close to the reported record theoretical SE per polarization from Ref. [16], In fact, our result is slightly higher as the maximal dual-polarization b-modulation SE was estimated as 7.2 bits/s/Hz in that reference. Note that here, we obtain our results under strict HD-FEC constraint and if these were relaxed, as in Ref. [16], then even higher values of SE (up to 4.5 bits/s/Hz observed in our simulations) can be achieved.

Conclusion
In this paper we have developed a rigorous theory for the ASE-induced spectral noise in the NFDM transmission systems using b-modulation. Based on perturbation theory for the path-averaged NLSE and far-distance asymptotes of the Jost functions derived from Riemann-Hilbert formalism, we obtained a remarkably simple input-output relation (25) for the effective b-modulation channel. This results predicts that the ASE part of the spectral distortion is squeezed as the input power grows, and the b-coefficient is pushed to the limit |b| = 1. The eventual ASE degradation observed at high power values is attributed solely to the processing noise stemming from the inadequacy of the path-averaged model, burst truncation, and insufficient sampling rate resulting in the (I)NFT numerical routines performance drop [24].
We have used this model to provide general analytical results for the symbol-wise covariance and pseudo-covariance addressing the arbitrary form of b-modulation coding: both the scaled case (where the constraint |b(ξ)|<1 is enforced by applying a squeezing transformation to the information sequence (10)) and the unscaled, that uses flat spectral carriers (we chose root raised cosines), within which the maximum value of the b-coefficient is easily controlled. For both schemes we have obtained an excellent agreement between the theory and the data-based results for the achievable information rate and spectral efficiency in the region where the spectrum is degraded by the ASE. The theory predicts the record spectral efficiency of 3.8 bits/s/Hz for the scaled b-modulation system with the BER remaining below 7% HD-FEC.
As the future avenue of research, we suggest exploring new nonlinear carrier shapes (rather than sincs and RRCs) that will can help reducing the SE penalty η while our theoretical result (25) can be used to estimate the AIR, i.e. capacity per symbol. One interesting candidate for such shapes is the recently suggested flat-top carriers [34]. Another promising way to increase SE can be the adaptation of faster than Nyquist signalling for the NFDM as recently reported in [58]. Yet another perspective direction can be combining the discrete NF modes (solutitons) with the b-modulation, which can result in the gain in the transmission rate [59] where, however, the correct analytically-tractable channel model is yet to be built using the results of general nonlinear perturbation theory [27].
Note added in proof. When this manuscript was under review, a paper of Civelli et al. [60] was published, where the authors analyzed modulation and detection strategies for the NFDM systems, including the b-modulation case with two scaling functions (linear and exponential) considered in our work. The authors of Ref. [60] also proposed a bevy of advanced de/modulation strategies accounting for the specific NFDM channel features, which can be employed to increase noise tolerance and boost the transmission quality, and we expect that by using such strategies we can improve the estimates obtained in the current paper even further.