Decision-Feedback Detection Strategy for Nonlinear Frequency-Division Multiplexing

By exploiting a causality property of the nonlinear Fourier transform, a novel decision-feedback detection strategy for nonlinear frequency-division multiplexing (NFDM) systems is introduced. The performance of the proposed strategy is investigated both by simulations and by theoretical bounds and approximations, showing that it achieves a considerable performance improvement compared to previously adopted techniques in terms of Q-factor. The obtained improvement demonstrates that, by tailoring the detection strategy to the peculiar properties of the nonlinear Fourier transform, it is possible to boost the performance of NFDM systems and overcome current limitations imposed by the use of more conventional detection techniques suitable for the linear regime.


Introduction
Nowadays, most of the global data traffic is carried by optical fibers. However, the Kerr nonlinearity of optical fibers severely limits the capacity of current optical communication systems, whose design is based on concepts developed for linear systems [1]. For this reason, there has recently been growing interest in nonlinear Fourier transform (NFT)-based transmission schemes [1][2][3][4][5][6][7]. The NFT [5,8], a sort of nonlinear analogue of the standard Fourier transform (FT), defines a nonlinear spectrum that evolves trivially and linearly along the optical channel in the nonlinear frequency domain, thus turning nonlinearity into a means for information transfer rather than an impairment. Nonlinear frequency-division multiplexing (NFDM) is a novel transmission paradigm that aims at mastering nonlinearity by using the NFT to avoid any deterministic intra- and interchannel interference, encoding information directly on the nonlinear spectrum [5,6]. However, research on NFDM schemes is still ongoing and it is not yet clear whether NFDM can outperform conventional systems [9,10]. We refer to [1] for a complete review of the different NFT-based transmission schemes.
Nonlinear inverse synthesis (NIS) [2] is a popular NFDM scheme based on the NFT with vanishing boundary conditions. NIS maps the information on the continuous part of the nonlinear spectrum through a backward NFT (BNFT), and then recovers it with the reverse operation, the forward NFT (FNFT). An intrinsic limitation of this scheme with respect to conventional systems was highlighted in a previous work [9]. Indeed, the insertion of guard symbols between different bursts, as required by NIS, causes a reduction of the overall spectral efficiency that, differently from conventional systems, cannot be mitigated by increasing the burst length, as system performance decays for longer bursts. Moreover, in [9] it is conjectured that the NIS performance decay is due to the considered symbol-by-symbol detection strategy, based on the Euclidean distance, which is optimal only in the linear regime and does not account for the statistics of noise in the nonlinear spectrum.
In this work, we exploit a powerful property of the NFT to devise a novel detection strategy for NFDM systems. The proposed strategy employs decision feedback and the BNFT to avoid the detrimental effect that an increase of the burst length has on the noise statistics in the nonlinear frequency domain. After introducing and discussing the decision-feedback BNFT (DF-BNFT) strategy, the system performance obtained through simulations is presented, and theoretical performance estimates are given.
The paper is organized as follows. Section 2 states and proves a causality property of the NFT that plays a key role in the derivation of the DF-BNFT detection strategy. Section 3 describes the system and derives two important corollaries of the causality property. Section 4 introduces and explains the DF-BNFT detection strategy. Section 5 derives an approximation, a lower bound, and an upper bound to the probability of error of DF-BNFT. Section 6 deals with DF-BNFT performance; also, it compares DF-BNFT with standard FNFT detection and conventional systems in terms of performance, considering different scenarios. Section 7 validates the approximation and bounds derived in Section 5 by comparison with simulation results. Finally, Section 8 concludes the paper.

NFT causality property
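The BNFT, by which a time-domain signal $r(t)$ is recovered from its nonlinear spectrum $\rho(\lambda)$, can be computed via the Gelfand-Levitan-Marchenko equation (GLME) [8]

$$K(x,y) - \sigma F^*(x+y) + \sigma \int_x^{\infty}\!\!\int_x^{\infty} K(x,r)\,F(r+s)\,F^*(s+y)\,\mathrm{d}s\,\mathrm{d}r = 0, \qquad (1)$$

where $\sigma = 1$ in the focusing regime and $\sigma = -1$ in the defocusing regime, $*$ denotes the complex conjugate, and $F(y)$ is an integral function of the nonlinear spectrum [8]. Specifically, if there is no discrete spectrum,

$$F(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \rho(\lambda)\,e^{j\lambda y}\,\mathrm{d}\lambda. \qquad (2)$$

The time-domain signal is recovered from the GLME solution $K(x,y)$ as $r(t) = -2K(t,t)$.

Proposition 1 (NFT causality property).
The signal $r(t)$ for $t \ge \tau$ depends only on the values of $F(y)$ for $y \ge 2\tau$.
Proof. Since $r(t) = -2K(t,t)$, one should obtain the solution $K(t,t)$ of Eq. (1) for all $t \ge \tau$. For $x = t \ge \tau$, Eq. (1) involves $F$ only through $F(x+y)$, $F(r+s)$, and $F(s+y)$ with $y, r, s \ge x$, that is, only through values of $F$ at arguments not smaller than $2\tau$.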
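The causality property can be checked numerically with a very crude discretization of the GLME. The sketch below is only illustrative and rests on several assumptions that are not taken from the paper: the GLME is assumed in the form $K(x,y) - \sigma F^*(x+y) + \sigma\iint K(x,r)F(r+s)F^*(s+y)\,\mathrm{d}s\,\mathrm{d}r = 0$, the integrals are truncated at a hypothetical `y_max` and approximated by a rectangular rule with step `h`, and `F_base` and `F_perturbed` are arbitrary test functions. It is not the enhanced Nyström method used for the simulations.

```python
import numpy as np

def bnft_glme(F, t_grid, y_max, h, sigma=1):
    """Crude BNFT sketch: solve a discretized GLME for each t, return r(t) = -2 K(t, t)."""
    r = np.zeros(len(t_grid), dtype=complex)
    for n, x in enumerate(t_grid):
        y = np.arange(x, y_max, h)            # integration grid, restricted to y >= x
        Fm = F(y[:, None] + y[None, :])       # samples of F(r + s); symmetric matrix
        A = np.eye(len(y)) + sigma * h**2 * (Fm @ np.conj(Fm))
        # Row-vector equation k A = sigma F*(x + y), solved as A^T k = sigma F*(x + y)
        k = np.linalg.solve(A.T, sigma * np.conj(F(x + y)))
        r[n] = -2 * k[0]                      # k[0] approximates K(x, x)
    return r

def F_base(y):
    # Arbitrary smooth test function (assumption, not from the paper)
    return 0.8 * np.exp(-(y - 1.0) ** 2) * np.exp(1j * 0.5 * y)

def F_perturbed(y, tau=0.0):
    # Same as F_base for y >= 2*tau, different for y < 2*tau
    return F_base(y) + 0.5 * np.exp(-(y + 2.0) ** 2) * (y < 2 * tau)

t = np.linspace(0.0, 3.0, 16)                 # all instants satisfy t >= tau = 0
r1 = bnft_glme(F_base, t, y_max=8.0, h=0.05)
r2 = bnft_glme(F_perturbed, t, y_max=8.0, h=0.05)
print("max |r1 - r2| for t >= tau:", np.max(np.abs(r1 - r2)))
```

For $t \ge \tau$ the solver never samples $F$ below $2\tau$, so the two outputs should coincide; evaluating the same functions at some $t < \tau$ should instead give different results.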

System description
The transmission scheme considered in this work is sketched in Fig. 1. As in the NIS scheme [2], the transmitter (TX) encodes a burst of $N_b$ symbols $\{x_1,\dots,x_{N_b}\}$ drawn from an $M$-ary quadrature amplitude modulation (QAM) alphabet $\{X_1,\dots,X_M\}$ onto a QAM signal

$$s(t) = \sum_{k=1}^{N_b} x_k\, g\big(t-(k-1)T_s\big) \qquad (3)$$

with pulse shape $g(t)$ and symbol time $T_s$. In this work, for reasons that will be clarified later, we restrict $g(t)$ to have a finite duration $T \le T_s$. The ordinary Fourier transform $S(f)$ of (3) is then mapped on the continuous part of the nonlinear spectrum $\rho(\lambda)$ according to

$$\rho(\lambda) = -S(f)\big|_{f=-\lambda/\pi}. \qquad (4)$$

Furthermore, before computing the BNFT, deterministic propagation effects (dispersion and nonlinearity) are precompensated by multiplying the nonlinear spectrum by $\exp(j4\lambda^2 L)$, where $L$ is the link length. Finally, the input optical signal is taken to be $q(t) = q'(-t)$, where $q'(t)$ is the BNFT of the precompensated nonlinear spectrum.

There are two differences between the TX described here and the one in [2], which, however, do not change the overall NIS working principle. Firstly, propagation effects are removed at the TX (precompensation) rather than at the receiver (RX). While both solutions are feasible (and splitting the compensation between TX and RX might even be advantageous [9]), precompensation is considered here because the proposed DF-BNFT detection strategy relies on it. Indeed, Corollary 3, on which DF-BNFT is based (as explained in the following), applies only to optical signals unaffected by propagation effects. Precompensation ensures that the received signal meets this requirement. Secondly, in this work each burst in the optical signal is inverted in time just to help explain the working principle, not because it is necessary. Finally, for the sake of simplicity, we avoid explicitly mentioning the normalization and denormalization procedures required to relate the normalized signals $q(t)$ and $\tilde r(t)$ (see Fig. 1) to the physical signals propagating in a real fiber link, therefore assuming that the channel is characterized by the normalized version of the nonlinear Schrödinger equation (NLSE) given in [5].
Let us denote by $r(t)$ the optical signal obtained by propagating $q(t)$ in a noiseless channel. In this case, the only channel effect is the multiplication of the nonlinear spectrum by $\exp(-j4\lambda^2 L)$, so that $r'(t) = r(-t)$ is the BNFT of $\rho(\lambda)$, the nonlinear spectrum before precompensation.

Corollary 2.
Considering a NIS modulation, the above optical signal r(t) for t ≤ τ depends only on the values of the QAM signal s(t) for t ≤ τ.
Proof. Taking into account (2), (3) and (4), we have that $F(y) = -s(-y/2)/2$ and thus $F(y)$ for $y \ge 2\tau$ depends only on the values of $s(t)$ for $t \le -\tau$. Therefore, recalling that $r'(t)$ is the BNFT of $\rho(\lambda)$, the NFT causality property (Proposition 1) implies that $r'(t)$ for $t \ge \tau$ depends only on $s(t)$ for $t \le -\tau$. Changing the sign of $\tau$, one obtains that $r'(t)$ for $t \ge -\tau$ depends only on $s(t)$ for $t \le \tau$. Finally, the thesis follows considering that $r(t) = r'(-t)$.

Corollary 3.
Let $t_k = (k - 1/2)T_s$. If the pulse shape $g(t)$ in (3) has finite duration $T \le T_s$, then for $t \le t_k$ the optical signal $r(t)$ depends only on the symbols $x_1,\dots,x_k$, and not on the following ones $x_{k+1},\dots,x_{N_b}$, as shown in Fig. 2. In formulas,

$$r(t) = G(x_1,\dots,x_k,t), \qquad t \le t_k, \qquad (5)$$

where $G$ is a generic function.
Proof. With this hypothesis, s(t) depends only on the symbols x 1 , . . . , x k for t ≤ t k . Then, applying Corollary 2, the thesis follows.
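The hypothesis used in the proof, namely that $s(t)$ for $t \le t_k$ involves only the first $k$ symbols when the pulse is confined within a symbol time, can be illustrated with a short sketch. It assumes a QAM signal of the form $s(t) = \sum_k x_k\, g(t-(k-1)T_s)$ and, as a simplification that is not in the paper, a Gaussian pulse hard-truncated to $|t| < T_s/2$; the function name `qam_signal` and the random symbols are purely illustrative.

```python
import numpy as np

def qam_signal(symbols, t, Ts=1.0):
    """s(t) = sum_k x_k g(t - (k-1) Ts), Gaussian pulse truncated to |t| < Ts/2 (assumption)."""
    s = np.zeros_like(t, dtype=complex)
    for k, x in enumerate(symbols):           # index k = 0 corresponds to symbol x_1
        tau = t - k * Ts
        g = np.exp(-12.5 * (tau / Ts) ** 2) * (np.abs(tau) < Ts / 2)
        s += x * g
    return s

rng = np.random.default_rng(0)
levels = np.array([-3, -1, 1, 3])
x = rng.choice(levels, 8) + 1j * rng.choice(levels, 8)   # 8 random 16QAM symbols
t = np.linspace(-0.5, 7.5, 8 * 16, endpoint=False)       # window [-Ts/2, (Nb - 1/2) Ts)

k = 6
t_k = (k - 0.5) * 1.0
s_full = qam_signal(x, t)           # all 8 symbols
s_first_k = qam_signal(x[:k], t)    # only the first 6 symbols
same_up_to_tk = np.allclose(s_full[t <= t_k], s_first_k[t <= t_k])
print("s(t) identical for t <= t_k:", same_up_to_tk)
```

The two signals differ only after $t_k$; Corollary 2 then carries this property over from $s(t)$ to the optical signal $r(t)$.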

Decision-feedback BNFT detection
In conventional NIS, the RX recovers a noisy version of the transmitted nonlinear spectrum $\rho(\lambda)$ by computing the FNFT of the received optical signal, and then makes decisions based on standard matched filtering and symbol-by-symbol detection. The improved detection scheme proposed in this work originates from the idea that, since a detrimental signal-noise interaction takes place when computing the FNFT of the received noisy signal, decisions could alternatively be made by comparing the received signal with the BNFT of all possible transmitted (noiseless) waveforms, thus avoiding signal-noise interaction effects. Selecting the waveform (and the corresponding symbols) closest to the received optical signal would correspond to a maximum a posteriori probability (MAP) strategy, under the assumption that the accumulated optical noise can be modeled as additive white Gaussian noise (AWGN) (more on this later). The drawback of a sequence detection strategy, as opposed to a symbol-by-symbol one, is an exponential growth of the detector complexity with the burst length $N_b$. In order to avoid this growth, the NFT causality property and a decision-feedback scheme are finally employed, obtaining the DF-BNFT detection scheme depicted in Fig. 1.

Fig. 2. The NFT causality property for NIS with no ISI on $s(t)$. A train of Gaussian pulses, modulated by 16QAM symbols and almost ISI-free, is shown before (on the left) and after (on the right) the BNFT. The red signal is generated by 8 symbols, while for the blue one only the first 6 are taken into account. The two optical signals coincide for $t \le t_6$, as per Eq. (5) (baud rate $R_s = 50$ GBd, optical power $P_s = 7$ dBm).
Bearing in mind that the NIS performance decay is caused by the detection strategy itself, rather than by signal-noise interaction during propagation (as also hinted by the fact that the performance obtained by simply adding AWGN after noiseless propagation coincides with that of the actual optical link [9]), the aim of this work is to devise a detection strategy that is at least optimal for the AWGN channel. Therefore, assuming an AWGN channel, we can write the received noisy optical signal as

$$\tilde r(t) = r(t) + n(t), \qquad (6)$$

where $r(t)$ is, as before, the optical signal obtained by propagating the input signal $q(t)$ in a noiseless channel, and $n(t)$ is circularly-symmetric complex white Gaussian noise with power spectral density $N_0$. We would like to stress that (6) is only an ansatz and not an actual identity. An analog-to-digital converter (ADC) recovers the samples of the received noisy optical signal and collects them in the vector $\tilde{\mathbf r}$. The ADC is modeled as a rectangular filter with bandwidth $\nu/(2T_s)$ that acquires $\nu$ samples per symbol time. Assuming that the filter bandwidth is larger than the overall signal bandwidth, under the AWGN assumption (6), $\tilde{\mathbf r}$ is a sufficient statistic and we can write $\tilde{\mathbf r} = \mathbf r + \mathbf n$, where $\mathbf r$ is a vector collecting the samples of $r(t)$, and $\mathbf n$ is a vector of i.i.d. circularly-symmetric complex Gaussian random variables $n_k$, with zero mean and variance $\sigma^2 = E\{|n_k|^2\} = N_0\nu/T_s$. Therefore, conditional on $\mathbf r$, the components of $\tilde{\mathbf r}$ are independent. For the sake of simplicity, let $\tilde{\mathbf r}_k$ (and $\mathbf r_k$) be the vector of length $\nu$ representing the noisy optical signal (and, respectively, its noise-free equivalent) in the time window $[t_{k-1}, t_k)$. Hence, $\tilde{\mathbf r}$ can also be written as a compound vector $\tilde{\mathbf r} = (\tilde{\mathbf r}_1,\dots,\tilde{\mathbf r}_{N_b})$ containing the samples of the received signal in $[-T_s/2, (N_b - 1/2)T_s)$, i.e., the time window of duration $N_b T_s$ in which information is encoded.
Indeed, with respect to the QAM signal $s(t)$, the optical signal after the GLME broadens in time, developing a sort of right tail that extends outside the considered detection window, i.e., for $t > (N_b - 1/2)T_s$, as shown for instance in Fig. 2. Therefore, a longer vector $\tilde{\mathbf r}$ should be considered to obtain a sufficient statistic. However, in the decision-feedback strategy derived in the following, this tail could be used only to detect the last symbol $x_{N_b}$ of the sequence, with a negligible contribution to the overall performance. Hence, for the sake of simplicity, we simply discard it.
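According to the MAP strategy, and assuming equally likely input symbols, optimal detection maximizes the probability density function (pdf) $p(\tilde{\mathbf r}|\mathbf x)$ of the vector $\tilde{\mathbf r}$ conditional upon the transmitted sequence $\mathbf x = (x_1,\dots,x_{N_b})$. Thus, an optimum RX chooses the sequence $\hat{\mathbf x}$ according to

$$\hat{\mathbf x} = \arg\max_{\mathbf x}\, p(\tilde{\mathbf r}|\mathbf x). \qquad (7)$$

Since, conditional upon $\mathbf x$, the components of $\tilde{\mathbf r}$ are independent, the pdf in Eq. (7) can be factorized as

$$\hat{\mathbf x} = \arg\max_{\mathbf x} \prod_{k=1}^{N_b} p(\tilde{\mathbf r}_k|\mathbf x) = \arg\max_{\mathbf x} \sum_{k=1}^{N_b} \ln p(\tilde{\mathbf r}_k|\mathbf x), \qquad (8)$$

where the second equality stems from the monotonic behavior of the logarithm. The NFT property (5) implies that the signal samples in the time window $[t_{k-1}, t_k)$, i.e., those collected in $\tilde{\mathbf r}_k$, depend only on the symbols $(x_1,\dots,x_k)$. Therefore, an optimum RX performs decisions according to

$$\hat{\mathbf x} = \arg\max_{\mathbf x} \sum_{k=1}^{N_b} \ln p(\tilde{\mathbf r}_k|x_1,\dots,x_k). \qquad (9)$$

Since the symbols $x_1,\dots,x_k$ uniquely determine $\mathbf r_k$, under the AWGN assumption we have

$$p(\tilde{\mathbf r}_k|x_1,\dots,x_k) = \frac{1}{(\pi\sigma^2)^{\nu}} \exp\!\left(-\frac{\|\tilde{\mathbf r}_k - \mathbf r_k\|^2}{\sigma^2}\right), \qquad (10)$$

so that $\ln p(\tilde{\mathbf r}_k|x_1,\dots,x_k) = -\nu\ln(\pi\sigma^2) - \|\tilde{\mathbf r}_k - \mathbf r_k\|^2/\sigma^2$. This implies that, in order to determine the optimal sequence $\hat{\mathbf x}$, all possible $M^{N_b}$ input sequences should be considered, which is not a viable solution. In order to avoid this exponential growth of complexity, we resort to a sub-optimal decision-feedback strategy, namely DF-BNFT, by which symbols are decided iteratively for $k = 1,\dots,N_b$ as

$$\hat x_k = \arg\min_{x_k \in \{X_1,\dots,X_M\}} \big\|\tilde{\mathbf r}_k - \mathbf r_k(\hat x_1,\dots,\hat x_{k-1},x_k)\big\|^2, \qquad (11)$$

bringing down to $M \times N_b$ the number of sequences to be considered. Equation (11) can be evaluated by comparing the (samples of the) received signal with (the samples of) $M$ trial waveforms $r^{(i)}(t)$, $i = 1,\dots,M$, where $r^{(i)}(t)$ is obtained from the symbol sequence $\hat x_1,\dots,\hat x_{k-1},X_i$ by the same encoding technique used at the TX, except for precompensation. Let $\mathbf r^{(i)}_k$ denote the vector of length $\nu$ containing the samples of $r^{(i)}(t)$ in the time window $[t_{k-1}, t_k)$, referred to as detection window. In other words, $\mathbf r^{(i)}_k$, $i = 1,\dots,M$, are all possible vectors $\mathbf r_k$ given that the sequence $\hat x_1,\dots,\hat x_{k-1},X_i$ has been sent. The RX implements the DF-BNFT strategy through $N_b$ steps as follows.
For $k = 1,\dots,N_b$:
• Digitally obtain the vectors $\mathbf r^{(i)}_k$, for each $X_i$ in the symbol constellation $\{X_1,\dots,X_M\}$.
• Decide $\hat x_k = X_i$ for the index $i$ that minimizes $\|\tilde{\mathbf r}_k - \mathbf r^{(i)}_k\|^2$, as prescribed by (11).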
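The decision-feedback loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `df_detect` is a hypothetical name, and `toy_causal_channel` is a made-up causal nonlinear map standing in for "TX encoding followed by BNFT", chosen only because the $k$-th window of its output depends on the first $k$ symbols, as Corollary 3 requires.

```python
import numpy as np

def df_detect(r_noisy, constellation, waveform, n_symbols, nu):
    """Decide x_k from the k-th detection window, feeding back the previous decisions."""
    decided = []
    for k in range(n_symbols):
        window = slice(k * nu, (k + 1) * nu)
        # One trial waveform per candidate symbol, built on the previous decisions
        dists = [np.sum(np.abs(r_noisy[window] - waveform(decided + [X])[window]) ** 2)
                 for X in constellation]
        decided.append(constellation[int(np.argmin(dists))])
    return decided

def toy_causal_channel(symbols, n_symbols=6, nu=4):
    """Hypothetical stand-in for encoding + BNFT: causal and nonlinear in the symbols."""
    out = np.zeros(n_symbols * nu, dtype=complex)
    acc = 0.0                                  # energy of the previous symbols
    for k, x in enumerate(symbols):
        out[k * nu:(k + 1) * nu] = x * np.exp(1j * 0.1 * acc) / (1 + 0.05 * acc)
        acc += abs(x) ** 2
    return out

qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
rng = np.random.default_rng(1)
tx = [qpsk[i] for i in rng.integers(0, 4, 6)]
noise = 0.05 * (rng.standard_normal(24) + 1j * rng.standard_normal(24))
rx = toy_causal_channel(tx) + noise            # AWGN assumption
detected = df_detect(rx, qpsk, toy_causal_channel, n_symbols=6, nu=4)
print(detected == tx)
```

Each step evaluates $M$ trial waveforms, so the whole burst requires $M \times N_b$ evaluations instead of the $M^{N_b}$ needed by exhaustive sequence detection.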
The DF-BNFT strategy (11) avoids any ISI thanks to decision feedback, which accounts for the dependence of $\tilde{\mathbf r}_k$ on previous symbols, and to Corollary 3, which ensures that $\tilde{\mathbf r}_k$ does not depend on subsequent symbols. However, it is still suboptimal compared to (9) for two reasons. The first reason is that it does not exploit the information about $x_k$ that is contained in the received signal after $t_k$. In fact, as shown in Fig. 2, while in the original QAM signal $s(t)$ (on the left) the information about $x_k$ is fully contained in the time interval $[t_{k-1}, t_k)$, in the corresponding optical signal $r(t)$ (on the right) part of this information moves to times $t > t_k$, the effect becoming more relevant with power and as $k$ increases. An apparent consequence of this effect is that the optical signal $r(t)$ has an average amplitude that decreases with time and a sort of "tail" that extends beyond the duration of the original QAM signal. The second reason is that it is affected by error propagation, as previous decisions $\hat x_1,\dots,\hat x_{k-1}$ might be incorrect, this effect also becoming more relevant as $k$ increases. Finally, even the strategy (9), which is optimal for the AWGN channel (6), might be suboptimal on a real fiber link. The effects of error propagation, information loss, and non-AWGN channel statistics are discussed in more detail in Section 6.
While the DF-BNFT strategy is conceived to improve system performance (as will be verified in Section 6), it also has some drawbacks compared to the FNFT strategy. Firstly, to fulfill the hypothesis of Corollary 3 and avoid ISI, the pulse shape must be fully confined within a symbol time, a more stringent requirement than the conventional Nyquist criterion. This imposes the use of pulses with a wider bandwidth, which has a negative impact on the achievable spectral efficiency. Secondly, full channel precompensation is required to use Corollary 3 at the RX and, therefore, channel compensation cannot be split between RX and TX to reduce guard intervals [4,9]. Thirdly, as far as computational complexity is concerned, the DF-BNFT scheme requires the evaluation of $M$ BNFTs over $\nu N_b$ points. On the other hand, standard NIS detection computes only one FNFT over a comparable number of points. While an exact comparison between the two strategies depends on the relative complexity and accuracy of the considered BNFT and FNFT algorithms, it is reasonable to assume that the complexity of the DF-BNFT detector is considerably higher than that of standard FNFT detection. However, the aim of this work is to show that standard FNFT detection is far from being optimal, and that a significant increase of performance can be achieved by an improved detection strategy.

Error probability estimation and bounds
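For a given sequence, the probability of error $P_e$ of the DF-BNFT strategy can be evaluated by averaging (over all constellation symbols and symbols within a burst) the probability $P^{(m)}_k$ that an error occurs when the symbol $X_m$ is sent at position $k$, provided that the symbols $\hat x_1,\dots,\hat x_{k-1}$ are correctly detected. Specifically,

$$P_e = \frac{1}{M N_b} \sum_{k=1}^{N_b} \sum_{m=1}^{M} P^{(m)}_k. \qquad (13)$$

As computing $P^{(m)}_k$ may be difficult, we will bound it through standard techniques [11]. By defining the events $E_{m,i} = \{X_i \text{ is preferred to } X_m \text{ when deciding on } x_k\}$ and writing $P^{(m)}_k$ as the probability of their union,

$$P^{(m)}_k = P\Big(\bigcup_{i \ne m} E_{m,i}\Big), \qquad (14)$$

we can upper bound $P^{(m)}_k$ by the union bound

$$P^{(m)}_k \le \sum_{i \ne m} P(E_{m,i}). \qquad (15)$$

Due to our AWGN assumption, the pairwise error probabilities in (15) are given by

$$P(E_{m,i}) = Q\!\left(\frac{d^{(m,i)}_k}{\sqrt{2}\,\sigma}\right), \qquad (16)$$

where

$$d^{(m,i)}_k = \big\|\mathbf r^{(m)}_k - \mathbf r^{(i)}_k\big\| \qquad (17)$$

is the Euclidean distance between $\mathbf r^{(m)}_k$ and $\mathbf r^{(i)}_k$, and $Q(x)$ is the Q-function. We can also obtain a useful approximation of $P^{(m)}_k$ as follows. Denoting by $C_{m,i}$ the event complementary to $E_{m,i}$, we have

$$P^{(m)}_k = 1 - P\Big(\bigcap_{i \ne m} C_{m,i}\Big) \qquad (18)$$

and, taking into account that $P(C_{m,i}) = 1 - P(E_{m,i})$,

$$P^{(m)}_k \approx 1 - \prod_{i \ne m} \big[1 - P(E_{m,i})\big], \qquad (19)$$

where the approximation is due to the fact that, in general, the events $C_{m,i}$ are not mutually independent. Let us now derive a lower bound. Recalling that the probability of a union of events is lower bounded by each one of the probabilities of the single events, we have

$$P^{(m)}_k \ge \max_{i \ne m} P(E_{m,i}). \qquad (20)$$

In conclusion, from (15), (19), (20), and taking into account (16), we have the following upper bound, approximation and lower bound, respectively, on $P^{(m)}_k$:

$$P^{(m)}_k \le \sum_{i \ne m} Q\!\left(\frac{d^{(m,i)}_k}{\sqrt{2}\,\sigma}\right), \qquad (21)$$

$$P^{(m)}_k \approx 1 - \prod_{i \ne m} \left[1 - Q\!\left(\frac{d^{(m,i)}_k}{\sqrt{2}\,\sigma}\right)\right], \qquad (22)$$

$$P^{(m)}_k \ge \max_{i \ne m} Q\!\left(\frac{d^{(m,i)}_k}{\sqrt{2}\,\sigma}\right), \qquad (23)$$

where $d^{(m,i)}_k$ is as in (17). Replacing (21)-(23) into (13) gives the corresponding bounds and approximation on $P_e$. We remark that both the bounds and the approximation were derived by assuming that the channel is AWGN and that previous decisions are correct. Deviations from this ideal situation might invalidate the bounds and slightly reduce the accuracy of the approximation, as shown in Section 7 by numerical simulations.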
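The three estimates can be computed directly from the list of pairwise distances. The sketch below assumes that, for complex AWGN with variance $\sigma^2$ per sample, the pairwise error probability is $Q\big(d/(\sqrt{2}\sigma)\big)$, which is the standard result for two equally likely signals at Euclidean distance $d$; the function names and the example distances are illustrative only.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def error_bounds(distances, sigma):
    """Lower bound, approximation and union (upper) bound on P_k^(m).

    distances: Euclidean distances d_k^(m,i) for all i != m
    sigma:     square root of the variance of each complex noise sample
    """
    pairwise = [q_function(d / (math.sqrt(2) * sigma)) for d in distances]
    upper = sum(pairwise)                                    # union bound
    approx = 1.0 - math.prod(1.0 - p for p in pairwise)      # independence approximation
    lower = max(pairwise)                                    # most likely single event
    return lower, approx, upper

# Illustrative distances (not taken from the paper)
lower, approx, upper = error_bounds([2.0, 2.0, 2.8], sigma=1.0)
print(lower, approx, upper)
```

By construction the approximation always lies between the two bounds, and all three tend to coincide when a single pairwise term dominates.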
The above estimates of the probability of error for a given sequence should be averaged over all possible M N b sequences to obtain the average error probability. However, to speed up computation, one can think of averaging over randomly generated sequences until the result stabilizes. As we will show in Section 7, this practical approach still provides a reasonable accuracy and a significant computational saving compared to direct error counting.

System performance
We simulated the system described in the previous sections and sketched in Fig. 1 by using a 16QAM signal $s(t)$ with symbol rate $R_s = 1/T_s = 50$ GBd. To fulfill the no-ISI requirement on $s(t)$, the supporting pulse was chosen to be $g(t) = \exp[-12.5(t/T_s)^2]$, i.e., a Gaussian pulse with a full width at half maximum (FWHM) of $(2/5)\sqrt{2\ln 2}\,T_s \approx T_s/2$, so that about 99.9% of the energy of a pulse is contained in a symbol time. This pulse shape is chosen to fulfill the requirement of Corollary 3 on the duration of $g(t)$ with only a moderate bandwidth increase compared to the root-raised-cosine pulse employed in [9]. Note that, by relaxing the energy constraint, the bandwidth could be further reduced to approach the one in [9]. This, however, would also reduce the performance due to the ISI generated by the pulse tail, with an overall effect on the achievable spectral efficiency that should be carefully considered. The optimization of the pulse shape for the maximization of the spectral efficiency is outside the scope of this work and will be addressed in a future publication. A typical example of generated signal is shown in Fig. 2 on the left.
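The two figures quoted for the pulse can be verified with a few lines of arithmetic. The sketch below works in units of $T_s$ and takes the FWHM on the amplitude $g(t)$ and the energy on $|g(t)|^2$; both are interpretations of the text rather than statements from the paper.

```python
import math

a = 12.5  # g(t) = exp(-a t^2), with t in units of Ts

# Amplitude FWHM: g(t) = 1/2 at |t| = sqrt(ln 2 / a)
fwhm = 2 * math.sqrt(math.log(2) / a)

# |g(t)|^2 = exp(-2 a t^2); fraction of its integral within |t| <= 1/2
energy_in_symbol = math.erf(math.sqrt(2 * a) / 2)

print(f"FWHM = {fwhm:.3f} Ts, energy fraction = {energy_in_symbol:.5f}")
```

The FWHM comes out slightly below $T_s/2$ and the energy fraction slightly above 99.9%, consistent with the values stated above.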
The channel is a standard single-mode fiber of length $L = 2000$ km with group velocity dispersion (GVD) parameter $\beta_2 = -20.39$ ps$^2$/km, nonlinear coefficient $\gamma = 1.22$ W$^{-1}$km$^{-1}$, and attenuation $\alpha = 0.2$ dB/km. Ideal distributed amplification (spontaneous emission factor $\eta_{sp} = 4$) is considered along the channel. The bandwidth of both the digital-to-analog converter (DAC) and the ADC is 100 GHz. In order to account for the NFT boundary conditions and for temporal broadening due to dispersion, a total of $N_z = 2000$ guard symbols is considered in our simulations. The FNFT and BNFT operations are performed with an oversampling factor of 8 samples per symbol, using the Layer-Peeling (LP) method [1] and an enhanced version of the Nyström method [12], respectively. The simulation results are deemed free from numerical inaccuracies, since it was verified that the noise-free performance was sufficiently higher than the noisy one (when applicable), and that higher accuracy did not change the results, similarly to what was done in [13].
As is customary in optical communications, performance is measured in terms of Q-factor, defined as

$$Q^2_{\mathrm{dB}} = 20\log_{10}\!\big[\sqrt{2}\,\mathrm{erfc}^{-1}(2P_b)\big], \qquad (24)$$

where the probability of bit error $P_b$ is given by direct error counting [14]. The rate efficiency $\eta = N_b/(N_b + N_z)$ is considered to account for the spectral efficiency loss due to the insertion of guard symbols [9]. The performance is reported as a function of the mean power per symbol, defined as $P_s = E_s/T_s$, where $E_s = E_{tot}/N_b$ is the mean energy per information symbol and $E_{tot}$ the total energy of the optical signal (so that the actual average optical power is $\eta P_s$).
Figure 3 shows the NFDM performance obtained with DF-BNFT detection (solid lines), and with conventional FNFT detection (dashed lines), for different burst lengths (same color for same length). As can be seen, the performance obtained with DF-BNFT is significantly better than that obtained with FNFT detection, with an improvement of 4.4 dB for $N_b = 256$ and 6.2 dB for $N_b = 2048$. However, performance still decays when increasing $N_b$. This behavior may be due either to the suboptimality of the DF-BNFT detection, or to an intrinsic limitation of the NIS modulation format.
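For reference, the conversion from bit error probability to Q-factor and the rate efficiency can be sketched as follows. The code relies on the identity $\sqrt{2}\,\mathrm{erfc}^{-1}(2P_b) = -\Phi^{-1}(P_b)$, with $\Phi$ the standard normal distribution function, so that only the Python standard library is needed; the function names are illustrative.

```python
import math
from statistics import NormalDist

def q2_db(p_b):
    """Q-factor in dB from the bit error probability (valid for 0 < p_b < 0.5)."""
    # sqrt(2) * erfcinv(2 p_b) equals minus the standard normal quantile at p_b
    q = -NormalDist().inv_cdf(p_b)
    return 20 * math.log10(q)

def rate_efficiency(n_b, n_z):
    """Fraction of transmitted symbols that carry information."""
    return n_b / (n_b + n_z)

print(q2_db(1e-3))                  # Q-factor for a bit error probability of 1e-3
print(rate_efficiency(256, 2000))   # burst of 256 symbols with 2000 guard symbols
```

With the $N_z = 2000$ guard symbols considered here, a burst of 256 symbols yields a rate efficiency of roughly 11%, which illustrates why longer bursts are desirable.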
As regards suboptimality, there are three possible causes of performance degradation, already discussed in Section 4: the non-AWGN statistics of the fiber channel, which is affected by signal-noise interaction; the error propagation in the decision-feedback mechanism of (11); and the information loss entailed by (12), which neglects the information about $x_k$ that is contained in the received signal for $t > t_k$. All these effects become more relevant as the burst length increases. Signal-noise interaction also increases with the burst length, as the optical noise interacts with a longer portion of non-zero signal. This effect, however, saturates when the burst length becomes longer than the channel memory. One may wonder whether the NIS performance shown here is in accordance with the theoretical estimate of the signal-to-noise ratio (SNR) given in [15]. Such a comparison was performed in [9] for almost the same system configuration considered here. The only differences are: (1) the modulation format, which however does not significantly change the results, and (2) the chosen pulse shape. In this work, a pulse with a shorter time duration and, hence, a wider bandwidth is considered. This difference is responsible for a slight performance improvement.
The impact of signal-noise interaction during propagation can be estimated by comparing the performance obtained on the actual fiber link with that of the equivalent AWGN channel (6), as done in Fig. 4. For shorter bursts, i.e., $N_b = 256, 512$, the slight difference between the dotted and solid lines denotes a small impact of signal-noise interaction on system performance and a slight deviation of channel statistics from the AWGN assumption (6). One of the effects of signal-noise interaction during propagation is a constant phase rotation of the optical signal. This deviation can be estimated and removed from the optical signal by considering for detection $\tilde r(t)e^{-j\alpha}$, $\alpha$ being the phase shift, rather than $\tilde r(t)$ itself. Fig. 4 shows that a small performance gain can be obtained with this technique (performance shown with dashed lines) and that the performance approaches that of the equivalent AWGN channel. Obviously, when the performance already coincides with that of the AWGN channel, this technique does not affect the detection strategy.
As regards error propagation, its impact can be estimated from Fig. 5, in which the actual DF-BNFT performance is compared to that of an ideal detector that makes decisions according to the same strategy (11), but using the correct symbols $x_1,\dots,x_{k-1}$ rather than the detected ones $\hat x_1,\dots,\hat x_{k-1}$. Contrary to what was observed for signal-noise interaction, the impact of error propagation is more relevant for longer bursts, while it tends to be negligible for shorter ones. Indeed, for longer bursts, farther symbols (i) affect detection more significantly, and (ii) are more likely to be wrong, since the probability of error is higher.
As regards the third possible cause of performance degradation, further investigations are required to estimate the impact of information loss due to the suboptimality of (11) and to devise a better strategy to avoid it. Eventually, the implementation of an optimal strategy based on (9), but with a feasible complexity, would make it possible to estimate the ultimate performance of NIS modulation and to understand whether the observed performance decay is due to suboptimal detection or to an intrinsic limitation of this modulation scheme. This is currently under investigation.
The maximum performance achieved by FNFT and DF-BNFT detection (at their respective optimum power) in Fig. 3 is reported in Fig. 6 as a function of the rate efficiency and compared with the maximum performance achieved by conventional systems (also operating in burst mode, for a fair comparison) employing ideal electronic dispersion compensation (EDC) and digital backpropagation (DBP) (implemented by the split-step Fourier method with 100 steps per span of fiber, enough to practically achieve a perfect compensation of deterministic nonlinearity). DBP performance is estimated from the error vector magnitude [16], rather than calculated by direct error counting, since the corresponding error probability is too low to be measured. The improvement of DF-BNFT with respect to FNFT is quite relevant and slightly increases with the rate efficiency $\eta$. However, the DF-BNFT performance is still not on par with that of conventional systems and keeps decreasing at higher rates, whereas the performance of conventional systems saturates to the one achieved with continuous (non-burst-mode) transmission.
Finally, the performance achievable by DF-BNFT detection was also investigated in different scenarios. Fig. 7a refers to the same system setup used in Figs. 3-6 but with a lower dispersion parameter $\beta_2 = -1.27$ ps$^2$/km and, therefore, a lower number of guard symbols $N_z = 125$. The overall results do not change significantly, as already observed in [9], and DF-BNFT achieves a performance improvement of almost 6 dB with respect to FNFT detection. The behavior also does not change when using quadrature phase-shift keying (QPSK) symbols in the otherwise same system of Fig. 3 but with lower symbol rate $R_s = 10$ GBd, longer link length $L = 4000$ km, and $N_z = 160$ guard symbols, as shown in Fig. 7b.

Validation of the approximation and bounds
The bounds and approximation obtained by replacing (21)-(23) into (13) are reported in Fig. 8 for $N_b = 256$ and $N_b = 1024$, after conversion to Q-factor. As the Q-factor is directly related to the bit-error probability $P_b$, the approximation $P_b \approx P_e/M$ is used, assuming that a symbol error always corresponds to a single bit error. Moreover, if $P_e$ increases, the Q-factor decreases, and vice versa. Therefore, a lower (or upper) bound for $P_e$ becomes an upper (or lower) bound for $Q^2_{\mathrm{dB}}$. In order to check their accuracy, the bounds and approximation are compared with the performance obtained by numerical simulations for the actual fiber channel. In both cases, the approximation lies between the bounds, asymptotically approaching the lower bound when power increases. At low powers, the approximation is in very good agreement with numerical simulations. On the other hand, near the optimum power, the approximation overestimates the actual performance, which falls slightly below the lower bound. This is due to signal-noise interaction during propagation (for $N_b = 256$) and to error propagation in the decision-feedback strategy (for $N_b = 1024$), both neglected in the derivation of the bounds and approximation (21)-(23). In fact, the numerical simulations for the AWGN channel in Fig. 8a and the error-propagation-free simulations in Fig. 8b correctly fall between the bounds and are in excellent agreement with the approximation.
As already explained, the probability of error for a given sequence $P_e$ in (13) should be averaged over all possible sequences. However, the number of possible sequences $M^{N_b}$ is practically unmanageable, making an exact average impossible. Nevertheless, most sequences contribute in the same way to the average, so that we do not need to explore all of them, but only account for the most significant ones. This can be done by performing a Monte Carlo average, which consists in randomly generating sequences until the corresponding average performance stabilizes. This is in contrast with the full numerical estimation used in Section 6, in which the effect of noise is also numerically estimated by averaging over many random realizations. To illustrate the difference between the two approaches and show the speed of convergence of the various estimates, Fig. 9 reports the same bounds and estimates shown in Fig. 8a as a function of the number of iterations (corresponding to the number of sequences of length $N_b$ over which the performance is averaged), considering two different values of the mean power. As can be seen, the computation of the semianalytical bounds and approximation requires only a few iterations in all cases, while computing the performance by full numerical simulations requires a number of iterations that depends on the actual error probability and, hence, on the input power, since a sufficient number of error events must be observed. As an example, 30 to 40 iterations suffice for an input power of −9 dBm, while more than 200 iterations are necessary at the optimum input power of −4 dBm.

Conclusions
Unlike in conventional systems, there is still no theory for optimum detection when using the NFT as a means to transmit information. As a result, the existing NFT-based transmission schemes mostly rely on reasonable assumptions and are guided by concepts and principles borrowed from the more familiar detection theory for linear systems. This means that big improvements are to be expected as soon as the principles of nonlinear detection theory are better understood. There is still a lot to be discovered, but some well-known general principles can be applied independently of the nature of the system at hand, be it linear or not. For example, this is the case for the MAP detection strategy to be applied in a given communication system, once a statistical knowledge of the channel is available or can be reasonably approximated. Following this line, by assuming a simple channel model, a novel (suboptimal) detection strategy for NFDM systems, referred to as DF-BNFT, has been introduced in this work. In a nutshell, taking advantage of a peculiar property of the NFT, rather than using a strategy based on the FNFT and the minimization of the Euclidean distance in the nonlinear frequency domain, DF-BNFT makes decisions by minimizing the Euclidean distance in the time domain through BNFT and decision feedback. Moreover, a semianalytical approximation and upper and lower bounds to system performance have been derived, providing an effective tool to estimate system performance without resorting to time-consuming numerical simulations.
As demonstrated by numerical results, the proposed detection strategy allows for a performance improvement of up to 6.2 dB with respect to standard FNFT detection. Despite such a big improvement, the performance of the proposed DF-BNFT strategy also decays when the burst length increases, similarly to what was observed for FNFT-based detection [9]. This peculiar behavior forces the use of short bursts, separated by long guard times, severely limiting the overall spectral efficiency achievable by the system. However, while the performance decays in FNFT and DF-BNFT detection appear to be similar, their causes are different: the perturbation of the nonlinear spectrum caused by optical noise (and non-optimally accounted for by the detection metrics) in the former case; the information loss and error propagation caused by the use of the suboptimum strategy (11) in the latter.
Though the persistence of this undesirable behavior might be daunting, in fact making NFDM not yet competitive with conventional systems, the relevant performance improvement obtained by a still sub-optimum strategy is rather encouraging and paves the way for further progress. Possible future developments include: the implementation of the optimum detection strategy (9) to overcome the limitations induced by (11); the reduction of computational complexity; the optimization of the pulse shape to maximize the achievable spectral efficiency; the investigation of the impact of more practical amplification schemes, such as lumped amplification, in which the lossless path-average model needs to be employed [17,18]; and the extension of the detection strategy to the dual-polarization case. Such developments are required to fully exploit the potential of the NFDM technique and to understand whether it can provide any practical advantages compared to conventional transmission techniques.

Funding
Ente Cassa di Risparmio di Firenze.