Sparse Identification for Nonlinear Optical Communication Systems: SINO Method

We introduce low complexity machine learning based approach for mitigating nonlinear impairments in optical fiber communications systems. The immense intricacy of the problem calls for the development of"smart"methodology, simplifying the analysis without losing the key features that are important for recovery of transmitted data. The proposed sparse identification method for optical systems (SINO) allows to determine the minimal (optimal) number of degrees of freedom required for adaptive mitigation of detrimental nonlinear effects. We demonstrate successful application of the SINO method both for standard fiber communication links and for few-mode spatial-division-multiplexing systems.


I. INTRODUCTION
There is an enormous pressure on the fiber-optic communication industry to deal with the exponentially increasing capacity demands from data traffic [1,2] driven by the existing and constantly emerging internet and broadband services as well as the fast growing machine-to-machine traffic.Current technological solutions suggest a combination of advanced modulation formats and space division multiplexing to achieve substantial enhancement of the spectral efficiency [3][4][5][6], while fiber remains a nonlinear medium leading to signal being strongly degraded by nonlinear impairments.In future systems, potentially using different types of fibers, nonlinear effects still will be one of the fundamental limiting factors.Therefore, compensation of the nonlinear signal distortions is a critical challenge for development of the next generation of communication systems operating at higher transmission rates.
Up to date, non-linear impairment mitigation has been considered mostly for single mode fiber links with a number of techniques being proposed both in the optical and electronic domain [7][8][9][10][11].Most of the research in the field of electronic compensation has been focused on back-propagation algorithms that emulate optical transmission in the digital domain through an inverse fiber link (with reversed order segments and opposite sign parameters), realized either by means of a Split-Step Fourier method [10,11] or Volterra Series Transfer Functions [12][13][14].Despite various simplifications that have been proposed, both approaches are considered to be highly complex because they require multiple computational steps along the link and at least two samples per symbol.This has heavily discouraged any effort for future commercial deployment even for legacy SMF fiber systems.
On the other hand, perturbation analysis of the Manakov equations has led to the development of efficient equalization methods that can mitigate accumulated intra-channel impairments in a single computational step and one sample-per-symbol [15,16].Central to this approach has been the identification of the perturbation coefficients that describe the interaction of each symbol with its preceding and succeeding symbols in the transmission channel [17][18][19][20].For static connections and specific pulse shapes, such as Gaussian or sinc, the perturbation coefficients can be derived analytically and stored in a look-up table [15,21].Since the total number of terms can be excessively high for dispersion un-managed links, where the channel memory is long, exploiting common symmetries and quantizing the coefficients are two of the techniques that have been recently proposed for complexity reduction.Furthermore, to achieve operation in reconfigurable network environments, an adaptive version of the method that uses training sequences and decision-directed least-mean squares algorithms has been introduced in [22].This enables to identify the perturbation coefficients before establishing any new connection and without prior knowledge of the corresponding transmission link parameters.The latter progress places the perturbation method in a broader context and signifies its practical importance in future optical networks.
With this paper, we expand the application of perturbation-based nonlinear compensation in few mode fiber transmission systems by introducing a novel channel model, which captures nonlinear interplay between different modes in weak and strong coupling regime: Sparse Identification for Nonlinear Optical communication systems: SINO method.Contrary to the aforementioned approaches that deal only with the intra-channel nonlinearities, here, the proposed SINO method takes into account also the nonlinear interaction between the co-propagating modes by introducing additional inter-channel perturbation terms.The associated complexity scaling was addressed by adapting sparse identification method [23], which makes use of the Lasso algorithm [24,25], thus, enabling computation of perturbation coefficients with inherent principal component analysis.The latter minimizes a mean squared error (MSE) estimation based on the training sequence and removing redundant predictors to improve model accuracy.The method does not require knowledge of the transmission line and is applicable for multi-dimensional multichannel systems.To demonstrate the application of the SINO method we consider the transmission of a D spatial modes and two polarizations along a few mode fiber link of N s amplified spans, see Fig. 1.At the receiver, a demultiplexer performs an initial separation of the spatial super-channel by projecting its tributaries onto a fixed mode basis.After coherent detection, matched filtering and down-sampling, the signals are fed into a linear MIMO equalizer, which compensates dispersion and remaining mode mixing effects.The transmission nonlinearities are treated separately by a non-linear MIMO processor (see Fig. 2).Unlike the linear equalizer, the processor creates a nonlinear matrix Θ(X) (Fig. 3), which we call library, by forming mixing products of the input symbols.Depending on the channel memory the library will encompass different combinations -nonlinear mixing -of input symbols, thus, the library will have more elements that the original modulation format.See how library expands by different combinations of 16-QAM symbols in Fig. 3: without memory in linear scenario the library has 16 symbols, taking into account nonlinearity (self-phase modulation) results in additional 16 symbols (formed from the original 16-QAM by a simple nonlinear transformation -x k |x k | 2 , while adding memory expands the library further -for M = 1 there are 48 new symbols formed as x k |x k+1 | 2 , further increase of memory expands the library even more.Subsequently, it combines together the resulting terms through a coupling matrix Ξ, which contains information about the transmission line parameters and the pulse-shape of the signal.As symbols interact with different weights, the impact of some of these terms can be negligible, hence the matrix Ξ is sparse and we can employ machine learning tools in identifying it.We may write: where Y is a vector of length m representing the output signal.The nonlinear system of Eq. 1 can be viewed as a representation of the output Y via a set of coordinates Ξ on a complex functional space Θ(X).The sparsity of Ξ allows us to simplify significantly the computational complexity of the nonlinear filtering process by removing redundant coefficients.As we will see below, this is an inherent characteristic of the lasso method, which has been adopted here for identifying the optimum Ξ.
On the other hand, the exact form of the library matrix Θ(X) depends on the type of non-linearity the equalizer has to address.Given that in our case the fiber channel has memory we should expect each row of the matrix to involve not only elements, such as, x(t 1 )...x 2 (t 1 )..., but also the non-instantaneous interactions between the symbols in different time slots, such us |x(t 1 )| 2 x(t 2 ), therefore Θ(X) will be a dynamic matrix.Being able to identify its exact form is expected to improve significantly the accuracy and the convergence of the equalization process.This was achieved through a perturbation analysis on the Manakov equations that govern signal propagation along the FMF link and the derivation of a novel discrete-time multivariate channel model, which is described in detail below.An important FIG.3: The library is constructed from the received symbols according to the channel model Eq. 5, i.e.Θ(X) = [...x k x k |x k | 2 ...x k+m x k+n x * k+m+n ...] (here 16 QAM is plotted for illustration).
result of this derivation was the fact that we could analytically define the memory effects characterizing nonlinear interference within each channel and among the co-propagating modes and incorporate this dynamic behavior into Θ(X).

B. SDM Channel Model
For deriving the discrete-time channel model, and consequently the library matrix Θ(X), we assume that the propagation of an optical signal via an few mode optical fiber is governed by the Manakov equation for the D-modes [26][27][28]: here deterministic distortions are described by fiber losses α, second-order dispersion β 2 , and nonlinearity coefficient γ, whereas noise is zero-mean AWGN with variance η(z, t), η * (z where N D and L are notations for noise spectral density and transmission length correspondingly.The formula captures both a) weak and b) strong coupling regimes, when a) κ pp = 8/9, κ pq = 4/3 and b) κ pq = κ pq = 8D/3/(2D + 1).
Given the expansion of the signal over pulses (i.e. the field component of polarization state with nonlinear potential and where Ψ s (z) = e −αmod(z,Ls) is the signal power profile and L s is the span length.The coupling coefficients define the memory effects the channel, it depends on pulseshape and system parameters: Thus, we employ multiple scale analysis with small parameters: Then in the first order over nonlinearity we have the nonlinear distortion: Equation 5 reveals the memory property of the channel, which is that the interference between symbols decays exponentially with the symbol distance.Having exact knowledge of this property enables the development of a more efficient equalization process.
The impact of simplifications on coupling matrix was studied in [19], where C mn was approximated by a finite number of quantized levels and the degradation was estimated in terms of Q 2 -factor.To calculate the latter one needs to calculate numerically a (2M + 1) 2 -number of 4-dimensional integrals (for a single channel case or D(2M + 1) 2 for D-mode case).For a specific case of sinc-pulses single-channel WDM the problem has been reduced to 1-dimensional integrals [21] and has experimentally demonstrated the benefits of the model application for nonlinearity mitigation of intra-channel interference in multichannel-WDM.
In the next section we show how to obtain the matrix numerically using sparse identification and generalization of the model for higher nonlinearity.

C. Stage1: Building library
As we know the type of nonlinearity that takes place we can construct the library (see Fig. 4a Also, as all vectors are complex we will separate real and imaginary parts (see Fig. 4b): Finally, we can apply this method directly to output signal A(t), while it is very important to receive discretetime representation by using appropriate filter, e.g.matched filter.The latter is particularly fit for identification of discrete-time model within information-theoretic treatment.

D. Stage2: LASSO application
As the problem is formulated, next we can apply any of the machine learning algorithms for a sparse solution of an overdetermined system.So that LASSO inherently performs principal component analysis, determining the minimum number of non-zero elements in Ξ, consequently, the minimum number of required operations.Once we have the function basis we can use various methods to calculate the sparse matrix Ξ.In particular, here we applied least absolute shrinkage and selection operator (LASSO) [24,25] using mean-squared error as a cost function.
As data and library are prepared we are looking to solve the problem to find a minimum: where the penalty term: In our simulations we applied the lasso fit with ten-fold cross validation (see Fig. 4c).The resulted matrix Ξ comprises coefficients of the C mn matrix in Eq. 4 with simultaneous principle component analysis, which determines the minimum complexity operation.Once, the system is fully identified -the matrix Ξ is determined, one can use it to compensate nonlinear impairments (see Fig. 4d).

III. RESULTS AND DISCUSSION
The simulations parameters are summarized in Table 1.We investigated the transmission of a single wavelength spatial super-channel, along a 10-span FMF link of total length 10x100-km=1000-km.We considered D spatial modes, each one of them including two polarization states, with D taking values of 1, 3, and 6.On each of the 2D subchannels we launched a 4096-symbol, 16-QAM modulated stream of root-raised cosine pules (0.01 roll-off) at a symbol rate of 32GBaud and sampling rate of 16 samples-per-symbol (SpS).A FM-EDFA of 4.5 dB noise figure was considered after each span for compensating the propagation losses.Our focus was on investigating the equalization performance on the Kerr nonlinear effects only.Therefore, for the signal transmission we considered for every mode a Manakovtype of the propagation equation, see 3, where the nonlinear effects are already averaged over all polarization states due to the randomly changing birefringence.Also, for simplicity and without loss of generality, the modes belonged in the same group and they were strongly coupled, which implies a single nonlinear coefficient both for intra-and inter-modal effects (i.e.1.4 1/W/km), and common group velocity and chromatic dispersion parameters.Finally, the performance was compared in terms of Q 2 -factor which as calculated from the error vector magnitude (EVM), according to In Fig. 5, we compare the equalization performance of our proposed technique (dotted lines) with the case of equalizing linear impairments only (dashed lines), as well as, with the case of having ideal digital back propagation FIG.5: Q 2 -factor vs. Launch power per spatial mode (i.e.average power of two pol.multiplexed sub-channels) using the proposed method (dotted) compared to ideal digital back propagation (solid) and linear compensation (dashed), for 1, 3 and 6 spatial modes (green, blue, red).Analytic expressions Eqs.10-13 are plotted by filled symbols.
(DBP) (solid lines).Ideal DBP endures the full compensation of deterministic non-linearity and it is limited by nonlinear signal-noise interactions (here we ignored modal dispersion effects).Thus, the numerical estimations can be captured in a straightforward way by a simple analytic expression: the coefficient governing signal-signal interactions for ideal DBP scenario a DBP = 0, while for phase shift equalization while for the considered case β p 2 = β q 2 , β p 1 = 0, γ pq = γ qq the above expression can be easily simplified as (where L ef f is an effective length).Since numerical estimations for the coupling matrix C pq m,n might deviate from the precise estimation, the coefficient transforms to here δ pq is the Kronecker symbol.While the coefficient governing signal-noise interactions is given as The aforementioned Eqs.10-13 give a good estimate of the non-linearity impact for the different mode propagation scenarios (filled symbols in Fig. 5 by green, blue and red for 1, 3, 6-spatial modes -each including two polarizations states).For single mode propagation and at high launched powers, a deviation occurs suggesting the need of considering higher order nonlinear terms.
The proposed nonlinear equalization scheme, results in an above 3 dB improvement for single mode transmission, which decreases to 2 and 1 dB for the 3 and 6 mode cases, respectively.The decrease in performance is attributed to the fact that we had considered only the 1st order terms in the nonlinear expansion of the perturbation model.Our proposed method can be straightforwardly applied to higher orders by augmenting the library matrix Θ with higher order term of X, however, this is an issue to be addressed in a future work.
The performance of the proposed algorithm depends strongly on the calculation accuracy of the coupling matrix C through the evaluation of the sparse matrix Ξ.The calculations can be further improved by using more advanced methods for sparse matrix calculation than LASSO.Yet, in this simple configuration, which operated with a single sample per symbol and required just a single matrix multiplication, we were able to outperform conventional compensation techniques such as DBP.In Fig. 6 we have compared, in terms of received signal quality, our approach with the symmetric DBP algorithm which was solved for different number of steps per span and 2 samples per symbol.For single mode systems our method slightly outperforms the DBP of 2 steps per span.As the number of modes increases the performance difference is reversed in favor of DBP, so that for 6 modes our method compares to a DBP of single step per span.Even in that case, the complexity of the achieved equalization is significantly lower.
Indeed, an important advantage of the proposed algorithm is that it ensures lowest complexity by evaluating the coupling matrix elements while removing the redundant terms.For example, in the simulated case the number of unique non-zero elements was 48 compared to the estimation of lowest complexity previous algorithm with adaptive filtering [22] 96 components (M −1) log(M −1)+3M −1 and non-optimized 324 (2M +1) 2 [15], where M = ⌊B 2 β 2 L/2⌋ -estimation on channel memory (⌊x⌋ denotes a floor function of a variable x).Furthermore, the previous algorithms were developed only for compensation of inter-channel nonlinearities, while in multichannel operation the complexity will increase proportionally to the number of channels.And the proposed algorithm with the inherent principal component analysis is crucial.
Finally, the SINO method is scalable to different modulation formats and signal powers as the matrix Ξ remains unchanged, while power and modulation format influence are naturally incorporated in Θ.Thus, once the matrix Ξ has been identified during the establishment of a connection, and as long as the memory properties of the channel do not change, there is no need for retraining the algorithm and the same Ξ can be used for any other modulation format (see Fig. 7).This property makes the method extremely useful for flexible smart-grid network applications.

IV. CONCLUSIONS
We have developed a low complexity machine learning based nonlinear impairment equalization scheme and demonstrated its successful performance in SDM transmission links achieving compensation of both inter-and intra-channel Kerr-based nonlinear effects.The method operates in one sample per symbol and in one computational step.It is adaptive, i.e. it does not require a knowledge of system parameters, and it is scalable to different power levels and modulation formats.Finally, although it has been developed for single wavelength spatial super-channels it can be straightforwardly expanded to multi-channel systems and to any other type of nonlinear impairment.This work has been supported by the EPSRC project UNLOC EP/J017582/1.S. Sygletos acknowledges the support from the EU-FP7 INSPACE project under grant agreement N.619732.F. M. Ferreira is acknowledged for useful discussions

SINOFIG. 1 :
FIG.1: Scheme: SINO performs nonlinear MIMO equalization where the library is formed by nonlinear time-delayed function of the sampled symbols Θ(X) and the sparsity matrix Ξ, which contains information about nonlinear properties of the channel γ pp P z d , ε pq = κ pq γ pq P z d , ρ = N D B/P with B denoting signal bandwidth.Assuming solution in the form: Y = ∞ n=0 ε n Y n , in the main order we have linear channel with AWGN noise ζ k having statistics ζ k , ζ * m = δ km : x k x k |x k | 2 x k |x k+1 | 2 ... x k+m x k+n x * k+m+n ... ... ... ......

FIG. 4 :
FIG.4:The figure shows the operating principle of our scheme.a) First we form the library matrix Θ(X).Then we apply sparse regression to identify the strength of interference between the components of Ξ.Here, the LASSO algorithm is applied which works with real elements, therefore real and imaginary parts needs to be separated; b) the real part of the received symbol is plotted by red circles; c) LASSO identifies the matrix Ξ by minimizing mean squared error as a function of optimization parameter λ; the resulted matrix Ξ enables to establish a relation between output (red circles) and nonlinear combination of input symbols Θ. Compare simulations (blue crosses) with the symbols obtained from the model (red circles), while difference between them is plotted in black squares; d) once the channel is fully identified it can be used for compensation of the non-linearity, compare distorted symbols in red and recovered in blue.

FIG. 6 :
FIG.6: Q 2 -factor vs. Launch power per spatial mode using the proposed method (dotted) compared to a symmetric digital back propagation algorithms with a different number of steps per span (1...10) and to linear compensation only (dashed) for 1, 3 and 6 spatial modes.

Table 1 :
System parameters