Fiber echo state network analogue for high-bandwidth dual-quadrature signal processing

Abstract: All-optical platforms for recurrent neural networks can offer higher computational speed and energy efficiency. To produce a major advance over currently available digital signal processing methods, a new system would need to have high bandwidth and operate on both signal quadratures (power and phase). Here we propose a fiber echo state network analogue (FESNA), the first optical technology that provides both high bandwidth (beyond previous limits) and dual-quadrature signal processing. We demonstrate the applicability of the designed system to prediction tasks and to the mitigation of distortions in optical communication systems with multilevel dual-quadrature encoded signals.


Introduction
Machine learning, and neural networks (NNs) in particular, has recently attracted renewed and fast-growing interest due to novel designs, advanced hardware implementations, and new applications. The surge of interest in recurrent neural networks (RNNs) is due to the broad spectrum of successful applications, which ranges from image analysis and classification, speech recognition, and language translation to more specific tasks such as solving inverse imaging problems or signal processing in wireless and optical communications. In particular, in optical communications, RNNs have been used to compensate for nonlinear signal distortions [1][2][3] and for nonlinearity mitigation for QPSK and 16-QAM modulation formats [4][5][6]. However, many applications are limited because the typical scale of RNNs requires a significant amount of time for training and processing, which represents a challenging computational burden. Various electronic and optical hardware designs have been proposed and developed recently to enable faster implementation of RNNs.
The first realistic silicon neuron was built in the 1990s [7], and there is now a silicon version of the eye's retina [8] and the Neurogrid network, which simulates one million neurons [9]. Similarly, the advanced silica-wafer-based SpiNNaker and BrainScaleS prototypes, developed within the EC Human Brain Project (HBP), also used an external memory array. In [10], silicon photonic weight banks were used to solve a differential system, with a predicted 294-fold acceleration against a conventional computer. Only in semiconductor NNs is the memory achieved by optical delayed feedback [11]. A fiber-based NN was demonstrated recently with gigabyte per second speed [12], where memory was introduced by an external electrical memory array; this imposed a limit on the design, as the learning could not be done in real time.
One of the promising types of RNN is the echo state network (ESN) [13], a specific type of reservoir computing (RC), which relaxes training complexity. RC requires only the output weights to be trained and can be realized by randomly connected nonlinear nodes ("neurons"), while the training is achieved by collecting the signals at each state to compute the optimum output weights. Moreover, an ESN does not have limitations on the pulse shape. Nonlinear equalization based on RC has been proposed [14]. The "reservoir"-based approach makes it possible to demonstrate a network comprising only one nonlinear node and a delay line. All-optical implementation of RC enables high-speed signal processing and can set the framework for a new generation of hardware [11,15] for computing and future optical networks. One of the first demonstrations of an all-optical RC utilizing GHz bandwidth was based on a semiconductor laser [11], which was also used recently for amplitude-modulated signal processing [16]. Yet even the most advanced realizations are fundamentally limited to an operation bandwidth of the order of a few GHz [11,17] (due to the relaxation time of the medium) and focus on signal power transformations, whereas fiber can unlock processing of bandwidths of the order of tens of THz and enables dual-quadrature transformations.
Here we introduce a new approach to reservoir computing exploiting fiber nonlinearity, which makes it possible to obtain nonlinear "neuron" functionality via signal-pump beating. Thus, memory is intrinsic to the process and is realized optically. We propose and demonstrate through numerical modeling a novel design of a fiber echo state network analogue (FESNA). The advantages of the proposed setup include: (i) an NN with intrinsic optical memory (it does not require an external memory as in [12]) and (ii) the first demonstration of the possibility of an optical RNN for high-bandwidth (over 100 GHz, compared to previous limits of a few GHz, e.g., [11]) and dual-quadrature signal processing. These features are highly important for increasing the speed and applicability of optical neuromorphic technology. We demonstrate the feasibility of the proposed system using two practical applications: prediction and distortion mitigation. Our analysis reveals the device's strong potential for signal processing in high-speed optical communications. The presented all-optical RNN implementation is transparent and flexible and can find a wide variety of applications, ranging from photonics to medicine [18], in particular, RNNs for self-tuning lasers [19,20], smart fiber gratings [21], off-the-shelf smart fiber couplers [22], and so on.

Reservoir computing concept
Reservoir computing is a type of recurrent network with randomly connected nonlinear nodes, where only the output weights are updated via training. This significantly simplifies the structure and shortens the training time. It has been shown that reservoir computing operation can be realized by a single nonlinear element with a single feedback delay loop [23].
Similar to neural networks, here one operates with nodes, which mimic neurons. The virtual neurons are obtained by sampling the input signal $u_n$. The samples are then combined with random input weights $W_{in}$. Consequently, each sample $u_n$ is transformed into the weighted sequence $W_{in} u_n$, which maps the signal into a multidimensional space. To introduce memory into the system, the signal is also mixed with the time-delayed feedback. The signal from the feedback loop is the output of the nonlinear element, which is also sampled at the same sampling rate; this sequence represents the state of the virtual network $x_{n-1}$, which is also mixed through a different weight matrix $W$. The feedback provides the memory in the network, where each state is affected by the previous one. The received signal $W x_{n-1} + W_{in} u_n$ represents the signals processed by neuron $n$, coming from adjacent synapses with different weights (strengths). The nonlinear response of the neuron is modeled by the nonlinear response function $y = f(x)$ (Fig. 1(b)). Thus, one obtains nonlinear neurons processing input signals.
Meanwhile, the output of the feedback loop is also gradually collected at the receiver, $X = [\ldots, x_n, x_{n+1}, \ldots]$. Here, the weights of the matrices $W_{in}$ and $W$ were chosen randomly from the interval $[-0.5, 0.5]$, and thus have a uniform distribution around zero on that interval. While real-valued weight matrices can be used for processing the real part of the signal, for processing complex signals the weight matrices were chosen similarly but in the complex domain. Once the weights are chosen, they remain fixed. Overall, the governing equation of the network is
$$x_n = f\left(W x_{n-1} + W_{in} u_n\right),$$
where $x_n$ is the $N$-dimensional reservoir state, $f$ is a sigmoid function (usually the logistic sigmoid or the tanh function), $W$ is the $N \times N$ reservoir weight matrix, $W_{in}$ is the $N \times K$ input weight matrix, and $u_n$ is the $K$-dimensional input signal. The training data $u$ are mixed with the mask $W_{in}$ and fed into the reservoir, where they are mixed with the building-up signal $x$, which is itself mixed with a different mask $W$ (Fig. 1(a)).
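The state update described above, $x_n = f(W x_{n-1} + W_{in} u_n)$ with fixed random weights drawn uniformly from $[-0.5, 0.5]$, can be sketched in a few lines. This is a minimal real-valued illustration, not the authors' implementation; the function names, the zero initial state, and the choice of tanh as $f$ are assumptions made for the example.

```python
import numpy as np

def make_reservoir(n_nodes, n_inputs, seed=0):
    """Draw fixed random weights W (N x N) and W_in (N x K) from [-0.5, 0.5]."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.5, 0.5, size=(n_nodes, n_nodes))
    W_in = rng.uniform(-0.5, 0.5, size=(n_nodes, n_inputs))
    return W, W_in

def run_reservoir(W, W_in, u, f=np.tanh):
    """Iterate x_n = f(W x_{n-1} + W_in u_n) and collect every state.

    Assumption: the reservoir starts from the all-zero state.
    Returns an array of shape (len(u), N), one row per time step.
    """
    x = np.zeros(W.shape[0])
    states = np.empty((len(u), W.shape[0]))
    for n, u_n in enumerate(u):
        x = f(W @ x + W_in @ np.atleast_1d(u_n))
        states[n] = x
    return states

# Usage: a scalar input sequence driving a small reservoir.
W, W_in = make_reservoir(n_nodes=8, n_inputs=1)
states = run_reservoir(W, W_in, np.sin(np.arange(20)))
```

Here the collected states are stored with one row per time step; the text's matrix $X$ holds the same information with the states arranged side by side.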
Here and throughout the paper, we focus on achieving a nonlinear neuron-like response, that is, a sigmoid activation function, which is typically used in RNNs; therefore, we assume the weights ($W$ and $W_{in}$) to be applied electronically, similar to [16] or [24][25][26]. Note that, while previous configurations were fundamentally limited to the order of a few GHz [11] due to the relaxation time of the medium, fiber-based technology enables processing of THz-bandwidth signals. Therefore, the proposed system is limited only by the electronic weight application, which currently enables processing of signal bandwidths of the order of a hundred GHz. Further work will include modeling weights optically using physical effects in fiber. In the present study, however, we focus solely on realizing nonlinear neural functionality via fiber, thus unveiling a new regime in fiber systems that can be achieved under the specific conditions described below and is not accessible with traditional techniques.
The reservoir size $N$ is the number of virtual nodes in the reservoir loop, that is, the optical feedback loop implemented via fiber. This can be achieved by spatial multiplexing or temporal multiplexing. One can use the temporal multiplexing approach of [11]. This causes a delay in the loop, as each symbol of $u_n$ is fed in for each vector $x_n$, whose size is that of the reservoir. Overall, this realization results in a delay of $N/N_s$ per symbol, where $N_s$ is the signal sampling rate. This can be avoided by using spatial multiplexing, that is, a multi-core fiber where the signals are spread over $N$ cores. Such an approach makes it possible to avoid the delay loop. In what follows, for simplicity, we use a time-delay reservoir with $N = 256$, which was found to be the optimal number for the two considered tasks of prediction and distortion mitigation (classification).
At the final stage, all the states of the reservoir are collected in the matrix $X$. Then a simple linear regression or pseudo-inversion can be used to obtain the optimum weights $W_{out}$; this can be done offline or online [27]. Thus, complex signal processing is reduced to a linear operation, while the complexity is transferred to the optical domain.
Once the output weights have been identified, the reservoir is tailored for a specific task and can be used in a straightforward manner. Here we apply it in two examples: a standard prediction task (the Mackey-Glass test) and optical communication signal processing. In the first instance we use a single-quadrature signal with 100 GHz bandwidth, and in the second case we consider multilevel dual-quadrature encoded signals with a channel bandwidth of 30 GHz, which is typical for optical communications. This demonstrates the transparency of the technology to different bandwidth ranges. We also show configurations for one-dimensional (amplitude) and two-dimensional (both quadratures) signal processing (Fig. 1, panels (c) and (d), respectively), while the underlying principle is the same: using Kerr nonlinearity to achieve the nonlinear transformation and a pump to select the required quadrature. Note that here we assume that the weights and sampling are applied electronically (see [11] for more information), focusing on the optical realization of the nonlinear functionality. Fiber dispersion can be used for mixing signals with weights optically in a temporal-multiplexing scenario, and fiber core coupling can be used in the spatial-multiplexing case. However, to keep the presentation of the concept simple, we consider all fibers to have insignificant dispersion, for example, by operating near the zero-dispersion point. In modern fibers, high nonlinearity and negligible dispersion coefficients can be achieved at the considered span lengths.
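The readout training described above can be illustrated with a short sketch. It assumes the collected states are stored with one row per time step and uses ridge-regularized least squares; the regularization constant and the function name are hypothetical choices for the example, and the text itself only specifies "linear regression or pseudo-inversion".

```python
import numpy as np

def train_readout(states, target, ridge=1e-8):
    """Solve for output weights w such that states @ w approximates target.

    states: (T, N) collected reservoir states, one row per time step.
    target: (T,) or (T, M) desired outputs.
    ridge:  small regularization constant (an assumption, not from the text).
    """
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ target)

# Usage on synthetic data: recover known weights from noiseless states.
rng = np.random.default_rng(1)
states = rng.standard_normal((50, 5))
w_true = rng.standard_normal(5)
w_out = train_readout(states, states @ w_true)

# Equivalent pseudo-inverse route mentioned in the text.
w_pinv = np.linalg.pinv(states) @ (states @ w_true)
```

Because only this linear solve is trained, the reservoir weights themselves are never updated.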

Setup for single-quadrature signal processing
The nonlinear sigmoid functionality (function $f$; Fig. 1(b)) can be approximated using the sine function. The latter can be achieved via Kerr nonlinearity in a nonlinear fiber (Fig. 1(c)). By coupling the signal with the modulated pump and carefully optimizing the fiber parameters, a sine transformation of the input signal can be achieved.
For simplicity, let us assume a dispersionless nonlinear fiber, which results in a strong power-dependent phase shift due to the self-phase modulation (SPM) effect. Using the dispersionless approximation, we can derive simple analytical expressions. After coupling of the weighted signal and the reservoir output, the signal is attenuated (with attenuation coefficient $A$) to ensure that the pump is much stronger than the signal, even for reasonable pump powers. Higher pump power may result in higher-order nonlinear effects, for example, Brillouin or Raman scattering (for methods to mitigate such effects, see [28][29][30]). Therefore, we attenuated the signal to use 1 W pump power. After the signal is coupled with a pump ($\xi$): Then the $\chi^{(3)}$ nonlinearity in the dispersionless medium induces a nonlinear phase shift, which is proportional to the selected quadrature of the signal; here $\gamma$ and $L$ are the nonlinear coefficient and the fiber length, respectively. In general, the signal might be complex. By modulating the pump we can choose the desired quadrature of the signal. As an example, if one aims to process the real part of the signal, then the pump needs to be modulated as $\xi = 1$. Thus, in the limit of a pump that is strong in comparison to the signal (which can be easily achieved by varying the attenuation factor), we obtain a sine transformation of the signal. The scheme is similar to [31,32]. Overall, after the nonlinear fiber, the output is For parameters: Although the output is a complex signal, using a modulated strong pump makes it possible to obtain an output $F(x)$ that depends only on the real value of the input signal. Thus, although $x(n)$ is complex at each stage, it depends only on the real part of the signal at the previous stage and the real part of the input signal. Note that here all weight matrices $W_{in}$, $W$, $W_{out}$ are real-valued, so at the next stage the imaginary part is annihilated. At the receiver, where the states are collected, one selects only the real component for the matrix $X$. The resulting curve is plotted in Fig. 1(b), while the transformation obtained from numerical simulations of a standard off-the-shelf highly nonlinear fiber [33], with nonlinear, dispersion, and attenuation coefficients of 10.5 1/W/km, -1.5 ps/nm/km, and 0.8 dB/km, respectively, results in the FESNA transfer function also shown in Fig. 1(b). Here and further in the paper, the numerical simulations were conducted using the standard split-step Fourier method (to ensure high precision we used a variable step size with a maximum phase change of 0.01). Note that, to compensate for the high losses in the fiber, the fiber length was optimized, which resulted in the relation $|\xi|^2 \gamma L = 2.2\pi$. Overall, the effect of dispersion was negligible, due to the low-dispersion fiber and low signal power. Note that higher values of the nonlinear coefficient are also available, including some twice as high [34].
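The mechanism behind the sine transformation can be checked numerically. The sketch below is an idealized illustration, not the authors' simulation: it assumes a dispersionless and lossless fiber in which the field $E$ acquires the phase $\gamma L |E|^2$, takes $\xi = 1$ and $|\xi|^2 \gamma L = 2.2\pi$ from the text, and uses a hypothetical attenuation $A = 400$, a value the text does not specify. Under these assumptions the Kerr phase is, to first order in the weak signal, $\gamma L\,(|\xi|^2 + 2\,\mathrm{Re}(\xi^* x)/\sqrt{A})$, so the output quadratures are sinusoidal in the selected quadrature of $x$.

```python
import numpy as np

GAMMA_L = 2.2 * np.pi   # |xi|^2 * gamma * L from the text, with |xi| = 1
ATTEN = 400.0           # hypothetical attenuation A (not given in the text)

def kerr_output(x, pump=1.0):
    """Field after an ideal dispersionless, lossless Kerr fiber."""
    field = pump + x / np.sqrt(ATTEN)
    return field * np.exp(1j * GAMMA_L * np.abs(field) ** 2)

def kerr_phase_exact(x, pump=1.0):
    """Exact nonlinear phase gamma*L*|pump + x/sqrt(A)|^2."""
    return GAMMA_L * np.abs(pump + x / np.sqrt(ATTEN)) ** 2

def kerr_phase_linear(x, pump=1.0):
    """Strong-pump approximation: phase linear in the selected quadrature."""
    return GAMMA_L * (abs(pump) ** 2
                      + 2 * np.real(np.conj(pump) * x) / np.sqrt(ATTEN))

x = np.linspace(-1.0, 1.0, 11)
# Largest deviation of the exact phase from the linearized one (radians).
phase_err = np.max(np.abs(kerr_phase_exact(x) - kerr_phase_linear(x)))
# Effect of adding an imaginary part to x when the pump selects Re(x).
quad_leak = np.max(np.abs(kerr_phase_exact(x + 0.5j) - kerr_phase_exact(x)))
# Deviation of the output's imaginary part from a pure sine of the phase.
sine_err = np.max(np.abs(np.imag(kerr_output(x))
                         - np.sin(kerr_phase_linear(x))))
```

With a pump of $\xi = i$ instead, the same linearized expression selects the imaginary part of $x$, which is the quadrature-selection idea used in the text.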

Setup for dual-quadrature signal processing
In the case of dual-quadrature operation, we use the same concept of a modulated pump to select a quadrature and a nonlinear fiber for the nonlinear transformation (Fig. 1(d)). Both quadratures can be processed simultaneously in the same piece of fiber in a loop configuration, by coupling the signals with the pumps before they undergo the nonlinear transformation.
After mixing, the signal is given by Eq. (6); in this case, the matrices $W$, $W_{in}$ and, later, $W_{out}$ are complex. The signal is then attenuated by a factor $\sqrt{A_1}$, and after a 3 dB coupler two copies of the signal are coupled with the pump. The pump is much stronger than the signal and modulated here as $\xi = 1 + i$: here CW and CCW stand for clockwise and counterclockwise. After the dispersionless nonlinear fiber, the output is After propagation, both signals are coupled back and attenuated; here $A$, $\gamma$, and $L$ are the attenuation coefficient, the nonlinear coefficient, and the fiber length; also, for convenience, we chose $\gamma L = 2\pi$. Although the output is a complex signal, using the aforementioned transformation one obtains a sine-approximated transformation of both quadratures simultaneously, which is a close approximation of the ideal sigmoid transformation. Thus, at each stage, both quadratures of the signal in the reservoir $x_n$ are processed, while the collected states, signal $X$, are processed at the receiver side. We found that applying linear regression to the augmented matrix $[X \; X^*]$ facilitates the training process and makes it possible to reduce the size of the reservoir. This way, the complex processing is conducted optically, while in the digital domain the complexity is reduced to a simple matrix multiplication.
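Regression on the augmented matrix $[X \; X^*]$ can be sketched as follows. The example assumes complex states stored with one row per time step and a ridge-regularized least-squares solve; the function names and the regularization constant are hypothetical. The synthetic target depends on both the states and their conjugates, a case that a readout acting on $X$ alone could not fit exactly.

```python
import numpy as np

def augment(states):
    """Stack complex states and their conjugates side by side: [X, X*]."""
    return np.hstack([states, np.conj(states)])

def train_augmented_readout(states, target, ridge=1e-8):
    """Least-squares readout on the augmented state matrix.

    states: (T, N) complex reservoir states, one row per time step.
    target: (T,) complex desired output.
    ridge:  small regularization constant (an assumption, not from the text).
    """
    X_aug = augment(states)
    gram = X_aug.conj().T @ X_aug + ridge * np.eye(X_aug.shape[1])
    return np.linalg.solve(gram, X_aug.conj().T @ target)

# Usage: a target that mixes the states with their conjugates.
rng = np.random.default_rng(3)
states = rng.standard_normal((200, 4)) + 1j * rng.standard_normal((200, 4))
a = rng.standard_normal(4) + 1j * rng.standard_normal(4)
b = rng.standard_normal(4) + 1j * rng.standard_normal(4)
target = states @ a + np.conj(states) @ b

w_aug = train_augmented_readout(states, target)
fit = augment(states) @ w_aug
```

The augmentation doubles the number of regressors without enlarging the physical reservoir, which is consistent with the text's remark that it allows a smaller reservoir.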

Application for prediction
To study the performance of the resulting RC, we used it for a prediction task; a standard benchmark task in machine learning is prediction of the Mackey-Glass chaotic time series. We use the Mackey-Glass delay equation
$$\frac{dx(t)}{dt} = \frac{\alpha\, x(t-\tau)}{1 + x(t-\tau)^{\beta}} - \gamma\, x(t).$$
Following the example of [11], we chose the parameters $\tau = 17$, $\alpha = 0.2$, $\beta = 10$, $\gamma = 0.1$, and an integration step of 0.17, downsampled to $t_s = 3$ to obtain the discrete time series used for processing. We used 1000 samples for training and 5000 for testing. The signal bandwidth was chosen to be 100 GHz. The performance comparison is plotted in Fig. 2. One can see that the sigmoid, sine, and FESNA-generated functions all give good performance. We compared the two regimes and found that one-step prediction and feedback generation both demonstrate accurate performance in the considered interval.
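A Mackey-Glass series with the parameters quoted above can be generated with a simple forward-Euler integration. This is a sketch under stated assumptions rather than the authors' procedure: the integrator, the constant initial history of 1.2, the discarded transient, and the reading of "downsampled to $t_s = 3$" as keeping every third integration step are all choices made for the example.

```python
import numpy as np

def mackey_glass(n_samples, tau=17.0, alpha=0.2, beta=10.0, gamma=0.1,
                 h=0.17, downsample=3, x0=1.2, washout=1000):
    """Integrate dx/dt = alpha*x(t-tau)/(1 + x(t-tau)**beta) - gamma*x(t).

    Assumptions (not from the text): forward Euler with step h, constant
    initial history x0, keeping every `downsample`-th step, and dropping
    the first `washout` samples as a transient.
    """
    delay = int(round(tau / h))                  # delay in integration steps
    n_steps = (n_samples + washout) * downsample
    x = np.empty(n_steps + delay + 1)
    x[:delay + 1] = x0                           # constant history on [-tau, 0]
    for k in range(delay, delay + n_steps):
        x_tau = x[k - delay]
        x[k + 1] = x[k] + h * (alpha * x_tau / (1.0 + x_tau ** beta)
                               - gamma * x[k])
    series = x[delay + 1:][::downsample]
    return series[washout:]

# Usage: 1000 training samples followed by 5000 test samples, as in the text.
series = mackey_glass(6000)
train, test = series[:1000], series[1000:]
```

For one-step-ahead prediction the reservoir input at step $n$ is `series[n]` and the target is `series[n + 1]`.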

Application for distortion mitigation in fiber-optic communication systems
Next, we study the performance of the designed FESNA for compensating nonlinear signal distortions in fiber-optic communication systems. The performance of fiber-optic communication systems is limited by the efficiency and speed of the employed signal processing [35], while new approaches are required beyond the state-of-the-art capabilities [36,37]. We used a system with conventional parameters: a single 30 GBaud channel modulated with root-raised cosine pulses having a 0.1 roll-off. Here we focus on single-channel operation, leaving the multichannel regime to further research. To examine the performance of FESNA for compensating deterministic nonlinear effects, we study 100 km single-span transmission with varied signal power (Fig. 3). The fiber parameters are 17 ps/nm/km dispersion and 1.4 1/W/km nonlinear coefficient. Our focus here is on the nonlinear transmission regimes, so we use high input signal power.
We compare FESNA performance with digital back-propagation (DBP; 50 steps per span and 16 samples per symbol) and linear equalization (LE), which compensates for the circular phase shift and other linear distortions. Note that DBP is a standard technique for nonlinearity compensation in optical communication systems; in particular, in the absence of amplifier noise, it enables complete compensation of signal distortions, thus yielding the best performance. On the other hand, linear equalization represents the minimum distortion compensation that is typically employed, and thus sets the lowest performance level among compensation techniques. We compensate dispersion before FESNA (this can be achieved by dispersion-compensating fiber or electronically). We use 1000 symbols for training and $2^{15}$ QAM symbols for testing, taking particular care to ensure a random sequence of symbols [38]. To quantify the performance, we use the $Q^2$-factor ($Q^2 = 1/\mathrm{EVM}^2$, where the error vector magnitude (EVM) is defined as in [39]).
First, we compare the performance of FESNA to that of the linear equalizer in Fig. 3(a) for a 16-QAM signal. It can be seen that the performance of FESNA depends strongly on the sampling rate. In particular, for a low sampling rate, such as 4 samples per symbol, only linear effects are compensated. Increasing the sampling rate to 16 samples per symbol enables a significant improvement in $Q^2$-factor, which grows with signal power.
We then examine compensation of nonlinear effects for higher-order QAM. The performance comparison is plotted in Fig. 3(b), which shows that the $Q^2$-factor improvement grows with increasing nonlinearity. While the value of the $Q^2$-factor changes for various QAM constellations, the gain remains roughly the same until the nonlinearity becomes too high for FESNA and the gain decreases. The peak in $Q^2$-factor improvement has higher tolerance for higher-order modulation formats, as they are more susceptible to nonlinearity; hence FESNA compensation results in higher gains. Fig. 3(c) shows the corresponding BERs, while the constellation diagrams without and with FESNA are depicted in Fig. 3(d)-(f) for 16-, 64-, and 256-QAM, respectively. Overall, the figure illustrates that an all-optical neural network realization based on FESNA enables high-efficiency mitigation of linear and nonlinear distortions.
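The metric $Q^2 = 1/\mathrm{EVM}^2$ can be computed directly from received and reference symbols. The sketch below assumes an RMS error vector magnitude normalized by the average reference symbol power; the exact definition in [39] may differ (for example, in its normalization), so this is an illustration only, and the function name is hypothetical.

```python
import numpy as np

def q2_factor_db(received, reference):
    """Q^2 = 1/EVM^2 in dB.

    Assumption: EVM^2 is the mean squared error vector divided by the
    mean reference symbol power (RMS EVM).
    """
    evm_sq = (np.mean(np.abs(received - reference) ** 2)
              / np.mean(np.abs(reference) ** 2))
    return -10.0 * np.log10(evm_sq)

# Usage: QPSK-like reference symbols with a constant offset of 0.1.
reference = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])
received = reference + 0.1
q2_db = q2_factor_db(received, reference)  # error power 0.01, symbol power 2
```

In this toy case the ratio of symbol power to error power is 200, so the function returns $10 \log_{10} 200$ dB.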

Conclusions
In conclusion, we have proposed a new fiber-optic reservoir computing scheme that for the first time enables dual-quadrature signal processing with over 100 GHz bandwidth. The performance of the reservoir operation with high bandwidth was demonstrated on the standard Mackey-Glass prediction task. We have also shown that the designed FESNA scheme can be used for compensation of nonlinear distortions in fiber-optic transmission systems and can operate with various QAM signals. The presented results offer a novel type of photonic reservoir computing implementation with a vast bandwidth range and dual-quadrature signal processing, which significantly broadens the application spectrum of the technology.

Fig. 2. FESNA for prediction tasks. The test signal is taken from a standard Mackey-Glass test, where we could perform prediction for a signal with 100 GHz bandwidth. For comparison, we plot the predicted signal obtained with the tanh function (blue, solid; test-regime mean squared error MSE = $4.8 \times 10^{-9}$), with the sine function (green, squares; MSE = $8 \times 10^{-7}$), and with the FESNA-generated function (red, diamonds; MSE = $6 \times 10^{-7}$); as there is no significant visual difference between the curves, the MSE is given for each. Panel (a) depicts one-step-ahead prediction (i.e., in the test regime the previous target signal data points are used to generate the next one-step-ahead prediction) and panel (b) depicts feedback forcing, or generation (i.e., in the test regime the predicted data points are used to generate further predictions). One can see that, in the considered interval, both cases result in highly accurate performance comparable to that of an ideal RC. Panels (c) and (d) show the bandwidth of the input signal and of the output signal after FESNA, respectively.

Fig. 3. Fiber reservoir computing for 16-, 64-, and 256-QAM signal processing. (a) $Q^2$-factor for a signal with varied input power for a transmission distance of 100 km, processed with a linear equalizer (LE, which compensates dispersion and phase shift), digital back-propagation (DBP, with 16 samples per symbol and 50 steps per span), and fiber reservoir computing (FESNA, with 4 and 16 samples per symbol) with reservoir size $2^7$ and training on 1000 symbols. Performance of an ideal (sigmoid-based) RC is shown for comparison. (b) $Q^2$-improvement due to FESNA processing over linear equalization. The reservoir and signal parameters are the same as in Fig. 2 with a sampling rate of 16. (c) The corresponding BER. (d) 16-, (e) 64-, and (f) 256-QAM modulated signal after linear equalization (blue) and FESNA processing (red) for 10 dBm input power.