Deep photonic reservoir computing recurrent network

Deep neural networks usually process information through multiple hidden layers. However, most hardware reservoir computing recurrent networks have only one hidden reservoir layer, which significantly limits their capability to solve complex real-world tasks. Here we show a deep photonic reservoir computing (PRC) architecture, which is constructed by cascading injection-locked semiconductor lasers. In particular, the connection between successive hidden layers is all-optical, without any optical-electrical conversion or analog-digital conversion. The proof of concept is demonstrated on a PRC consisting of 4 hidden layers and 320 interconnected neurons. In addition, we apply the deep PRC to the real-world signal equalization of an optical fiber communication system. It is found that the deep PRC has a strong ability to compensate for the nonlinearity of fibers.


INTRODUCTION
Deep neural networks with multiple hidden layers have substantially advanced the development of artificial intelligence. In comparison with digital electronic computing based on the von Neumann architecture, optical computing can boost the energy efficiency while reducing the computation latency [1-3]. In recent years, a large variety of optical computing architectures have been proposed, most of which focus on the linear multiply-accumulate operation [4-8]. Together with nonlinear activation functions in the digital domain, optical convolutional neural networks and multilayer perceptrons have been extensively demonstrated. In contrast to these two feedforward neural networks, recurrent neural networks (RNNs) have an inherent memory effect and are favorable for solving time-dependent tasks such as natural language processing and temporal signal processing [9]. Reservoir computing (RC) is one kind of RNN, but with fixed weights in the input layer and in the hidden reservoir layers [10,11]. Only the weights in the readout layer require training, which leads to a simple training algorithm and a fast training speed. Optoelectronics-based [12-14] and memristor-based [15-17] RCs have been intensively investigated, and various other types of hardware RCs have been discussed as well [18]. However, most hardware RCs have only one hidden reservoir layer, which substantially limits their capability of dealing with real-world problems.
A comprehensive theoretical analysis by C. Gallicchio et al. has pointed out that the deep hierarchy of RCs possesses multiple time scales and frequency components, and thereby boosts the richness of dynamics and the diversity of representations [19,20]. Several paradigms of combining multiple reservoirs have been theoretically compared in the literature, and it was found that a unidirectional coupling scheme of hidden reservoirs is beneficial for improving the performance of RCs [21,22]. Indeed, the deep configuration raises both the linear and the nonlinear memory capacities of RCs [23,24]. Interestingly, Penkovsky et al. showed that a deep RC with time-delay loops is equivalent to a deep convolutional neural network [25]. In experiment, Nakajima constructed a deep RC based on a Mach-Zehnder modulator associated with an optoelectronic feedback loop [26]. However, there is only one piece of hardware, which is reused in each hidden layer. The interconnection between successive layers requires optical-electrical conversion (OEC), analog-digital conversion (ADC), as well as the inverse conversions. These four conversion processes consume high power and introduce a large amount of latency, which significantly counteracts the merits of optical computing. Lupo et al. recently proposed a two-layer RC based on two groups of frequency combs, which were produced by the phase modulation of light [27]. The interconnection between the two layers is implemented in the electrical domain through the OEC. Nevertheless, the scalability of the RC depth is limited by its tradeoff with the width.
This work presents a deep PRC based on cascading injection-locked semiconductor lasers. The hidden-layer interconnections are fully optical without any OEC or ADC. The deep PRC architecture with 4 hidden layers and 320 neurons is successfully demonstrated in experiment. In particular, the PRC depth is highly scalable without any power or coherence limitation. The deep PRC is applied to the signal equalization of an optical fiber communication system. It is shown that the deep PRC has a strong ability to mitigate the Kerr nonlinearity of optical fibers, and hence to improve the signal quality at the optical receiver.

DEEP PRC ARCHITECTURE AND EXPERIMENTAL SETUP
Figure 1(a) illustrates the architecture of the deep PRC. A single-mode master laser uni-directionally injects into the slave laser (Laser 1) in the first hidden layer of the reservoir. The optical injection is operated in the stable regime, which is bounded by the Hopf bifurcation and the saddle-node bifurcation [28,29]. Part of the light of Laser 1 goes to the second layer of the reservoir and locks Laser 2 through optical injection. In the same way, Laser 2 locks Laser 3 in the third layer, and then Laser 3 locks Laser 4 in the fourth layer. As a result, the lasing frequencies of all four slave lasers are locked to that of the master laser. Besides, the phases of all the slave lasers are synchronized with the master laser as well. In each hidden layer, the laser is subject to an optical feedback loop, which produces a large number of virtual neurons through nonlinear laser dynamics [14,30]. The optical feedback is also operated in the stable regime, which is separated from the unstable regime by a critical feedback level [28,31].
In the input layer, the input signal is multiplied by a random mask, and this pre-processed signal is superimposed onto the carrier wave of the master laser through an optical modulator. The masking process plays a crucial role in the PRC system. On one hand, the fast-varying mask sequence maintains the instantaneous state of all the time-delay reservoirs [30,32]. On the other hand, the mask interval defines the interval of virtual neurons. The neuron number in each hidden layer is determined by the clock cycle divided by the neuron interval.
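The masking step can be illustrated with a short Python sketch. It assumes the experimental values reported in this section (neuron interval θ = 0.05 ns, N = 80 neurons per layer, random binary mask of {1, 0}); the function name `apply_mask`, the random seed, and the toy input sequence are illustrative choices, not part of the experiment.

```python
import numpy as np

# Values reported for the experiment; the variable names are illustrative.
THETA_NS = 0.05                          # neuron interval (ns)
N_NEURONS = 80                           # virtual neurons per hidden layer
CLOCK_CYCLE_NS = THETA_NS * N_NEURONS    # Tc = theta * N

def apply_mask(samples, mask):
    """Hold each input sample for one clock cycle and multiply it by the mask.

    Returns an array of shape (len(samples), len(mask)); row k is the masked
    drive signal for input sample k, one value per virtual neuron.
    """
    samples = np.asarray(samples, dtype=float)
    return samples[:, None] * mask[None, :]

rng = np.random.default_rng(0)           # arbitrary seed (assumption)
mask = rng.integers(0, 2, size=N_NEURONS).astype(float)  # random binary mask
u = rng.uniform(-1.0, 1.0, size=5)       # toy input sequence (assumption)
drive = apply_mask(u, mask)
waveform = drive.ravel()                 # serialized signal that would drive the modulator
```

Each row of `drive` spans one clock cycle, so the number of mask values per input sample equals the neuron number obtained from the clock cycle divided by the neuron interval.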
In the readout layer, the neuron states in all four hidden layers are tracked simultaneously. The target value is obtained through the weighted sum of all the neuron states, and the weights are trained through ridge regression [32].
Based on the deep PRC scheme, Fig. 1(b) shows the corresponding experimental setup. A tunable external cavity laser (Santec TSL-710) serves as the master laser, and its output power is amplified by an erbium-doped fiber amplifier (EDFA). The polarization of the light is aligned with a Mach-Zehnder intensity modulator (EOSPACE, 40 GHz bandwidth) through a polarization controller. The input signal is multiplied by a random binary mask consisting of {1, 0}. This pre-processed signal is generated by an arbitrary waveform generator (AWG, Keysight 8195A, 25 GHz bandwidth), which then drives the modulator. The polarization of the modulated light is re-aligned with the polarization of the slave laser in the first hidden layer. The four slave lasers in the hidden layers are commercial Fabry-Perot (FP) lasers with multiple longitudinal modes. In each layer, the optical feedback loop is formed by an optical circulator and two 90:10 couplers. The feedback strength is adjusted by an optical attenuator. At the output of each hidden layer (except the fourth layer), 70% of the light is uni-directionally injected into the subsequent layer to lock the slave laser, and the polarization of the light is re-aligned. Between the second and the third layers, the laser power is amplified by another EDFA. The neuron states of all four layers are detected by broadband photodiodes (PD), and then simultaneously recorded on the four channels (Ch) of a high-speed digital oscilloscope (OSC, Keysight DSAZ594A, 59 GHz bandwidth). The optical spectrum is measured by an optical spectrum analyzer (Yokogawa) with a resolution of 0.02 nm.
In the experiment, the time interval of neurons in each hidden layer is fixed at θ=0.05 ns, which is determined by the modulation rate of the optical modulator at 20 Gbps. The number of neurons in each layer is set at N=80, resulting in a total neuron number of 320 in the deep PRC of four hidden layers. Consequently, the clock cycle of the PRC system is Tc=4.0 ns (Tc=θ×N). The sampling rate of the AWG is 60 GSa/s and that of the OSC is 80 GSa/s.
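As a minimal sketch of the readout training described above, the following Python code concatenates the neuron states of four layers (4 × 80 = 320 features) and fits the output weights by ridge regression. The data are synthetic stand-ins for recorded neuron states, and the function names, the regularization value, and the added bias column are assumptions made for illustration.

```python
import numpy as np

def train_readout(states, targets, ridge=1e-6):
    """Ridge regression W = (S^T S + ridge*I)^-1 S^T y, with an added bias column."""
    s = np.hstack([states, np.ones((states.shape[0], 1))])
    gram = s.T @ s + ridge * np.eye(s.shape[1])
    return np.linalg.solve(gram, s.T @ targets)

def predict(states, weights):
    """Weighted sum of the neuron states (plus bias)."""
    s = np.hstack([states, np.ones((states.shape[0], 1))])
    return s @ weights

# Synthetic stand-in for the recorded states: 4 layers x 80 neurons, 500 time steps.
rng = np.random.default_rng(1)
layers = [rng.normal(size=(500, 80)) for _ in range(4)]
states = np.hstack(layers)               # shape (500, 320): all layers read out together
true_w = rng.normal(size=320)
targets = states @ true_w                # noise-free toy target
weights = train_readout(states, targets)
error = float(np.max(np.abs(predict(states, weights) - targets)))
```

Because only `weights` is fitted, adding a hidden layer changes the number of columns of `states` but leaves the training procedure unchanged.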

EXPERIMENTAL RESULTS
In the experiment, all four FP lasers in the hidden layers exhibit an identical lasing threshold of Ith=8.0 mA. The pump currents and the corresponding output powers of all the lasers are listed in Table 1. The delay times of the four optical feedback loops are fixed in the range of 63 to 68.5 ns, without any optimization. It is stressed that the delay times are more than 15 times longer than the clock cycle of the computing system, unlike the common synchronous case. Our recent work has shown that this asynchronous architecture helps to improve the PRC performance [29,33], owing to the rich neuron interconnections [34]. The feedback ratio is defined as the power ratio of the reflected light to the emitted light, which is set around -30 dB for all four layers. The critical feedback level of the lasers is about -19.3 dB, and hence the optical feedback is operated in the stable regime. The injection ratio is defined as the ratio of the injected power from the laser in the previous layer to the emission power of the laser in the subsequent layer. As shown in Table 1, the injection ratios of the layers vary from about 2.0 up to 4.0. In addition, the detuning frequency is defined as the lasing frequency difference between the two lasers. All the detuning frequencies in Table 1 are set within the stable locking regime without optimization.
Figure 2 shows the optical spectra of the multi-longitudinal-mode FP lasers in all four reservoir layers. The spectrum peaks of the lasers are around 1550.98, 1542.63, 1548.91, and 1540.86 nm, respectively. Meanwhile, the free spectral ranges are 154.6, 154.8, 172.7, and 171.9 GHz, respectively. When applying optical injection from the master laser at 1546.5 nm, only the mode of each slave laser closest to the injection wavelength is locked in the stable regime. All side modes are suppressed and the suppression ratio is more than 50 dB. This is because the optical injection reduces the gain of the laser medium [35].
The performance of the deep PRC is tested in the real-world task of nonlinear channel equalization in optical fiber communications.
The optical signal in optical fibers is distorted by the linear chromatic dispersion and the Kerr nonlinearity [36]. The linear distortion is usually mitigated by the feedforward equalizer (FFE) in the digital signal processing (DSP) of the optical receiver [37,38]. On the other hand, the Kerr nonlinearity can be compensated by solving the nonlinear Schrödinger equation [36]. However, common solving algorithms like digital back propagation are too complex for DSP implementation [38,39]. An alternative solution is deploying neural networks to compensate the fiber nonlinearity with reduced computational complexity [39-41]. In particular, several works have experimentally demonstrated that shallow PRCs are capable of compensating the linear impairments of optical fibers instead of FFEs [33,42-45]. Here we show that the deep PRC has a strong ability to mitigate the nonlinear impairments of optical fibers. The nonlinear Schrödinger equation describing the propagation of light in an optical fiber reads [36]:
$$\frac{\partial E}{\partial z} + \frac{\alpha}{2}E + \frac{i\beta_2}{2}\frac{\partial^2 E}{\partial t^2} - i\gamma |E|^2 E = 0$$
where E(z,t) is the slowly varying envelope of the electric field, z is the transmission distance (50 km), α is the attenuation constant (α=0.2 dB/km), β2 is the fiber dispersion coefficient (-21.4 ps²/km), and γ is the fiber nonlinearity coefficient (1.2 /(W·km)) [46].
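For readers who want to reproduce the channel numerically, a symmetric split-step Fourier sketch is given here in Python. It assumes the standard form ∂E/∂z = -(α/2)E - i(β2/2)∂²E/∂t² + iγ|E|²E with the fiber parameters quoted above; the step count, the time grid, and the Gaussian test pulse are illustrative assumptions and do not describe the authors' simulation.

```python
import numpy as np

def split_step(field, dt_ps, length_km, n_steps=500,
               alpha_db_km=0.2, beta2_ps2_km=-21.4, gamma_w_km=1.2):
    """Propagate a complex envelope (units of sqrt(W)) with the symmetric split-step method."""
    alpha = alpha_db_km * np.log(10.0) / 10.0          # dB/km -> linear power loss in 1/km
    dz = length_km / n_steps
    omega = 2.0 * np.pi * np.fft.fftfreq(field.size, d=dt_ps)   # angular frequency in rad/ps
    # Half-step linear operator: loss plus dispersion (d^2/dt^2 -> -omega^2 in frequency).
    half_linear = np.exp((-alpha / 2.0 + 0.5j * beta2_ps2_km * omega**2) * dz / 2.0)
    out = field.astype(complex)
    for _ in range(n_steps):
        out = np.fft.ifft(half_linear * np.fft.fft(out))
        out = out * np.exp(1j * gamma_w_km * np.abs(out)**2 * dz)   # Kerr phase rotation
        out = np.fft.ifft(half_linear * np.fft.fft(out))
    return out

# Toy input (assumption): a 4 mW peak-power Gaussian pulse on a 1 ps grid.
t_ps = (np.arange(1024) - 512) * 1.0
e_in = np.sqrt(4e-3) * np.exp(-t_ps**2 / (2.0 * 20.0**2))
e_out = split_step(e_in, dt_ps=1.0, length_km=50.0)
energy_ratio = float(np.sum(np.abs(e_out)**2) / np.sum(np.abs(e_in)**2))
```

Dispersion and the Kerr term only change the phase and shape of the pulse, so the energy ratio is set by the attenuation alone: 0.2 dB/km over 50 km is 10 dB, that is, a factor of 0.1.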
The signal under investigation is a non-return-to-zero (NRZ) signal with a modulation rate of 25 Gbps. The training set consists of 35000 random symbols of {0, 1}, and the testing set consists of 15000 symbols. Each symbol consists of 8 samples, and the tap number of the nonlinear equalizer is set at 21. In the experiment, each measurement is repeated four times, and the mean bit error rate (BER) and the standard deviation are recorded.
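The evaluation procedure can be sketched in Python as follows. The set sizes and the four repetitions follow the text; the 0.5 decision threshold, the Gaussian noise used as a stand-in for equalizer output, and the use of the population standard deviation are assumptions for illustration only.

```python
import numpy as np

N_TRAIN, N_TEST = 35_000, 15_000         # symbol counts stated in the text

def bit_error_rate(equalized, bits, threshold=0.5):
    """Hard decision at `threshold` (assumed), then the fraction of wrong bits."""
    decisions = (np.asarray(equalized) > threshold).astype(int)
    return float(np.mean(decisions != np.asarray(bits)))

rng = np.random.default_rng(2)
bers = []
for run in range(4):                     # each measurement is repeated four times
    bits = rng.integers(0, 2, size=N_TEST)
    equalized = bits + rng.normal(scale=0.2, size=N_TEST)   # synthetic stand-in, not measured data
    bers.append(bit_error_rate(equalized, bits))

mean_ber = float(np.mean(bers))
std_ber = float(np.std(bers))            # population standard deviation (assumption)
```

The pair (`mean_ber`, `std_ber`) corresponds to one point with its error bar on a BER-versus-launch-power curve.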
Figure 3(a) shows an example of the random NRZ signal sequence sent at the transmitter, with a launch power of 4.0 mW. After a transmission distance of 50 km, however, the signal received at the receiver in Fig. 3(b) is substantially distorted. Generally, increasing the launch power raises the nonlinear effect, and the signal distortion becomes stronger [36]. The task aims to reproduce the original signal in Fig. 3(a) based on the degraded one in Fig. 3(b). When the shallow 1-layer PRC is applied to equalize the received signal, Fig. 3(c) shows that the BER first decreases from 5.0×10⁻³ at 1.0 mW down to the minimum value of 3.4×10⁻³ at 100 mW. Above 100 mW, the BER increases nonlinearly with the launch power. Meanwhile, the BERs for launch powers ranging from 80 to 120 mW are below the hard-decision forward error correction (FEC) threshold (3.8×10⁻³, dashed line) [47]. This is because the PRC inherently has both a linear memory effect and a nonlinear memory effect, which are commonly quantified by the linear memory capacity (MC) and the nonlinear MC, respectively [24,32]. For low launch powers (see 1.0 mW), the Kerr nonlinearity of the optical fiber is negligible and the signal distortion is mainly induced by the linear chromatic dispersion. Therefore, the impairment compensation only requires the linear memory effect of the PRC, while the nonlinear memory effect plays a negative role. When increasing the launch power (see 20-100 mW), the Kerr nonlinearity appears and hence the nonlinear memory effect of the PRC becomes beneficial for mitigating this nonlinear distortion. The BER reaches the minimum value when the inherent nonlinear memory effect of the PRC matches the strength of the Kerr nonlinearity of the fiber (see 100 mW). On the other hand, the BER increases when the nonlinear memory capacity is not high enough to compensate the strong fiber nonlinearity (see 120-200 mW). Therefore, the nonlinear equalization ability of the PRC is limited by its maximum nonlinear memory effect.
For the deep PRC with two reservoir layers, the BER reduces from 4.4×10⁻³ at 1.0 mW down to the minimum of 1.5×10⁻³ at 120 mW. The BERs for launch powers ranging from 20 to 160 mW are below the FEC threshold. The PRC performance further improves when we increase the PRC depth to three. The corresponding BER declines from 4.2×10⁻³ at 1.0 mW down to the minimum of 1.0×10⁻³ at 120 mW. The BERs of the 3-layer PRC are lower than those of the 2-layer PRC for all the studied launch powers, ranging from 1.0 mW up to 200 mW. However, the performance of the 4-layer PRC is similar to or slightly worse than that of the 3-layer PRC. This suggests that the PRC performance saturates at a depth of three for this nonlinear signal equalization task. In comparison with the shallow PRC, all three deep PRCs exhibit better performance over the whole launch power range. In particular, the BERs are significantly reduced in the power range of 80 to 160 mW. Therefore, unlike the shallow PRC, the deep PRCs have a very strong ability to mitigate the nonlinearity of optical fibers and hence to improve the transmission signal quality. This compensation ability can be attributed to the strengthened nonlinear memory effect of the deep PRCs, which is discussed in the next section.
Figure 4 explores the contribution of each reservoir layer in the 3-layer PRC. For each evaluation, only one of the three reservoirs is used for the signal equalization. Therefore, the virtual neuron number becomes 80 instead of 240, both for the training and for the test. It is found that the performance of the second-layer reservoir is generally better than that of the first-layer one. In particular, the minimum BER of the second-layer reservoir, achieved at 120 mW, is 2.0×10⁻³. This value further goes down to 1.4×10⁻³ for the third-layer reservoir, which is 2.6 times smaller than the first-layer case (3.7×10⁻³). The different performance of the three hidden layers suggests that the neuron dynamics differ from one layer to another. Generally, the neuron states at the deeper layer are richer than those at the shallower one, which results in the better performance in the former case. This behavior is different from the parallel PRC, where several reservoirs are connected in parallel instead of in series. Our recent experimental work demonstrated that the neuron states in every parallel reservoir were similar to each other [33]. Owing to the rich neuron dynamics in each layer, the nonlinear memory effect in the deep PRC is improved, and thereby the performance of the 3-layer PRC is boosted.
In comparison, the FFE commonly used in the DSP of optical receivers only compensates the linear chromatic dispersion of optical fibers [37]. Figure 4 shows that the BER of the FFE increases nonlinearly from 5.2×10⁻³ at 1.0 mW up to 3.8×10⁻² at 200 mW. For the launch power of 120 mW, the BER of the FFE (7.7×10⁻³) is 7.7 times larger than that of the 3-layer PRC (1.0×10⁻³). This comparison shows that the deep PRC can indeed compensate the strong nonlinearity of optical fibers. For low launch powers (1 mW), nevertheless, the BER of the 3-layer PRC is only slightly better than that of the FFE. This suggests that the deep PRC has a similar ability to compensate chromatic dispersion as the FFE.
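To make the linear baseline concrete, the sketch below trains a 21-tap FFE by least squares on a toy channel in Python. The text does not state how the FFE taps were obtained, so the least-squares fit, the one-sample-per-symbol simplification, the three-tap toy channel, and all function names are assumptions for illustration.

```python
import numpy as np

def tap_windows(signal, n_taps):
    """Row k holds n_taps consecutive samples centered on sample k (zero padded at the edges)."""
    half = n_taps // 2
    padded = np.pad(np.asarray(signal, dtype=float), half)
    return np.lib.stride_tricks.sliding_window_view(padded, n_taps)

def train_ffe(received, bits, n_taps=21):
    """Least-squares tap weights of a linear feedforward equalizer (assumed training method)."""
    weights, *_ = np.linalg.lstsq(tap_windows(received, n_taps),
                                  np.asarray(bits, dtype=float), rcond=None)
    return weights

def apply_ffe(received, weights):
    """Linear FIR filtering: each output is a weighted sum of neighboring samples."""
    return tap_windows(received, weights.size) @ weights

rng = np.random.default_rng(3)
bits = rng.integers(0, 2, size=4000)
channel = np.array([0.2, 1.0, 0.3])      # toy linear inter-symbol interference (assumption)
received = np.convolve(bits, channel, mode="same") + rng.normal(scale=0.01, size=bits.size)

weights = train_ffe(received[:3000], bits[:3000])
decisions = (apply_ffe(received, weights)[3000:] > 0.5).astype(int)
ber = float(np.mean(decisions != bits[3000:]))
```

Since the filter output is a purely linear function of the received samples, such an equalizer can undo a linear channel like this one but has no term that could cancel a power-dependent distortion, which is consistent with the FFE behavior reported at high launch powers.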

DISCUSSION
The experimental results in Fig. 3 and in Fig. 4 have shown that raising the depth of the PRC substantially improves the performance of nonlinearity compensation at high launch powers. However, the deep PRC shows similar performance to the shallow PRC in compensating linear impairments at low launch powers. In order to understand this behavior, we numerically analyze both the linear MC and the nonlinear MC of the PRC. The deep PRC model includes four hidden reservoir layers as in the experiment. We assume that the slave lasers in the four layers are all identical to simplify the simulation. The carrier dynamics, the photon dynamics, and the phase of the electric field are taken into account through the framework of rate equations. Both the optical feedback effect and the optical injection are characterized through the classical Lang-Kobayashi model [48,49]. The main simulation parameters are listed in Table 2. The detailed deep PRC model and other simulation parameters are given in [24].
The linear MC (LMC) measures the ability of the PRC to reproduce the past input signal, which is quantified by [50,51]:
$$\mathrm{LMC} = \sum_{i} \frac{\langle u(k-i)\, y(k) \rangle^{2}}{\sigma^{2}[u(k)]\, \sigma^{2}[y(k)]}$$
where the input signal u(k) is a random sequence uniformly distributed in the range of [-1, 1], and y(k) is the corresponding output of the PRC at step k. The aim of the evaluation is to reproduce the input signal u(k-i), shifted i steps backward, using y(k). σ² represents the variance operation and <> stands for the average operation. On the other hand, the nonlinear MC characterizes the ability to reproduce high-order Legendre polynomials of the input signal, which is defined as:
$$\mathrm{MC} = \sum_{i} \frac{\langle p(k-i)\, y(k) \rangle^{2}}{\sigma^{2}[p(k)]\, \sigma^{2}[y(k)]}$$
where the polynomial is p(k)=[3u²(k)-1]/2 for the quadratic MC (QMC), and p(k)=[5u³(k)-3u(k)]/2 for the cubic MC (CMC). The aim of the evaluation is to reproduce the polynomial p(k-i) using the PRC output y(k). In addition to LMC, QMC, and CMC, the PRC also has higher-order memory effects and cross memory effects, which are not considered in this work.
Figure 5 shows that both the linear MC and the nonlinear MCs rise with the increasing depth of the PRC. The LMC increases from 8.47 for the 1-layer PRC up to 16.87 for the 4-layer PRC. However, the deep PRC in Fig. 3 only slightly reduces the BER at low launch powers. This suggests that the LMC of the shallow PRC is already high enough for compensating the chromatic dispersion. On the other hand, the QMC increases from 5.25 to 11.27, while the CMC increases from 3.33 to 6.57. The enhanced nonlinear MC can be attributed to the rich neuron states of deep reservoir layers, as shown in Fig. 4. As a result, the deep PRC exhibits a strong ability in mitigating the nonlinearity of optical fibers. On the other hand, all three MCs almost saturate at a depth of three, resulting in the performance saturation of the nonlinear signal equalization in Fig. 3.
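The memory capacity evaluation can be sketched in Python as follows. The reservoir here is a small software tanh network used purely as a stand-in, not the Lang-Kobayashi laser model of this work; the maximum delay, washout length, ridge value, and all function names are assumptions. Each delay term is computed as a squared correlation coefficient between the delayed target and the trained output.

```python
import numpy as np

def run_reservoir(u, n_nodes=50, spectral_radius=0.9, seed=0):
    """Toy software reservoir (NOT the laser model): x(k) = tanh(W x(k-1) + w_in u(k) + b)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(n_nodes, n_nodes))
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    w_in = rng.uniform(-0.5, 0.5, size=n_nodes)
    bias = rng.uniform(-0.5, 0.5, size=n_nodes)   # breaks the odd symmetry of tanh
    x = np.zeros(n_nodes)
    states = np.empty((u.size, n_nodes))
    for k in range(u.size):
        x = np.tanh(w @ x + w_in * u[k] + bias)
        states[k] = x
    return states

def memory_capacity(states, u, polynomial, max_delay=30, washout=100, ridge=1e-6):
    """Sum over delays i of the squared correlation between p(k-i) and the readout y(k)."""
    p = polynomial(u)
    s = np.hstack([states[washout:], np.ones((u.size - washout, 1))])
    gram = s.T @ s + ridge * np.eye(s.shape[1])
    total = 0.0
    for i in range(1, max_delay + 1):
        target = p[washout - i:u.size - i]          # p(k-i) aligned with states at step k
        y = s @ np.linalg.solve(gram, s.T @ target)  # ridge-regression readout for this delay
        total += np.corrcoef(target, y)[0, 1] ** 2
    return total

rng = np.random.default_rng(4)
u = rng.uniform(-1.0, 1.0, size=2000)               # uniform input in [-1, 1]
states = run_reservoir(u)
lmc = memory_capacity(states, u, lambda v: v)
qmc = memory_capacity(states, u, lambda v: (3 * v**2 - 1) / 2)      # Legendre P2
cmc = memory_capacity(states, u, lambda v: (5 * v**3 - 3 * v) / 2)  # Legendre P3
```

In this sketch the readout is evaluated on the same samples it was trained on, which slightly overestimates the capacities; a separate test sequence would give a stricter estimate.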

CONCLUSION
In summary, we have experimentally demonstrated a deep PRC architecture based on cascading injection-locked lasers. The connection between successive reservoir layers is all-optical, without any OEC or ADC. In addition, this scheme is highly scalable because the laser in each layer provides optical power. The deep PRC with a depth of four is used to solve the real-world problem of nonlinear signal equalization of optical fibers. It is shown that the deep PRC exhibits a strong ability to compensate the Kerr nonlinearity of optical fibers and hence to improve the quality of the received signal. In comparison with the linear FFE, the deep PRC reduces the

Fig. 1. (a) Schematic architecture of the deep PRC. (b) Experimental setup of the deep PRC. AWG: arbitrary waveform generator; OSC: oscilloscope; PD: photodiode; EDFA: erbium-doped fiber amplifier; Ch: channel. The hidden layers are interconnected by optical injection. The optical feedback loops provide virtual neurons.

Fig. 2. Optical spectra of the four FP lasers with and without optical injection.

Fig. 3. Time sequences of the signal at (a) the transmitter and (b) the receiver. The launch power is 4.0 mW. (c) Performance of the PRCs with different depths. The error bars stand for the standard deviation of the measurement. The dashed line indicates the FEC threshold.

Fig. 4. Performance comparison between the 3-layer PRC (dots) and the FFE (squares). The open symbols represent the BERs of each reservoir layer, respectively. The dashed line indicates the FEC threshold.

Fig. 5. Memory capacities of the PRCs with different depths.