Efficient Reservoir Computing using Field Programmable Gate Array and Electro-optic Modulation

We experimentally demonstrate a hybrid reservoir computing system consisting of an electro-optic modulator and a field programmable gate array (FPGA). It implements delay lines and filters digitally for flexible dynamics and high connectivity, while supporting a large number of reservoir nodes. To evaluate the system's performance and versatility, three benchmark tests are performed. The first is the 10th-order Nonlinear Auto-Regressive Moving Average test (NARMA-10), where predictions over 1000 and 25,000 steps yield normalized root mean square errors (NRMSEs) of 0.142 and 0.148, respectively. That the error barely grows over the much longer sequence speaks to the system's capability for large-sample-size processing, as enabled by the present hybrid design. The second is the Santa Fe laser data prediction, where a normalized mean square error (NMSE) of $6.73\times10^{-3}$ is demonstrated. The third is isolated spoken digit recognition, with a word error rate close to 0.34%. Accurate, versatile, flexibly reconfigurable, and capable of long-term prediction, this reservoir computing system could find applications in real-time information processing, weather forecasting, and financial analysis.


Introduction
Modern computers based on von Neumann architectures have been designed for digital information processing and the generic computational tasks built thereupon. With more and more problems being identified as computationally complex and/or excessively time consuming, alternative computational paradigms are actively explored to solve them [1][2][3][4][5]. To that end, a promising approach is to develop brain-inspired architectures for information processing, including a variety of neural networks [6][7][8][9]. Yet those architectures are still implemented through computer simulations on von Neumann computers, and are thus fundamentally subject to the latter's limitations in speed, parallelism, etc.
Recently, there has been increasing interest in artificial neural networks using optics, leveraging its remarkable speed, multiplexing capability, and low heat deposition [10,11]. Usually, these optical neural networks are wholly trained with a known data set to optimize their connectivity and parameters through nonlinear layers [12,13]. However, such training is usually energy and time consuming, and its efficiency varies with the complexity of the task, the size of the network, and the nonlinearity and connectivity between the nodes. Also, it is conducted offline using digital computers, thus failing to account for the uncertainties and fluctuations that are inevitable in any optical system, especially when the optical networks are complex and involve many free-space parts [14,15].
In an effort to address the above difficulties, the idea of reservoir computing (RC) was proposed and has been widely explored [16][17][18][19][20]. RC originates from the liquid-state machine (LSM) [21] and the echo state network (ESN) [22]. Its realizations are generally composed of three parts: an input layer, a reservoir layer, and an output layer. The input signals are first fed into the input layer and then mapped to the reservoir layer, which contains a large number of interconnected nonlinear nodes and performs a nonlinear transformation. Afterwards, the response is read out as a linear weighted sum of the reservoir states in the output layer. Unlike other kinds of recurrent neural networks, which are notoriously hard to train, here only the output weights need to be optimized, making the training much simpler. It is this advantage, together with its success on many time-dependent tasks such as chaotic time series prediction and speech recognition [23][24][25][26][27][28][29][30][31][32][33][34][35], that has drawn much attention across different application areas. The first RC system was implemented using analog electronics with a single nonlinear node and delayed feedback [17]. Since then, digital electronics have been increasingly adopted [36][37][38]. Among them, field programmable gate array (FPGA) electronics can make the system compact, stable, low-cost, and easily configurable with various commercial systems for real-time information processing. More recently, hybrid opto-electronic feedback systems with FPGAs have been used to develop a coherent Ising machine for solving computationally hard optimization problems [39] and to build large networks of identical nodes with arbitrary topology for studying cluster synchronization, chimera states [40], and laminar chaos [41].
Here, we add to the progress made in the RC field and experimentally demonstrate a fully packaged opto-electronic RC system consisting of a Mach-Zehnder electro-optic modulator (EOM) and an FPGA circuit. In our design, both the delay line and the filters are implemented digitally within the FPGA, which renders the whole system compact and immune to optical drifts and noise, especially compared to fiber-optical realizations [42][43][44]. Leveraging the filters inside the FPGA, we are able to achieve richer dynamics and more connections between reservoir nodes. To characterize the system, we run three benchmark tasks: the 10th-order Nonlinear Auto-Regressive Moving Average test (NARMA-10), the Santa Fe laser data prediction, and isolated spoken digit recognition. The system performs well on all three, which indicates its robustness and versatility.
The remainder of the paper is organized as follows. Section 2 discusses the basic theory of an opto-electronic RC system using delayed feedback. Section 3 describes the experimental setup, including the block-level FPGA implementation. Section 4 presents the experimental results for the three benchmark tests used to evaluate the performance of the RC system. Finally, Section 5 concludes the paper.

Theory
Figure 1(a) shows the conventional RC model, which consists of an input layer, a reservoir layer, and an output layer. The input layer takes an input vector $u(n)$ of length $K$, which is fed into the reservoir via fixed but random input weights. These weights scale the input differently for different reservoir nodes. The reservoir layer contains a large number of recurrent, randomly interconnected nonlinear nodes, and it nonlinearly projects the inputs onto a high-dimensional state space. The reservoir dynamics also exhibit fading memory, where the current reservoir state is influenced only by the recent past. The dynamics of the states in the reservoir layer are given by

$$x_i(n) = f_{NL}\left(\sum_{j=1}^{N} W_{ij}\,x_j(n-1) + \sum_{j=1}^{K} W^{in}_{ij}\,u_j(n)\right), \tag{1}$$

where $f_{NL}$ is the nonlinear function, $x_i(n)$ is the reservoir state of the $i^{th}$ node at time $n$, $W_{ij}$ is the node interconnection matrix, $W^{in}_{ij}$ is the input weight matrix, and $u_j(n)$ is the $j^{th}$ input. At the output layer, the response is read out as a linear weighted sum of the node states,

$$\hat{y}_k(n) = \sum_{i=1}^{N} W^{out}_{ki}\,x_i(n), \tag{2}$$

with $W^{out}_{ki}$ being the output weights. The output weights $W^{out}_{ki}$ are updated during training to make the outputs $\hat{y}_k(n)$ as close as possible to the target values $y_k(n)$, using optimization methods such as linear regression or ridge regression.

The above RC architecture requires a very large number of interconnected physical nodes. In contrast, time-delay based RC uses virtual nodes that are temporally spaced and has only one nonlinear node, as shown in Fig. 1(b). The nonlinearity is implemented electro-optically, using an EOM. The input is preprocessed before being injected into the nonlinear node, a procedure referred to as masking. The preprocessed input signal is time-multiplexed and injected serially into the reservoir. After preprocessing, each input vector occupies the total delay time $\tau$. The temporal spacing between the $N$ nodes is given by $\theta = \tau/N$, as shown in Fig. 1(b).

Figure 2 shows the schematic of our digital opto-electronic RC implementation of the delay feedback system. The optical power at the output of the EOM is given by the transfer function

$$P(t) = |E_0|^2 \sin^2\left(\frac{\pi V(t)}{2V_\pi} + \phi\right), \tag{3}$$

where $|E_0|^2$ is the optical power entering the EOM, $V(t)$ is the RF signal applied to the EOM, $V_\pi$ is the $\pi$-shift voltage, and $\phi$ is the bias offset. The photodetector generates a voltage in response to the applied laser power, which is given by

$$V_D(t) = \eta\, S\, G_A\, P(t), \tag{4}$$

where $\eta$ is the total insertion loss of the EOM, $S$ is the responsivity of the detector, and $G_A$ is the transimpedance gain of the amplifier. The RF input to the EOM is

$$V(t) = G\left[V_F(t) + \gamma\, u(t)\right], \tag{5}$$

where $G$ is the overall forward gain of the system, $\gamma$ is the input scaling factor, and $u(t)$ is the input to the reservoir. $V_F(t)$ is the output of the filter after the delay, which is the convolution of the delayed detector output with the filter impulse response function $h(t)$,

$$V_F(t) = \int_0^{\infty} h(s)\, V_D(t - \tau - s)\, ds, \tag{6}$$

with $\tau$ the total delay. Taking all into account, the equation of the states reads

$$x(t) = \beta \sin^2\left(\int_0^{\infty} h(s)\, x(t - \tau - s)\, ds + \rho\, u(t) + \phi\right), \tag{7}$$

with $x(t) = \pi G V_D(t)/2V_\pi$, $\beta = \pi \eta S G_A G |E_0|^2/2V_\pi$, and $\rho = \pi G \gamma/2V_\pi$. It is then discretized as

$$x[n] = \beta \sin^2\left(\sum_{m=0}^{\infty} h[m]\, x[n - k - m] + \rho\, u[n] + \phi\right), \tag{8}$$

where $n = t/T_s$ is the discrete sample index obtained with sampling period $T_s$, and $\tau$ is an integer multiple $k$ of $T_s$. For a digital filter, $h[n]$ has a finite length $M$ with $M \le N$. Thus, Equation (8) takes the explicit form

$$x[n] = \beta \sin^2\left(\sum_{m=0}^{M-1} h[m]\, x[n - k - m] + \rho\, u[n] + \phi\right). \tag{9}$$

From Eq. (9), we can see that the states are coupled through the filter coefficients. A digital filter therefore gives the flexibility to implement different network topologies [42].
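The discretized state update can be illustrated with a short software simulation. The sketch below is not the FPGA logic; it assumes the update has the form x[n] = β sin²(Σₘ h[m] x[n−k−m] + ρ u[n] + φ), with the states initialized to zero. The function name `run_delay_reservoir` and its default parameter values are illustrative choices.

```python
import numpy as np

def run_delay_reservoir(u_masked, h, k, beta=0.58, rho=0.5, phi=0.1):
    """Iterate x[n] = beta * sin^2(sum_m h[m] x[n-k-m] + rho*u[n] + phi).

    u_masked : 1-D masked, time-multiplexed input stream
    h        : FIR filter taps (length M)
    k        : feedback delay in samples
    States before n = 0 are assumed to be zero.
    """
    u_masked = np.asarray(u_masked, dtype=float)
    h = np.asarray(h, dtype=float)
    M = len(h)
    x = np.zeros(len(u_masked))
    for n in range(len(u_masked)):
        feedback = 0.0
        newest = n - k  # index of the most recent delayed sample
        if newest >= 0:
            m_max = min(M, newest + 1)
            # past[m] = x[n - k - m] for m = 0 .. m_max - 1
            past = x[newest - m_max + 1 : newest + 1][::-1]
            feedback = float(np.dot(h[:m_max], past))
        x[n] = beta * np.sin(feedback + rho * u_masked[n] + phi) ** 2
    return x
```

With N virtual nodes per input step, `run_delay_reservoir(...).reshape(-1, N)` gives one row of node states per input sample. Because each new state mixes up to M earlier states weighted by `h`, changing the filter taps changes the effective coupling between virtual nodes.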

Experimental Setup
The block diagram of our experimental setup is depicted in Figure 3(a). A fiber-coupled laser diode (JDS Uniphase CQF975/58) provides a continuous-wave beam at ∼1550 nm, which is passed through a fiber polarization controller (FPC) before coupling to an EOM (Lucent Technologies, 2623NA). The EOM takes its RF input signal from an arbitrary waveform generated by an RF digital-to-analog converter (DAC) and modulates the laser beam intensity. The modulated intensity is converted to an electrical signal using a photodetector (Thorlabs, DET10C2), whose output is digitized using an analog-to-digital converter (ADC) with a 1 Msps sampling rate controlled by the FPGA. A personal computer (PC) provides the preprocessed data to the RC system via an Ethernet interface.
Figure 3(b) shows the FPGA blocks implemented on a Zedboard (Zynq-7000) development platform. The RC logic is implemented in Verilog. The FPGA interfaces to the ADC (Maxim Integrated MAX11198, 16 bits) and the RF DAC (Analog Devices AD5541A, 16 bits) using the Serial Peripheral Interface. The PC preprocesses the input signal using the mask and the input scaling factor, and the result is sent to the Zedboard via the Ethernet interface. The Processing System (PS) in the Zedboard uses Direct Memory Access (DMA) to stream the input to the FPGA logic. The feedback signal from the filter block and the streamed input are added and passed through a programmable gain and offset block before being converted to an analog electrical signal by the DAC. Here, the gain and offset are key parameters for tuning the RC system performance. Similarly, the streaming data from the ADC are fed into a programmable delay block with the delay limited to 1000 units, each unit representing one virtual node spacing. The delayed signal is then passed to a digital finite impulse response bandpass filter with 400 taps. The filtered data implement Eq. (9) and carry the state information. The PS then transmits these data to the PC via the Ethernet interface. A calibration DAC (Maxim Integrated MAX5316, 16 bits) is used to compensate for the bias drift of the EOM.
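The masking step performed on the PC can be sketched as follows for a scalar input series. This is only an illustration under stated assumptions: each input sample is held for N virtual-node slots and multiplied by a fixed random mask, and the mask is drawn uniformly from [-1, 1] (the distribution the paper states for the spoken-digit task; it is assumed here for the scalar case). The name `mask_input` and the parameter `gamma` are hypothetical.

```python
import numpy as np

def mask_input(u, mask, gamma=1.0):
    """Sample-and-hold each input over len(mask) virtual nodes, then mask.

    Returns a 1-D stream of length len(u) * len(mask): the N masked values
    for u[0], followed by the N masked values for u[1], and so on.
    """
    u = np.asarray(u, dtype=float)
    mask = np.asarray(mask, dtype=float)
    return gamma * np.outer(u, mask).ravel()

rng = np.random.default_rng(0)
N = 400                                # number of virtual nodes
mask = rng.uniform(-1.0, 1.0, size=N)  # assumed mask distribution
stream = mask_input([0.1, 0.2, 0.3], mask, gamma=0.5)
```

The resulting `stream` is what would be sent sample by sample to the hardware, so three input values with N = 400 produce 1200 samples.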

NARMA10
NARMA is one of the most widely used benchmarks in RC [45]. NARMA10 is a discrete-time nonlinear task with a 10th-order lag. The series $y_{n+1}$ is generated through the recursive formula

$$y_{n+1} = 0.3\,y_n + 0.05\,y_n \sum_{i=0}^{9} y_{n-i} + 1.5\,u_{n-9}\,u_n + 0.1. \tag{10}$$

The input $u_n$ is drawn from a uniform distribution in the interval [0, 0.5]. Due to its nonlinearity and long time lag, NARMA10 poses a challenge for any computation system.
To characterize the performance of the RC, a normalized root mean squared error (NRMSE) between the target and predicted values is calculated as

$$\mathrm{NRMSE} = \sqrt{\frac{1}{L}\,\frac{\sum_{n=1}^{L}\left(y_n - \hat{y}_n\right)^2}{\sigma^2}}, \tag{11}$$

where $y_n$ is the target, $\hat{y}_n$ is the prediction, $L$ is the total number of samples in the target, and $\sigma$ denotes the standard deviation of the target. By sweeping the system parameters, the optimum operating point is identified at gain $\beta = 0.58$, input scaling $\rho = 0.5$, bias $\phi = 0.1$, and a total of $N = 400$ virtual nodes. The benchmark results are tabulated in Table 1. Figure 4 plots the target vs. the prediction for 2000 samples. Also plotted is the residue $R$, defined as the difference between the target and the estimate normalized by the mean of the target,

$$R_n = \frac{y_n - \hat{y}_n}{\bar{y}}. \tag{12}$$

As seen, the present RC performs well over long-term prediction. For 25,000 samples with a training size of 5200, the mean NRMSE is 0.148.
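For reference, a NARMA10 sequence can be generated in a few lines. The sketch below uses the standard NARMA10 recursion, y[n+1] = 0.3 y[n] + 0.05 y[n] Σᵢ y[n−i] + 1.5 u[n−9] u[n] + 0.1 with i = 0…9, and an input drawn uniformly from [0, 0.5]. The zero initial condition and the function name `narma10` are assumptions made for illustration.

```python
import numpy as np

def narma10(length, seed=0):
    """Return (u, y): a uniform input on [0, 0.5] and its NARMA10 target."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, size=length)
    y = np.zeros(length)  # the first 10 targets are left at zero
    for n in range(9, length - 1):
        y[n + 1] = (0.3 * y[n]
                    + 0.05 * y[n] * np.sum(y[n - 9 : n + 1])  # last 10 outputs
                    + 1.5 * u[n - 9] * u[n]                   # 10-step input lag
                    + 0.1)
    return u, y
```

In practice the first few hundred samples are usually discarded as a warm-up so that the zero initial condition does not affect training.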
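The error metrics and the readout training described above can be sketched as follows. Ridge regression is one of the methods the paper names for the output weights; the regularization value, the added bias column, and the function names here are illustrative assumptions, not the settings used in the experiment.

```python
import numpy as np

def nrmse(target, prediction):
    """Root mean squared error normalized by the target's standard deviation."""
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return np.sqrt(np.mean((target - prediction) ** 2) / np.var(target))

def residue(target, prediction):
    """Pointwise error normalized by the mean of the target."""
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return (target - prediction) / np.mean(target)

def train_readout(states, target, ridge=1e-6):
    """Ridge-regression output weights for states of shape (L, N)."""
    X = np.hstack([states, np.ones((states.shape[0], 1))])  # append bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ np.asarray(target, dtype=float))

def predict(states, weights):
    """Apply trained output weights to a state matrix of shape (L, N)."""
    X = np.hstack([states, np.ones((states.shape[0], 1))])
    return X @ weights
```

Only `train_readout` involves fitting; the reservoir itself stays fixed, which is what keeps RC training inexpensive compared with training a full recurrent network.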

Santa Fe laser data prediction
Data set A from the 1994 time series prediction competition organized by the Santa Fe Institute is a time series obtained by measuring an NH$_3$ laser, and it is a good example of realistic data [46]. The Santa Fe laser task is a one-step-ahead series prediction, and we use 4000 data points in our test case. The performance is measured using the NMSE. By sweeping the system parameters, the optimum operating point is identified at gain $\beta = 0.58$, input scaling $\rho = 0.0029$, bias $\phi = 0.1$, and a total of $N = 400$ virtual nodes for the 4000 data points. The results of the experiment are listed in Table 2. Figure 5 plots the target and the prediction, along with the residue, for the first 1000 samples.
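One-step-ahead prediction pairs each sample with its successor as the target. The sketch below shows that pairing together with an NMSE function. The paper does not spell out its NMSE formula, so the common definition (mean squared error divided by the target variance) is assumed, and the function names are illustrative.

```python
import numpy as np

def one_step_pairs(series):
    """Return (inputs, targets) where targets[n] = series[n + 1]."""
    series = np.asarray(series, dtype=float)
    return series[:-1], series[1:]

def nmse(target, prediction):
    """Mean squared error normalized by the variance of the target (assumed)."""
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return np.mean((target - prediction) ** 2) / np.var(target)
```

A useful sanity check is the persistence forecast, which predicts each target by the current input, `nmse(targets, inputs)`; a trained reservoir should score well below that baseline.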

Isolated spoken digit recognition
Speech recognition is a commonly used benchmark for testing the performance of neural networks. Acoustic features are more pronounced in the frequency domain than in the time domain. Hence, the input audio data are first preprocessed by decomposing the time-domain information into frequency-time information. In our implementation, we use Lyon's cochlear ear model to obtain the cochleagram, which mimics the filtering that occurs in nature [47].
The input audio data for this experiment are from the AudioMNIST database [48]. This is a free and open database that contains 30,000 audio samples of spoken digits (0-9) from 60 different speakers, with a sampling rate of 48 kHz. To use these audio files with our RC, we first down-sample the audio to 12 kHz using the librosa package [49] and pad each file with zeros of random length at the beginning and end to make a total sample size of 12,000. Next, we generate the cochleagram as shown in Fig. 6(b), which is calculated using the "lyon 1.0.0" Python package. In this test, we consider the first 5 speakers from the AudioMNIST database to create 20 balanced subsets, each containing 5 speakers × 10 digits × 2 utterances, i.e., 100 audio samples. Each training and testing dataset contains 5 such subsets, with the first 4 used for training and the last one for testing. In order to obtain an unbiased result, 50 datasets are formed using random combinations of 5 subsets out of the 20, and the word error rate (WER) is measured for each combination.
Figure 6 illustrates the flow of the spoken digit recognition task. The input mask is a real-valued matrix of dimension $N \times N_{ch}$ with entries uniformly distributed in [-1,1], where $N$ is the number of virtual nodes and $N_{ch}$ is the number of cochleagram channels. The output class is predicted by taking the argument of the maximum of the row-wise mean of the estimation matrix [50].
After tuning the system parameters based on their error evaluations, we found the optimum operating point at gain $\beta = 0.5$, input scaling $\rho = 300$, bias $\phi = 0.35$, and a total of $N = 400$ virtual nodes. The results of the experiment are listed in Table 3.
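The padding and dataset-sampling steps can be sketched with NumPy alone. The code assumes the audio has already been resampled to 12 kHz. The function names, the truncation of clips longer than 12,000 samples, and the independent random draws (which do not rule out a combination being repeated) are assumptions made for illustration.

```python
import numpy as np

def pad_to_length(audio, total=12000, rng=None):
    """Zero-pad a clip to `total` samples with a random front/back split."""
    rng = np.random.default_rng() if rng is None else rng
    audio = np.asarray(audio, dtype=float)[:total]  # assumed: truncate long clips
    extra = total - len(audio)
    front = int(rng.integers(0, extra + 1))         # 0 .. extra inclusive
    return np.pad(audio, (front, extra - front))    # constant zero padding

def draw_datasets(n_subsets=20, per_dataset=5, n_datasets=50, seed=0):
    """Draw datasets of 5 subset indices: first 4 for training, last for testing."""
    rng = np.random.default_rng(seed)
    datasets = []
    for _ in range(n_datasets):
        chosen = rng.choice(n_subsets, size=per_dataset, replace=False)
        datasets.append((chosen[:-1].tolist(), int(chosen[-1])))
    return datasets
```

Each element returned by `draw_datasets` is a `(train_subsets, test_subset)` pair, so averaging the WER over the list gives the mean over the 50 train/test splits.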
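The data flow of Fig. 6 can be written out as a few matrix operations. In the sketch below, the reservoir itself is left out, and the order in which the masked matrix is flattened (all N node values for one time step, then the next step) is an assumption. All function and variable names are illustrative.

```python
import numpy as np

def serialize_input(mask, cochleagram):
    """Mask (N, N_ch) times cochleagram (N_ch, N_t), flattened step by step."""
    drive = mask @ cochleagram  # shape (N, N_t)
    return drive.T.ravel()      # N values for step 0, then step 1, ...

def deserialize_states(stream, n_nodes):
    """Reshape the serial reservoir output back into an (N, N_t) state matrix."""
    return np.asarray(stream, dtype=float).reshape(-1, n_nodes).T

def classify(w_out, states):
    """Pick the class whose row of the estimation matrix has the largest mean."""
    estimation = w_out @ states  # shape (N_c, N_t)
    return int(np.argmax(estimation.mean(axis=1)))

def word_error_rate(predicted, actual):
    """Fraction of utterances whose predicted digit is wrong."""
    return float(np.mean(np.asarray(predicted) != np.asarray(actual)))
```

Averaging each row of the estimation matrix over time before taking the argmax makes the decision depend on the whole utterance rather than on any single time step.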

Conclusions
We have experimentally demonstrated an opto-electronic RC system using an EOM and an FPGA. It takes advantage of the electro-optic nonlinear transformation in the EOM, together with the flexible signal generation, controllable timing, highly programmable signal filtering, and stable delay logic implemented in the FPGA. The resultant system is a stable and fully functional reservoir computer that supports many nodes and high connectivity and is easily configurable for multifaceted tasks, while allowing online training. We have tested it with three benchmark tasks: NARMA-10, Santa Fe laser prediction, and isolated spoken digit recognition. It achieves an NRMSE of 0.142 for NARMA-10, an NMSE of $6.73\times10^{-3}$ for Santa Fe, and a WER of ∼0.34% for speech recognition. These results are compared with the state of the art in Table 4. As seen, the present RC system achieves the best prediction accuracy in the first two tests and is the only reported system to perform well in all three tasks. Moreover, it is able to predict 25,000 steps of the NARMA-10 series with a low NRMSE of 0.148. Such performance across these varied benchmark tests demonstrates the advantages of our system for complex and versatile tasks.
Our system currently works at a low speed, which is mostly limited by the sampling rate of the ADC and the settling time of the DAC. A significant speedup is expected from choosing a faster ADC and DAC. A promising direction is also to replace the existing bulk-optical EOM with photonic integrated chips, where sophisticated and tailored nonlinear transformations can be realized using nested optical circuits on a single chip [56,57]. Finally, an FPGA-based opto-electronic RC system of this or similar design is easily reconfigurable to accommodate even more neurons, higher connectivity, and arbitrary topology. In addition, the performance and robustness of the present RC could be further improved by online training. Those upgrades could open up important applications in weather and financial forecasting, real-time information processing, and so on.

Disclosures
The authors declare no conflicts of interest.

Fig. 3. (a) Experimental setup for the present opto-electronic RC and (b) block diagram of the hardware implementation using the FPGA board and the ADC/DAC.

Fig. 4. NARMA10 benchmark results: (a) amplitude of the target (red line) and the predicted signal (green line) vs. sample index for 2000 samples and (b) normalized residue of the target and the predicted signal.

Fig. 5. Santa Fe benchmark results: (a) amplitude of the target (red line) and the predicted signal (green line) vs. sample index and (b) normalized residue of the target and the prediction.

Figure 6(b) shows the cochleagram for digit 9 with dimension $N_{ch} \times N_t$, where $N_t$ is the number of time steps in the new time representation, which depends on the decimation factor. As shown in Fig. 6(c), the product of the input mask and the cochleagram is serially injected into the reservoir by flattening the matrix. The output of the reservoir is de-serialized to obtain the reservoir state matrix of dimension $N \times N_t$, as shown in Fig. 6(e). The estimation matrix is calculated by multiplying the output weight matrix with the reservoir state matrix.

Fig. 6. Graphical illustration of the isolated spoken digit recognition task. (a) Uniformly distributed input mask with values in the range [-1,1] and dimension $N \times N_{ch}$, where $N = 400$ nodes and $N_{ch} = 77$. (b) Cochleagram of dimension $N_{ch} \times N_t$ generated from the audio file corresponding to digit 9. (c) The product of the input mask and the cochleagram is serialized and injected into the reservoir. (d) Output layer weights of dimension $N_c \times N$, where the number of output classes is $N_c = 10$. (e) The output from the reservoir is serially captured and reshaped to $N \times N_t$, which gives the node state matrix. (f) The product of the output weights with the reservoir state matrix gives the estimation matrix. The output class is predicted by taking the argument of the maximum of the row-wise mean of the estimation matrix.