Reinforcement Learning in a large scale photonic Recurrent Neural Network

Photonic Neural Network implementations have been gaining considerable attention as a potentially disruptive future technology. Demonstrating learning in large scale neural networks is essential to establish photonic machine learning substrates as viable information processing systems. Realizing photonic Neural Networks with numerous nonlinear nodes in fully parallel and efficient learning hardware has so far been lacking. We demonstrate a network of up to 2500 diffractively coupled photonic nodes, forming a large scale Recurrent Neural Network. Using a Digital Micro Mirror Device, we realize reinforcement learning. Our scheme is fully parallel, and the passive weights maximize energy efficiency and bandwidth. The computational output efficiently converges and we achieve very good performance.

I. INTRODUCTION
Multiple concepts of Neural Networks have initiated a revolution in the way we process information. Deep Neural Networks outperform humans in challenges previously deemed unsolvable by computers [1]. Among others, these systems are now capable of solving non-trivial computational problems in optics [2]. At the same time, Reservoir Computing (RC) emerged as a Recurrent Neural Network (RNN) concept [3]. Initially, RC received substantial attention due to excellent prediction performance achieved with minimal optimization effort. However, it was quickly realized that RC is highly attractive for analogue hardware implementations [4,5].
As employed by the machine learning community, Neural Networks (NNs) consist of a large number of nonlinear nodes interacting with each other. Evolving the NN's state requires performing vector-matrix products with possibly millions of entries. Neural Network concepts therefore fundamentally benefit from parallelism. Consequently, photonics was identified as an attractive alternative to electronic implementation [6,7]. Early implementations were bulky and suffered from a lack of adequate technology and NN concepts. This recently started to change, firstly because RC enabled a tremendous complexity reduction of analog electronic and photonic RNNs [5,8-11]. In addition, integrated photonic platforms have matured and integrated photonic Neural Networks are now feasible [12]. Various demonstrations of how a particular network of neurons can be implemented have been realized in hardware. Yet, Neural Networks consisting of numerous photonic nonlinear nodes combined with photonically implemented learning have so far only been demonstrated in delay systems controlled by a Field Programmable Gate Array [13]. Due to the time-multiplexing, delay system NNs fundamentally require such auxiliary infrastructure, and computational speed suffers due to their serial nature.
While networks with multiple nodes are more challenging to implement, they offer key advantages in terms of parallelism, speed and for realizing the essential vector-matrix products. Here, we demonstrate a network of up to 2025 nonlinear network nodes, where each node is a pixel of a Spatial Light Modulator (SLM). Recurrent and complex network connections are implemented using a Diffractive Optical Element (DOE), an intrinsically parallel and passive device [14]. Simulations based on the angular spectrum of plane waves show that the concept is scalable to well over 20,000 nodes. In a photonic RNN with $N = 900$ nodes we implement learning using a digital micro-mirror device (DMD). The DMD is intrinsically parallel as well and, once weights have been trained, passive and energy efficient. For both the coupling and the learning concept, bandwidth and power consumption are not impacted by the system's size, offering attractive scaling properties. Here we apply such a passive and parallel readout layer to an analogue hardware RNN, and introduce learning strategies that improve the performance of such systems. Using reinforcement learning we implement time series prediction with excellent performance. Our findings open the door to novel and versatile photonic Neural Network concepts.
Figure 1(a) conceptually illustrates our RNN. Information enters the system via a single input node, from where it is injected into a recurrently connected network of nonlinear nodes. The computational result is provided at the single output node after summing the network's state according to weight matrix $W^{DMD}$. Following the RC concept, one can choose the input and recurrent internal weights randomly [3]. Here, we create a complex and recurrently connected network using imaging which is spatially structured via a DOE, resulting in the internal connectivity matrix $W^{DOE}$ [14].

II. NONLINEAR NODES AND DIFFRACTIVE NETWORK
In Fig. 1(b) we schematically illustrate our experimental setup. The polarization of an illumination laser (Thorlabs LP660-SF20, $\lambda = 661.2$ nm, $I_{bias} = 89.69$ mA, $T = 23$ °C) is adjusted to s-polarization and the polarizing beam splitter cube (PBS) therefore reflects all light towards the SLM. By focusing the illumination laser onto the back focal plane of the first microscope objective (MO1, Nikon CFI Plan Achro 10X), the SLM (Hamamatsu X13267-01) is illuminated by a plane wave. The $\lambda/2$-plate in front of MO1 is adjusted such that the SLM operates in intensity modulation mode. Consequently, the optical field transmitted through the PBS (p-polarization) for each SLM pixel $i$ is a modulated version of the illumination field $E_i^0$, set by the pixel's gray scale value $x_i^{SLM}$ and by $\kappa_{SLM}$, the SLM's conversion factor between pixel gray scale and polarization rotation angle.
Ignoring for now the DOE's effect for explanatory purposes, the transmitted field is imaged (MO2, Nikon CFI Plan Achro 10X) on a mirror, and double passing through a $\lambda/4$-plate results in an s-polarized field. The PBS therefore reflects the entire optical field, which is subsequently imaged (MO3, Nikon CFI Plan Fluor 4X) on the camera (CAM, Thorlabs DCC1545M). We rescale the camera image via linear interpolation to fit the number of pixels of the SLM. This step is necessary due to (i) an imaging magnification of 2.5, and (ii) the different pixel sizes of SLM (12.5 µm) and camera (5.2 µm). The detected state is the recorded intensity expressed in camera gray scale. Here, $I_{sat}$ is the camera's saturation intensity and ND the transmission through a neutral density filter, always selected such that the dynamical range of the camera is fully exploited while avoiding over-exposure (maximum gray scale GS = 255). After multiplication with scalar $\beta$, we add a constant phase offset $\theta_i$ and send the resulting matrix back to the SLM. Ignoring the DOE's effect, each pixel therefore corresponds to an Ikeda map. We write the SLM state as vector $x(n+1)$, yet in the experiment this state corresponds to a square array of SLM pixels. Illumination wavelength, DOE (HOLOOR MS-443-650-Y-X), as well as MO1 and MO2 were chosen such that the spacing between diffractive orders matches the pixel spacing of the SLM. Therefore, upon adding the DOE to the beam path, the optical field at each camera pixel becomes a superposition of the fields of several SLM pixels, weighted by the entries of $W^{DOE}$, the network's coupling matrix created by the DOE. In Fig. 1(c) we show the experimentally obtained $W^{DOE}$ for a network of 30×30 nodes. Upon inspection of the inset one can see that connectivity strengths vary significantly on a local scale. This is due to each pixel illuminating a DOE area comparable to the DOE's lowest spatial frequency. As this area shifts slightly from pixel to pixel, the intensity distribution between diffractive orders varies.
This intended effect inherently creates the heterogeneous photonic network topology needed for computation [3]. Finally, the photonic RNN's state $x(n+1)$ results from the DOE-coupled feedback together with the external input. Here, $u(n+1)$ is the information to be injected into the RNN and $\gamma$ is the signal injection strength. Matlab is used to control all instruments, to update the network state and to inject the input information weighted by matrix $\gamma W_i^{inj}$. The overall update rate of the entire system is ∼5 Hz. Currently, the largest network we can realize consists of ∼2500 nodes, which is limited by the imaging setup's field of view and not by the concept itself.
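The node update described in this section can be sketched numerically. The following Python fragment is a minimal sketch and not the Matlab control code used in the experiment. It assumes that the update takes the form $x_i(n+1) = \alpha |E_i^0|^2 \cos^2[\beta \alpha |\sum_j W^{DOE}_{i,j} E_j(n)|^2 + \gamma W_i^{inj} u(n+1) + \theta_i]$, which is one reading consistent with the quantities defined above. The scaling factor `alpha`, the 3×3 neighbourhood coupling built by `local_coupling`, all parameter values and the sinusoidal input are hypothetical placeholders.

```python
import numpy as np

def local_coupling(side, rng):
    """Toy stand-in for W_DOE: random positive coupling to the 3x3 neighbourhood."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < side and 0 <= cc < side:
                        W[r * side + c, rr * side + cc] = rng.uniform(0.5, 1.0)
    return W / W.sum(axis=1, keepdims=True)  # normalise each row

def rnn_step(E, u_next, W_doe, W_inj, E0, theta, alpha=1.0, beta=0.8, gamma=0.4):
    """One update of the simulated network (assumed form, see text above)."""
    # Camera state: intensity of the diffractively coupled fields.
    x_cam = alpha * np.abs(W_doe @ E) ** 2
    # Feedback, input injection and phase offset form the cos argument.
    arg = beta * x_cam + gamma * W_inj * u_next + theta
    E_next = E0 * np.cos(arg)                      # field after the SLM
    return E_next, alpha * np.abs(E_next) ** 2     # field and node state x(n+1)

rng = np.random.default_rng(0)
side = 10                                  # 10x10 toy network
N = side * side
W_doe = local_coupling(side, rng)
W_inj = rng.uniform(0.0, 1.0, N)           # random injection weights
E0 = np.ones(N)                            # uniform illumination (assumption)
theta = np.full(N, 0.17 * np.pi)
E = E0 * np.cos(theta)
u = np.sin(0.3 * np.arange(50))            # placeholder input signal

states = []
for u_next in u:
    E, x = rnn_step(E, u_next, W_doe, W_inj, E0, theta)
    states.append(x)
states = np.array(states)                  # shape (time steps, nodes)
```

Each row of `states` plays the role of one recorded network state. Tracking the field `E` rather than the intensity keeps the coupling step a coherent sum, which is a modelling choice of this sketch.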

III. NETWORK READOUT WEIGHTS
Having created a recurrent photonic neural network driven by external data, the final step towards information processing is to adjust the system such that it performs the required computation. This is typically achieved by modifying connection weights according to some learning routine. Inspired by the RC concept, we constrain learning-induced weight adjustment to the readout layer. Our 900 RNN nodes are spatially distributed, and we can therefore use a simple lens (Thorlabs AC254-400-B) to image the SLM's state onto a commercial array of micro mirrors (DLi4120 XGA, pitch 13.68 µm). Micro mirrors can be flipped between ±12°, such that for −12° the optical signal is directed to a detector. The detector's photocurrent then corresponds to the RNN output. With $W^{DMD}$ as the readout weight vector, the RNN output becomes
$$y^{out}(n+1) = \delta\, W^{DMD} \left(1 - x(n+1)\right). \qquad (4)$$
Here, $\delta$ relates the power recorded by the power meter (Thorlabs S150C and PM100A) to the SLM state. The signal directed towards the DMD is orthogonally polarized compared to the one directed to the camera, resulting in $x^{DMD}(n) = 1 - x(n)$. In the experiment, weight vector $W^{DMD}$ corresponds to a square matrix, which can be seen in Fig. 1(b). The image labeled DMD shows a typical configuration of the DMD with $W^{DMD}$. As the contribution of each node is either on or off, $W^{DMD}$ consists of Boolean entries only. Weights are not temporal modulations as in delay system implementations of RC [13], and can therefore be implemented by passive attenuations in reflection or transmission. Such passive processes are energy efficient and typically do not result in a bandwidth limitation. In this specific implementation, once trained, mirrors can simply remain in their position and, if mechanically fixed, would not consume additional energy. Also, the readout of Eq. (4) is performed optically for all elements in parallel.
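As an illustration of the Boolean readout, the following minimal Python sketch evaluates $y^{out} = \delta\, W^{DMD} (1 - x)$ for a small made-up state vector. The function name `dmd_readout`, the default $\delta = 1$ and the example values are assumptions chosen for illustration only.

```python
import numpy as np

def dmd_readout(x, w_dmd, delta=1.0):
    """Sum (1 - x_i) over all nodes whose mirror is 'on' (Boolean weight True)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w_dmd, dtype=bool)
    # The DMD sees the orthogonal polarization, hence 1 - x.
    return delta * np.sum((1.0 - x)[..., w], axis=-1)

x = np.array([0.2, 0.9, 0.5, 0.0])         # made-up node states
w = np.array([True, False, True, True])    # made-up mirror configuration
y = dmd_readout(x, w)                      # 0.8 + 0.5 + 1.0
```

Because the weights are Boolean, the readout reduces to selecting which nodes contribute to the sum. The same function also accepts a matrix of states with one row per time step.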

IV. PHOTONIC LEARNING
It is now the task of a learning algorithm to tailor $W^{DMD}$ such that signal $y^{out}(n+1)$ approximates a target value as closely as possible. In our experiment, we employ a version of reinforcement learning. The learning input signal is injected after inverting the weight assigned to one node. The error $\varepsilon_k$ of signal $y^{out}_k(n+1)$ obtained for configuration $W^{DMD}_k$ is then compared to the error $\varepsilon_{k-1}$, where $k$ is the index of learning iterations. If the error is reduced, we keep DMD configuration $W^{DMD}_k$; if not, we revert to $W^{DMD}_{k-1}$ and invert a different weight. The weight to be updated is located at position $l_k$ of the largest entry $W^{select,max}_k$ of a selection vector computed from rand($N$) and $W^{bias}$. Here, rand($N$) creates a random vector with $N$ entries, and at the start $W^{DMD}_1$ and $W^{bias}$ are randomly initialized. $W^{bias}$ acts as a bias vector, whose values are increased by $1/N$ each learning iteration, while the bias belonging to the most recently updated weight is set to zero. This results in a randomized selection process with a bias away from inverting recently updated weights. In simulations we found that reinforcement learning including such a bias showed significantly faster learning convergence.
As a task to be performed by our system, we chose nonlinear time series prediction. The injected signal $u(n+1)$ is the chaotic Mackey-Glass (MG) sequence [3], and the RNN's learning target is $y^T(n+1) = u(n+2)$, the one-time-step prediction of the MG system. Parameters of the temporal MG sequence were identical to [15], using an integration step size of 0.1. For determining the error $\varepsilon_k$ we discarded the first 30 data points due to their transient nature. The RNN's remaining output sequence was then inverted, its offset subtracted and normalized by its standard deviation, creating signal $\tilde{y}^{out}_k$. The error was measured by $\varepsilon_k = \sigma(y^T - \tilde{y}^{out}_k)$, where $\sigma$ is the standard deviation, and $\varepsilon_k$ therefore corresponds to the normalized mean square error (NMSE).
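The learning rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the experimental implementation: the selection vector is assumed to be the element-wise product of rand($N$) and $W^{bias}$, the target is assumed to be normalized in the same way as the output, and the node states `X` and input `u` are synthetic placeholders rather than recorded photonic states or a Mackey-Glass sequence.

```python
import numpy as np

def normalised_error(y_out, y_target, n_discard=30):
    """Std of the difference between normalised target and inverted,
    offset-free, std-normalised output; first points dropped as transient."""
    y = -np.asarray(y_out)[n_discard:]           # inversion, as in the text
    s = y.std()
    if s == 0:
        return np.inf                            # constant output: reject
    y = (y - y.mean()) / s
    t = np.asarray(y_target)[n_discard:]
    t = (t - t.mean()) / t.std()                 # assumption: target normalised too
    return np.std(t - y)

def train_boolean_readout(X, y_target, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n_nodes = X.shape[1]
    w = rng.random(n_nodes) < 0.5                # random Boolean W_DMD
    w_bias = rng.random(n_nodes)                 # random initial W_bias

    def readout(w_):
        return (1.0 - X[:, w_]).sum(axis=1)

    err = normalised_error(readout(w), y_target)
    history = [err]
    for _ in range(n_iter):
        select = rng.random(n_nodes) * w_bias    # assumed element-wise product
        l = int(np.argmax(select))               # position of largest entry
        w[l] = not w[l]                          # invert one weight
        new_err = normalised_error(readout(w), y_target)
        if new_err < err:
            err = new_err                        # keep the new configuration
        else:
            w[l] = not w[l]                      # revert to previous one
        w_bias += 1.0 / n_nodes                  # bias grows for all weights
        w_bias[l] = 0.0                          # except the one just tested
        history.append(err)
    return w, np.array(history)

# Synthetic stand-in data (placeholders, not experimental states).
rng = np.random.default_rng(1)
T, N = 400, 100
n = np.arange(T + 1)
u = np.sin(0.2 * n) * np.cos(0.031 * n)
a, b = rng.uniform(0.5, 2.0, N), rng.uniform(0.5, 2.0, N)
theta = rng.choice([0.17 * np.pi, 0.43 * np.pi], size=N)
X = np.cos(a * u[1:T, None] + b * u[:T - 1, None] + theta) ** 2
target = u[2:T + 1]                              # one-step-ahead prediction

w, history = train_boolean_readout(X, target)
```

Since a flipped weight is only kept when the error decreases, `history` is non-increasing by construction. Resetting the bias of the most recently tested weight steers the random selection away from weights that were tried a moment ago.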
At this stage we would like to stress a significant difference between neural networks emulated on digital electronic computers and our photonic hardware implementation. In our system, all connection weights are positive, and $W^{DMD}$ is Boolean. This restricts the functional space available for approximating the targeted input-output transformation. As a result, first evaluations of the learning procedure and prediction of the MG series suffered from poor performance. However, we were able to mitigate this restriction by harnessing the non-monotonic slope of the $\cos^2$ nonlinearity. We randomly divided the offset phases $\theta_i$, $i = 1 \dots N$, into two groups, resulting in nodes with negative and positive slope of their response function. We chose $\Theta_0 = 42$ (corresponding to $0.17\pi$) and $\Theta_0 + \Delta\Theta = 106$ (corresponding to $0.43\pi$), respectively, with a probability of $1 - \mu$ for $\theta_i = \Theta_0$. As RNN states and $W^{DOE}$ entries are exclusively positive, the nonlinear transformation of nodes with $\theta_i = \Theta_0$ is predominantly along a positive slope, and for $\theta_i = \Theta_0 + \Delta\Theta$ along a negative slope. Results for different values of $\mu$ are shown in Fig. 2(a). They reveal a strong impact of this symmetry breaking. Optimum performance for each $\mu$ is shown in Fig. 2(b). Best performance is found for an RNN operating around almost equally distributed operating points at $\mu = 0.45$. This demonstrates that the absence of negative weights in $W^{DMD}$, $W^{DOE}$ and $x$ can be partially compensated for by incorporating nonlinear transformations with positive as well as negative slopes. This result is of high significance for optical neural networks which, e.g. motivated by robustness considerations, forgo using the optical phase to implement negative weights.
We further optimized our system's performance by scanning the remaining parameters $\beta$ and $\gamma$. In Fig. 3(a) we show the error convergence under optimized global conditions for a training sample size of 500 steps (blue stars). The error decreases efficiently and finally stabilizes at $\varepsilon \approx 0.013$. Considering that learning is limited to Boolean readout weights, this is an excellent result. After training, the prediction performance is evaluated further on a sequence of 4500 consecutive data points which were not part of the training dataset. As indicated by the red line in the same panel, the testing error matches the training error. We can therefore conclude that our photonic RNN successfully generalized the underlying target system's properties. The excellent prediction performance can be appreciated in Fig. 3(b). Data belonging to the left y-axis (blue line) shows the recorded output power, while on the right y-axis (red dots) we show the normalized prediction target signal. A difference between both is hardly visible, and the prediction error $\varepsilon$ (yellow dashed line) is small. Down-sampling the injected signals by a factor of 3 creates conditions identical to [15,16]. Under these conditions our error ($\varepsilon = 0.042$) is larger by a factor of 2.2 relative to a delay RC based on a semiconductor laser [15] and by a factor of 6.5 relative to a Mach-Zehnder modulator based setup [16]. These comparisons have to be evaluated in the light of the significantly increased level of hardware implementation in our current setup. In [15,16], readout weights were applied digitally in an off-line procedure using weights with double precision. In [16] a strong impact of digitization resolution on the computational performance was identified, suggesting that $\varepsilon$ can be significantly reduced by increasing the resolution of $W^{DMD}$.

V. CONCLUSION
We demonstrated a photonic RNN consisting of hundreds of photonic nonlinear nodes and the implementation of photonic reinforcement learning. Using a simple Boolean valued readout implemented with a DMD, we trained our system to predict the chaotic MG sequence. The resulting prediction error is very low despite the Boolean readout weights.
In our work we demonstrate how symmetry breaking inside the RNN can compensate for exclusively positive intensities in our analogue neural network systems. These results resolve a complication of general importance to neural networks implemented in analog hardware. Hardware-implemented networks and readout weights based on physical devices open the door to a new class of experiments, i.e. evaluating the robustness and efficiency of learning strategies in fully implemented analogue neural networks. The final step, a photonic realization of the input, should be straightforward, as it only requires a complex spatial distribution of the input information. Finally, our system is not limited to the reported slow opto-electronic system. Extremely fast all-optical systems can be realized employing the same concept, since we intentionally implemented a 4f architecture to allow for self-coupling [14].