Multi-channel free-space optical convolutions

Convolutional layers are a critical feature of modern neural networks and require significant computational resources. In response, optical accelerators have been developed as a low-energy, high-bandwidth approach for performing large-scale convolutions. We extend these methods to act on many input channels, each with its own set of convolutional kernels. We simulate this system with ray tracing and evaluate its performance.


Introduction
Convolutional neural networks are widely used for a number of machine learning applications, including image classification, object detection, and natural language processing [1]. At their core, convolutional layers use a moderate number of parameters to generate many feature representations of the inputs. These layers do so by convolving a number of input channels with sets of convolutional kernels and summing the resulting output (Figure 1).
In modern networks, both the number of channels and the size of the inputs are extremely large and require vast amounts of computational and energy resources, leading to efforts to develop alternative computing hardware. Researchers have realized that optical approaches are especially well suited to implementing convolutional layers [2][3][4], as the properties of light are well suited for maintaining and manipulating spatial information.
Free-space optical approaches are particularly suited for large-scale convolutions with large inputs, as the Fourier transforming property of lenses simplifies optical convolution [2] and even lens-free convolution is possible [4]. These approaches were developed, however, for a single input channel and multiple output channels, which is not suitable for deploying modern networks. While combining signals from different channels in silico is an option, high data transfer and memory requirements constrain overall performance.
In this work we develop a free-space optical convolution setup that uses multiple channels of inputs, each with its own set of convolutional kernels, and combines the outputs onto the same spatial locations. We simulate the system with a ray-tracing approach and evaluate its energy efficiency.

Multi-channel convolutions
Multi-channel convolutions (Figure 1) are used in convolutional neural networks to process data from a set of n input channels I to a set of m output channels O, described in equation form as

O_j = Σ_i I_i ∗ K_ij,

where K_ij is the corresponding kernel for a given input channel i and output channel j. To perform this operation optically, input channels are encoded as subarrays of light emitters of fixed size and spacing. Similarly, output channels are encoded onto sub-regions of a sensor array of fixed size and spacing. Prior work demonstrated how optical convolution may be performed by lens-free free-space propagation from a single light emitter array through kernels on an optical amplitude mask onto a sensor array [4]. We developed a multi-kernel optical convolution approach (Figure 2) by including microlens arrays (MLA) between the light emitter array and the convolutional kernels.
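As a reference for the operation above, the following is a minimal NumPy sketch of the multi-channel convolution O_j = Σ_i I_i ∗ K_ij, using the dimensions from our simulation (4 channels of 8 × 8 inputs, 4 kernels of 3 × 3 each, "full" 10 × 10 outputs); the sparsity level of the example inputs is an assumption for illustration:

```python
import numpy as np

def conv2d_full(x, k):
    """'Full' 2-D convolution (no cropping, stride 1): an HxW input and
    a kHxkW kernel give an (H+kH-1)x(W+kW-1) output."""
    H, W = x.shape
    kH, kW = k.shape
    out = np.zeros((H + kH - 1, W + kW - 1))
    for a in range(kH):
        for b in range(kW):
            out[a:a + H, b:b + W] += k[a, b] * x
    return out

def multichannel_conv(inputs, kernels):
    """O_j = sum_i I_i * K_ij for inputs (n, H, W), kernels (n, m, kH, kW)."""
    n, m = kernels.shape[:2]
    H, W = inputs.shape[1:]
    kH, kW = kernels.shape[2:]
    out = np.zeros((m, H + kH - 1, W + kW - 1))
    for j in range(m):
        for i in range(n):
            out[j] += conv2d_full(inputs[i], kernels[i, j])
    return out

# Dimensions matching the simulation: 4 channels of 8x8 inputs and
# 4 kernels of 3x3 give 4 output channels of 10x10.
rng = np.random.default_rng(0)
I = rng.exponential(size=(4, 8, 8)) * (rng.random((4, 8, 8)) < 0.3)
K = rng.random((4, 4, 3, 3))
O = multichannel_conv(I, K)
```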
Light from a single light emitter interacts with nearby microlenses, creating an array of light spots at exactly the positions of its convolutional kernels on the amplitude mask. All light emitters for a single subarray are spaced such that their corresponding light spots map exactly onto the same convolutional kernels. At the output plane, further propagation of light causes the light to separate, again forming the shifted positions of each output channel. Separate light subarrays are positioned so that different positions of the amplitude mask are used for each different set of convolutional kernels.
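The geometry above can be illustrated with a simple paraxial thin-lens sketch (the distances and the conjugate-plane imaging assumption are illustrative, not the actual system parameters): each nearby microlens images the emitter to one spot, and shifting the emitter shifts every spot by the same amount, so all emitters of a subarray address the same kernel grid on the mask.

```python
import numpy as np

def spot_position(x_emitter, x_lens, z_obj, z_img):
    """Paraxial thin-lens imaging: a point offset (x_e - x_l) from the axis
    of a microlens at x_l images to -M times that offset, M = z_img / z_obj.
    (Conjugate-plane geometry and distances are illustrative assumptions.)"""
    M = z_img / z_obj
    return x_lens - M * (x_emitter - x_lens)

# One emitter seen through three neighboring microlenses -> an array of spots.
lenses = np.array([-1.0, 0.0, 1.0])   # microlens centers (arbitrary units)
spots = spot_position(0.0, lenses, z_obj=1.0, z_img=2.0)

# Shifting the emitter shifts every spot uniformly (by -M * shift), so all
# emitters of one subarray map onto the same set of kernel positions.
shifted = spot_position(0.5, lenses, z_obj=1.0, z_img=2.0)
```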

Simulation of optical convolutions
Monte Carlo ray-tracing code was developed to simulate the free-space optical propagation of our setup. 4 input channels of size 8 × 8 were each convolved with 4 kernels of size 3 × 3. No cropping or stride was used in this simulation, giving an output channel size of 10 × 10. Light from each emitter was simulated as photons with the emission distribution and angular spread of an LED and propagated through the components highlighted in the schematic (Figure 2).
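A minimal sketch of this kind of Monte Carlo photon tracing follows; the Lambertian model for LED emission, straight-line propagation, and all geometry values (distances, mask pitch, sensor extent) are illustrative assumptions, not the actual simulation parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def lambertian_dirs(n):
    """Sample unit directions with a Lambertian (cosine-weighted) angular
    distribution about +z, a standard model for LED emission."""
    u, v = rng.random(n), rng.random(n)
    theta = np.arcsin(np.sqrt(u))   # cosine-weighted polar angle
    phi = 2 * np.pi * v
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=1)

def trace(n_photons, mask, mask_z, pitch, sensor_z, extent, n_pix):
    """Trace photons from a point emitter at the origin through an amplitude
    mask (a kernel with transmissions in [0, 1]) and bin survivors on a
    pixelated sensor; returns a 2-D array of photon counts."""
    d = lambertian_dirs(n_photons)
    # straight-line intersection with the mask plane at z = mask_z
    xy = d[:, :2] * (mask_z / d[:, 2:3])
    ix = np.floor(xy / pitch + mask.shape[0] / 2).astype(int)
    inside = (ix >= 0).all(axis=1) & (ix < mask.shape[0]).all(axis=1)
    # the amplitude mask acts as a per-photon transmission probability
    keep = np.zeros(n_photons, dtype=bool)
    sel = ix[inside]
    keep[inside] = rng.random(inside.sum()) < mask[sel[:, 0], sel[:, 1]]
    d = d[keep]
    # propagate survivors to the sensor plane and bin into pixels
    xy = d[:, :2] * (sensor_z / d[:, 2:3])
    counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=n_pix,
                                  range=[[-extent, extent], [-extent, extent]])
    return counts

counts = trace(100_000, mask=np.full((3, 3), 0.5), mask_z=1.0,
               pitch=0.2, sensor_z=2.0, extent=1.0, n_pix=10)
```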
Results from the simulation are depicted in Figure 3. Input channels contain sparse, exponentially distributed values convolved with a set of random kernels. Optical ray tracing produces a result very similar to the ideal convolution values, with an overall Pearson's correlation of 0.902. Overall, these results demonstrate the potential for optical systems to perform exactly the types of convolutional operations required by modern machine learning and advance the development of optical coprocessors and accelerators.
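The comparison metric can be sketched as follows; the example arrays are synthetic stand-ins, not the actual simulation data:

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between flattened arrays, the metric used to
    compare raytraced outputs with ideal digital convolution values."""
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Illustrative check: a noisy copy of an 'ideal' output correlates highly.
rng = np.random.default_rng(2)
ideal = rng.exponential(size=(4, 10, 10))   # stand-in for digital values
noisy = ideal + 0.05 * rng.standard_normal(ideal.shape)
r = pearson(noisy, ideal)
```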

Figure 1 .
Figure 1. Multi-channel convolutions: convolutions are performed on each of n input channels (colors), each with m convolutional kernels. The convolved values are summed into m output channels.

Figure 2 .
Figure 2. Schematic of free-space multi-kernel optical convolution. Light propagates from an emitter array through a microlens array (MLA) and an amplitude mask (kernels) before impinging on a sensor.

Figure 3 .
Figure 3. Simulated ray-tracing results of multi-channel optical convolutions. 4 input channels are each convolved with 4 convolutional kernels. Results are compared to ideal digital values.