Three dimensional waveguide-interconnects for scalable integration of photonic neural networks

Photonic waveguides are prime candidates for integrated and parallel photonic interconnects. Such interconnects correspond to large-scale vector-matrix products, which are at the heart of neural network computation. However, parallel interconnect circuits realized in two dimensions, for example by lithography, are strongly limited in size due to disadvantageous scaling. We use three-dimensional (3D) printed photonic waveguides to overcome this limitation. 3D optical couplers with a fractal topology efficiently connect large numbers of input and output channels, and we show that the substrate's footprint area scales linearly with the number of channels. Going beyond simple couplers, we introduce functional circuits for discrete spatial filters identical to those used in deep convolutional neural networks.


I. INTRODUCTION
The interconnection of numerous input and output channels (IO-channels) is the basic operation behind many applications. A parallel and energy-efficient interconnect has therefore been a desired technology for decades [1,2], finding use in fields as diverse as telecommunication, inter- and intra-chip data buses and potentially endoscopy [3]. Most timely, it is also highly desired for connecting the layers of deep neural networks, where it efficiently provides the typically large-scale vector-matrix products [4].
The integration of such an apparatus is challenging. To achieve parallelism, serial routing is naturally not an option, and a large number of direct physical links connecting the IO-channels is required. Such channel multiplexing can be realized in different dimensions, such as wavelength or space; here we address spatial multiplexing. If a direct-connection architecture is realized electronically, the strong capacitive interactions between long connection wires result in prohibitive energy dissipation and bandwidth limitations [5,6]. There are additional, more practical challenges. Lithographic fabrication typically integrates circuits in two dimensions (2D), and a 2D interconnect's footprint grows quadratically with the number of IO-channels. The cross-bar interconnect illustrates this fundamental relationship.
Optical routing removes the energy dissipation associated with charging the capacitance of electronic signaling wires [5], and free-space interconnects with many IO-channels have long been explored [1,2]. Integrated photonic interconnects, however, remain size-limited by the unfavourable scaling between area and the number of IO-channels in 2D [7][8][9][10]. Crucially, the same scaling is found for wavelength division multiplexing.
We demonstrate the integration of such photonic interconnects in 3D for the first time. Complex 3D-routed waveguides are created by two-photon polymerization [11,12]. We introduce a fractal architecture which efficiently connects many IO-channels, and we demonstrate an integrated photonic interconnect of previously unreported size, hosting 225 input and 529 output channels within a footprint area of only 0.46 × 0.46 mm². Crucially, this footprint area scales linearly with the number of IO-channels. Such a printed photonic circuit can fully and in parallel connect the layers of deep neural networks of commercially relevant size [4,13]. Going beyond, we demonstrate a 3D-waveguide architecture implementing 9 spatial filters with Haar convolution kernels [14] of stride and width 3. Such convolutional filters represent a fundamental operation of deep convolutional neural networks [4].
3D photonic circuits promise integrated, parallel, scalable and hence large interconnects with potentially low energy dissipation. Our concept is based on a mature fabrication technology which has also been exploited for photonic wire bonding between chips [15,16].

II. SCALING OF INTERCONNECTS
A strategy to overcome many of the bottlenecks currently experienced in neural network computation is to realize integrated circuits that adhere to a neural network's complex topology [8,[17][18][19][20][21]. As schematically illustrated in Fig. 1(a), a neural network is formed by linking large numbers of nonlinear neurons, which are often grouped in layers. It is particularly this inter-neuron interconnect which, despite recent progress [21], still eludes a fully parallel and scalable hardware integration. Most of today's integrated circuits are created via lithography and are hence restricted mostly to 2D. In cross-bar interconnects, see Fig. 1(b), routing occurs via point contacts between two layers hosting the input and output wires. The $N_I$ input and $N_O$ output ports are arranged along a column or row, and hence their numbers scale as $N_I, N_O \propto \sqrt{A}$ for an area $A$. This is the general behavior in 2D.
Three-dimensional additive manufacturing has significantly matured and allows complex structures with nanometric feature sizes [11,[22][23][24]. Crucially, the additional third dimension facilitates simple wiring topologies which are scalable, as schematically illustrated in Fig. 1(c). Input and output ports each occupy a dedicated plane (not rows or columns as in 2D), while the third dimension unlocks the circuit's volume for wiring: for each of the $N_I$ inputs, a dedicated plane hosts all connections to the $N_O$ outputs. Even in such a simple routing scenario the scaling becomes linear, with $N_I, N_O \propto A$ for the area and $H \propto N_I$ for the height. The strong impact of 2D versus 3D integration on the scalability of a parallel interconnect is schematically illustrated in Fig. 1(d). Interestingly, the 3D routing strategy has been confirmed by evolution: the human neocortex, in its most reduced topological description, leverages the same effect. Neurons are mostly located on its surface, while long-range connections mostly traverse the volume.
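The scaling argument can be made concrete with a toy count. The sketch below is our own illustration, not part of the original analysis. It assumes that ports in 2D and 3D sit on the same pitch (we borrow the waveguide spacing $D_0 = 20$ µm given in Sec. III), that a 2D crossbar places its ports along one edge of a square area, and that the 3D circuit tiles a whole plane with ports. The function names and the `plane_spacing_um` parameter are hypothetical.

```python
import math

def io_channels_2d(area_um2: float, pitch_um: float) -> int:
    """2D crossbar: ports line one edge of a square area, so N ~ sqrt(A) / pitch."""
    side_um = math.sqrt(area_um2)
    return int(side_um // pitch_um)

def io_channels_3d(area_um2: float, pitch_um: float) -> int:
    """3D routing: ports tile a full plane, so N ~ A / pitch**2."""
    side_um = math.sqrt(area_um2)
    return int(side_um // pitch_um) ** 2

def height_3d_um(n_inputs: int, plane_spacing_um: float) -> float:
    """One wiring plane per input gives H ~ N_I (plane spacing is hypothetical)."""
    return n_inputs * plane_spacing_um

# Compare both scalings for square footprints of increasing side length.
for side_um in (230, 460, 920):
    area = side_um ** 2
    print(side_um, io_channels_2d(area, 20.0), io_channels_3d(area, 20.0))
```

Doubling the side length doubles the 2D port count but quadruples the 3D one. At a 460 µm side, the 3D count of 529 coincides with the output-channel count quoted in the introduction.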
However, 3D routing in electronics is challenging. Lithographic fabrication requires of the order of $N_I$ signaling layers, which makes such fabrication prohibitive for the dimensionality demanded by neural networks. Heat generation and heat dissipation from such a volumetric circuit's centre have additionally been identified as problematic [25]. Disposing of this dissipated energy is a major bottleneck already for the mostly serial von Neumann processors [6], and parallel interconnects for NNs require significantly more such layers and connections. Photonics can overcome this challenge [5,26], which motivates the interest in photonic interconnects [1,2] and ultimately in photonic neural networks [8,[27][28][29][30][31][32][33].

III. 3D INTERCONNECTS OF PHOTONIC WAVEGUIDES
Low-loss 3D-printed photonic waveguides have been demonstrated at telecommunication wavelengths [15,16,34,35]. Our waveguides were fabricated using a commercial 3D direct-laser-writing system from Nanoscribe GmbH (Germany). A drop of the negative-tone photoresist IP-Dip on a glass substrate (25 × 25 × 0.7 mm³) was photopolymerized via two-photon absorption with a λ = 780 nm femtosecond pulsed laser, focused by a 63X (NA = 1.4) objective lens. After writing, the sample was immersed in PGMEA (1-methoxy-2-propanol acetate) for 20 minutes to remove the unexposed photoresist. Samples were written in the galvanometric-mirror scanning mode, and the scanning speed on the sample's surface was kept constant at 10 mm/s. The writing laser's power served as the optimization parameter. The individual waveguides have a diameter of $d \approx 1.2$ µm and are spaced by $D_0 = 20$ µm. Samples were structurally inspected using a scanning electron microscope (SEM, FEI 450W). For optical characterization, we focused a 635 nm laser onto an input waveguide's top surface using a 50X microscope objective with NA = 0.8. The mode-field diameter of the focused beam is 2 µm, hence larger than the input waveguide's diameter. The emission at the couplers' output ports was collected by a 10X, NA = 0.30 microscope objective and imaged onto a CMOS camera (iDS U3-3482LE, pixel size 2.2 µm) using a 100 mm achromatic lens, resulting in an optical magnification of 5.6.
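Two numbers follow directly from the imaging parameters above. The sketch below computes the camera pixel size projected onto the sample plane. As our own addition, not stated in the text, it also computes the Rayleigh resolution $0.61\lambda/\mathrm{NA}$ of the NA = 0.30 collection objective at 635 nm. Together they suggest that the ≈ 1.2 µm waveguide outputs are imaged close to the diffraction limit and span only about three camera pixels.

```python
def object_pixel_um(camera_pixel_um: float, magnification: float) -> float:
    """Camera pixel size projected back onto the sample plane."""
    return camera_pixel_um / magnification

def rayleigh_resolution_um(wavelength_um: float, na: float) -> float:
    """Rayleigh criterion 0.61 * lambda / NA of the collecting objective."""
    return 0.61 * wavelength_um / na

pixel = object_pixel_um(2.2, 5.6)                  # camera pixel, magnification 5.6
resolution = rayleigh_resolution_um(0.635, 0.30)   # 635 nm probe, 10X / NA 0.30
waveguide_px = 1.2 / pixel                         # waveguide diameter in pixels
print(f"{pixel:.2f} um/px, resolution {resolution:.2f} um, waveguide {waveguide_px:.1f} px")
```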

A. Fractal topology for fully connected layers
Fully or densely connected layers are a principal topology in NNs [4,13]. We adopt a routing strategy based on […] This translation invariance aids the development of strategies to avoid the intersection of waveguides before layer $l = L$, where they merge into their respective outputs. These details are illustrated for four neighbouring couplers with $b = 9$ and $L = 1$ in Fig. 2(b). We incorporated chirality into the fractal couplers: the $b$ connections from a point in layer $l$ to layer $l + 1$ have a negative curvature in the $(x, y)$-plane, which avoids intersections for vertical and horizontal connections. Avoiding intersections for diagonal links additionally requires curvatures in the $z$-direction. This chirality successfully avoids unintended intersections, as the SEM image in Fig. 3(a) confirms. In Fig. 3(b) we show fractal trees with two bifurcation layers resulting in 1 × 81 coupling, in a circuit of $N_I = 9$ inputs and $N_O = 121$ outputs. As for the single-bifurcation-layer 3D couplers, the two-bifurcation-layer couplers are mechanically sound, even though they feature waveguide sections with an aspect ratio exceeding 50. This excellent result motivated us to integrate a full-scale interconnect with over 200 inputs, each of which is connected to 81 outputs, see Fig. 3(c).
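The fractal layout can be sketched numerically. The code below is a hypothetical reconstruction, not the authors' design file. It assumes that $b$ is an odd perfect square ($b = 9$ gives 3 × 3 branches). It also assumes that the lateral branch step shrinks by $\sqrt{b}$ from one bifurcation layer to the next, consistent with the layer scaling stated for Fig. 2(a), and that the last layer lands on the $D_0 = 20$ µm output pitch. Finally, as our own reading of the reported port counts, it assumes that a square array of couplers shares one output grid, so neighbouring couplers' output patches overlap. Under these assumptions the count reproduces 81 → 121 ports for $L = 1$, 9 → 121 for $L = 2$, and 225 → 529 for $L = 2$, with 23 × 20 µm = 460 µm per side.

```python
from itertools import product

def fractal_outputs(b: int, L: int, pitch_um: float = 20.0) -> set:
    """Lateral (x, y) output positions of one 1 x b**L fractal coupler.

    Hypothetical model: b is an odd perfect square, and the branch step of
    layer l is pitch * sqrt(b)**(L - l), so the last layer hits the output pitch.
    """
    k = round(b ** 0.5)  # branches per axis, e.g. 3 for b = 9
    if k * k != b or k % 2 == 0:
        raise ValueError("b must be an odd perfect square")
    offsets = range(-(k // 2), k // 2 + 1)  # e.g. -1, 0, 1
    points = {(0.0, 0.0)}
    for layer in range(1, L + 1):
        step = pitch_um * k ** (L - layer)
        points = {
            (x + i * step, y + j * step)
            for (x, y) in points
            for i, j in product(offsets, offsets)
        }
    return points

def n_outputs(inputs_per_side: int, b: int, L: int) -> int:
    """Ports of a square coupler array whose output patches share one grid."""
    span = round(b ** 0.5) ** L  # outputs per axis of a single coupler
    return (inputs_per_side + span - 1) ** 2

print(len(fractal_outputs(9, 2)))  # distinct outputs of one 1 x 81 coupler
print(n_outputs(9, 9, 1), n_outputs(3, 9, 2), n_outputs(15, 9, 2))
```

Because the branch step shrinks by √b per layer, the 81 endpoints of a two-layer coupler tile a 9 × 9 grid without duplicates.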
Figure 4 depicts the optical transmission through a 1 × 9 fractal coupler. We used the camera images to characterize the optical losses and splitting ratios, taking the injection spot focused onto the glass substrate's top surface as the reference. The average optical losses are 5.5 dB for 1 × 9 couplers and rise to 10.6 dB for 1 × 81 couplers. Crucially, these values include the optical injection losses I, the propagation losses P and the losses C induced at the coupling or bifurcation points. The fractal design principle allows us to determine each of these contributions. As previously discussed, the angles of the different bifurcation layers remain constant due to the fractal design. This results in identical bifurcation points for the entire topology, and we hence assume uniform coupling losses C. According to Fig. 4(a), some of the output ports' optical modes include second-order Laguerre-Gauss contributions. As our polymer waveguides are freestanding in air, they have an exceptionally high refractive index contrast of $\Delta n \approx 0.5$. According to the commonly employed approximation $M = 0.5\,(\pi d \Delta n/\lambda)^2$ for the number of modes $M$ supported by a cylindrical waveguide, our waveguides support up to $M \approx 5$ optical modes. However, early-stage numerical simulations confirm that only the first and second optical modes are excited, which agrees with our experimental results. We would like to point out that the high refractive index contrast allows (i) single-mode waveguides with a diameter of only 0.3 µm and (ii) exceptionally narrow bending radii; the combination of (i) and (ii) facilitates compact integrated photonic circuits.
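The mode-count estimate and the loss figures can be checked with a few lines. The sketch evaluates the approximation $M = 0.5\,(\pi d \Delta n/\lambda)^2$ exactly as quoted, at λ = 635 nm, which gives M ≈ 4.4 for $d = 1.2$ µm and $\Delta n = 0.5$, and well below one mode for $d = 0.3$ µm. It then converts the reported average losses to transmitted power fractions. Finally, as a purely hypothetical illustration of how the fractal design can separate the contributions, it splits the two loss values under the crude assumption that every bifurcation layer adds the same loss (C plus its share of P) on top of a fixed injection loss I. The branch lengths differ between layers, so this split is only indicative and is not the decomposition reported by the authors.

```python
import math

def approx_mode_count(d_um: float, delta_n: float, wavelength_um: float) -> float:
    """Approximate mode number M = 0.5 * (pi * d * delta_n / lambda)**2, as quoted."""
    return 0.5 * (math.pi * d_um * delta_n / wavelength_um) ** 2

def db_to_fraction(loss_db: float) -> float:
    """Fraction of optical power transmitted for a loss given in dB."""
    return 10 ** (-loss_db / 10)

def split_losses(loss_1layer_db: float, loss_2layer_db: float) -> tuple:
    """Hypothetical linear model loss(L) = I + L * per_layer, fitted to two points."""
    per_layer = loss_2layer_db - loss_1layer_db
    injection = loss_1layer_db - per_layer
    return injection, per_layer

print(round(approx_mode_count(1.2, 0.5, 0.635), 1))  # printed waveguides, d = 1.2 um
print(round(approx_mode_count(0.3, 0.5, 0.635), 2))  # thinner waveguide, d = 0.3 um
print(round(db_to_fraction(5.5), 3), round(db_to_fraction(10.6), 3))
injection, per_layer = split_losses(5.5, 10.6)
print(f"I ~ {injection:.1f} dB, per layer ~ {per_layer:.1f} dB")
```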
We analysed three 1 × 9 and three 1 × 81 couplers with respect to the relative power distribution at their output ports; statistics are given in Fig. 4(b,c). For the 1 × 9 couplers, (42 ± 4)% of the total optical output power is provided by the central waveguide, with the remaining ∼58% quite evenly distributed among the off-center ports, see Fig. 4(b). For the 1 × 81 couplers, only (33 ± 6)% of the light is contained in the central waveguide, Fig. 4(c). Interestingly, this is clearly larger than the square of the 1 × 9's ratio (0.42² ≈ 18%), indicating that cascading our bifurcating waveguides cannot be approximated simply by multiplying the coupling ratios of the individual components. Higher-order modes therefore appear to have an impact upon the splitting ratios. Overall, the asymmetric splitting ratio is most likely caused by the geometry, in particular by the branching angles of our waveguide couplers.
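The naive expectation for cascaded couplers can be written as a Kronecker product of two 3 × 3 splitting patterns. For illustration only, the sketch assumes a 1 × 9 coupler with 42% of the power in the central port and the remaining 58% spread evenly over the eight off-center ports. It treats the first bifurcation layer's pattern as the outer factor, since that layer's branch step is √b = 3 times larger. The predicted central share of about 18% is well below the measured (33 ± 6)%.

```python
def coupler_1x9(center_fraction: float) -> list:
    """3 x 3 splitting pattern: centre gets center_fraction, the rest is shared evenly."""
    side = (1.0 - center_fraction) / 8
    return [[side, side, side], [side, center_fraction, side], [side, side, side]]

def cascade(first: list, second: list) -> list:
    """Naive cascade: each output of `first` feeds a copy of `second` (Kronecker product)."""
    n, m = len(first), len(second)
    out = [[0.0] * (n * m) for _ in range(n * m)]
    for i in range(n):
        for j in range(n):
            for k in range(m):
                for l in range(m):
                    # Coarse offset (i, j) from layer 1, fine offset (k, l) from layer 2.
                    out[i * m + k][j * m + l] = first[i][j] * second[k][l]
    return out

p81 = cascade(coupler_1x9(0.42), coupler_1x9(0.42))
print(round(p81[4][4], 3))  # naive central share of the 1 x 81 coupler
```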

B. Haar filters
The previously discussed, highly connected couplers are typically required close to the output layer of deep neural networks. Their first layers, however, often highlight structural aspects of the input information through tailored, local connection topologies. Examples are the convolutional neural networks commonly employed in object recognition [4]. Prominent convolution kernels are the so-called Haar filters. These feature 2D Boolean entries, and this simplification creates a sparse representation of the information contained in images, which is crucial for neural networks to generalize to unseen test data [14]. We schematically illustrate the input and output properties of nine exemplary Haar filters (F1-F9) in Fig. 5. There, each filter kernel's 3 × 3 Boolean weights (0: dark, 1: light) are illustrated as the input, while each filter's dedicated output port is indicated as the output.
We developed a 3D routing topology, schematically illustrated on the right in Fig. 5, to realize the 9 Haar filters. Even in 3D this is challenging, as can be appreciated from the intricate network of connections. Furthermore, the number of configurations scales factorially with the number of filters, and for the required 37 connections of the 9 filters there exist 9! = 362880 possibilities. This already large number still ignores all geometrical aspects, such as waveguide curvatures along the different dimensions. To better illustrate the operating principle, we have highlighted the connection topology of filter F2 in orange. For each filter, the input ports weighted by 1 are directly wired to the filter's output. For incoherent injection into the 3 × 3 input waveguides, the intensity at a filter's output should therefore be proportional to the overlap between its Boolean weights and the input.
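The overlap rule can be sketched as follows, assuming incoherent light and lossless, equal-weight connections. The measured filters show distinct coupling losses C, which this sketch ignores. The kernel below is a hypothetical placeholder, since the actual F1-F9 patterns appear only in Fig. 5. The last line checks that 9! equals the quoted number of possibilities.

```python
import math

# Hypothetical 3 x 3 Boolean kernel (1 = input port wired to the filter output).
# The real Haar patterns F1-F9 are only shown in Fig. 5 and are not reproduced here.
EXAMPLE_KERNEL = [[1, 1, 1],
                  [0, 0, 0],
                  [0, 0, 0]]

def filter_output(kernel, patch):
    """Incoherent, lossless model: output intensity = sum of intensities on wired ports."""
    return sum(w * x for krow, prow in zip(kernel, patch) for w, x in zip(krow, prow))

patch = [[1.0, 0.5, 0.0],
         [0.2, 0.2, 0.2],
         [0.0, 0.0, 1.0]]  # toy input intensities on the 3 x 3 input waveguides
print(filter_output(EXAMPLE_KERNEL, patch))
print(math.factorial(9))  # number of possible filter-to-output assignments
```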
In Fig. 6(a) we show the SEM image of the 3D-printed spatial-filtering interconnect realizing 9 Haar filters. The waveguides feature smooth surfaces and the overall structure is stable. However, output waveguides with few connections show a tendency to lean outwards. Figure 6(b) shows a densely multiplexed array of Haar filter units. Such an interconnect would implement the convolution of a 21 × 21-pixel input image with filters F1-F9 simultaneously and fully in parallel. As the individual filter units do not overlap in space, the implemented convolution has a stride of 3.
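A 21 × 21 input tiled by 3 × 3 units with stride 3 yields 7 × 7 = 49 non-overlapping filter units, each feeding nine outputs, one per filter. The function below is a plain-Python model of that strided filter bank, again assuming incoherent, lossless summation. The one-hot kernels are placeholders, not the Haar patterns F1-F9.

```python
def strided_filter_bank(image, kernels, k=3, stride=3):
    """Apply each Boolean k x k kernel to every stride-spaced k x k patch."""
    rows, cols = len(image), len(image[0])
    feature_maps = []
    for kern in kernels:
        fmap = []
        for r in range(0, rows - k + 1, stride):
            fmap.append([
                sum(kern[i][j] * image[r + i][c + j] for i in range(k) for j in range(k))
                for c in range(0, cols - k + 1, stride)
            ])
        feature_maps.append(fmap)
    return feature_maps

# Placeholder kernels: kernel n selects the n-th pixel of each 3 x 3 patch.
placeholder_kernels = [
    [[1 if 3 * i + j == n else 0 for j in range(3)] for i in range(3)]
    for n in range(9)
]
image = [[21 * r + c for c in range(21)] for r in range(21)]  # toy 21 x 21 input
maps = strided_filter_bank(image, placeholder_kernels)
print(len(maps), len(maps[0]), len(maps[0][0]))  # filters, rows, columns
```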
Figure 6(c) depicts the optical characterization of the filters' connectivity, using the same procedure as for the fractal optical couplers. Each sub-panel corresponds to the transmission through a different filter (F1 to F9) when injecting light into its output port; the optical characterization was therefore carried out in the backward direction. We opted for this procedure because only in the backward direction do the intensities of an individual filter directly correspond to the filter's kernel. In the forward direction one would have to inject into the individual input ports one after another and then sum the output intensities of the different injections, which is possible in principle yet less systematic. Generally, we find excellent agreement between the designed filter kernels and the intensities in the reverse propagation direction. The different loss mechanisms obtained for the fractal couplers are consistently reproduced for the Haar filters, with the peculiarity that each filter exhibits distinct coupling losses C. This, however, is to be expected, as different filters rely on specific connection degrees and topologies as well as different branching angles.
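Why backward injection reads out a kernel in one shot can be sketched with a transmission matrix. The sketch assumes linear, incoherent propagation and reciprocity, so that the backward intensity transmission is the transpose of the forward one. Under these assumptions, a single injection into one output port returns that filter's full row of weights across the nine input ports, whereas the forward direction needs one injection per input port. The matrix `T` below is a hypothetical two-filter example with a uniform 0.1 transmission per connection.

```python
def propagate_forward(T, x):
    """Output intensities for input intensities x: y = T x."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def propagate_backward(T, y):
    """Input-plane intensities for output injection y: x = T^T y (reciprocity assumed)."""
    n_in = len(T[0])
    return [sum(T[k][j] * y[k] for k in range(len(T))) for j in range(n_in)]

def kernel_forward(T, out_port):
    """One injection per input port, reading the chosen output each time."""
    n_in = len(T[0])
    return [
        propagate_forward(T, [1.0 if i == j else 0.0 for i in range(n_in)])[out_port]
        for j in range(n_in)
    ]

def kernel_backward(T, out_port):
    """A single injection into the output port images the whole kernel."""
    y = [1.0 if k == out_port else 0.0 for k in range(len(T))]
    return propagate_backward(T, y)

# Hypothetical: 2 filters x 9 input ports, 0.1 transmission per wired connection.
T = [
    [0.1, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0],
]
print(kernel_backward(T, 1))
```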
There is some cross-talk from the optically injected port onto the image of the output plane. One cause might be the smaller height of the overall 3D circuitry: light not collected and guided by the injected waveguide illuminates a smaller area of the circuit's output plane, which in turn results in a higher intensity when imaged onto the camera. The outwards-leaning input connections, see Fig. 6(a), might additionally contribute. The resulting non-orthogonal illumination of the waveguide's tip will most likely reduce the injection efficiency and therefore increase the cross-talk of uncollected light to the output plane. In a fully integrated system this cross-talk will potentially be reduced significantly. Inputs will in most cases be provided by optical fibers or waveguides arriving from an earlier stage of the optical system, for example a fiber bundle collecting an input image. The same holds for the filter's output, which will be connected to a fiber or waveguide for further processing downstream.

IV. DISCUSSION AND CONCLUSION
We successfully demonstrated complex and large-scale 3D photonic interconnects. Waveguides with a diameter of ≈ 1.2 µm were created by direct laser writing based on two-photon polymerization. Using this novel integration strategy, we demonstrated intricate 3D routing topologies for large-scale, highly connected as well as convolutional optical interconnects. These example architectures were mostly oriented towards applications in neural networks, where such interconnects can realize large-scale vector-matrix products fully in parallel, with picosecond latency and potentially low energetic cost [10]. It is the first time that such complex and large-scale integrated optical interconnects have been created in 3D.
As our concept scales linearly in size, it allows for novel routing topologies, which in turn will create new opportunities for integrated special-purpose neural network chips. Here, both complete implementations of neural networks and the use of the photonic interconnects purely as neural network accelerators are possible [10]. However, there is a wider relevance for computing. The end of Moore's law, and in particular of Dennard scaling, is arguably induced by the energy penalties of a processor's electronic signaling wires. Photonic routing could prolong the scaling of classical electronic (or now: opto-electronic) von Neumann processors, and these ideas can be extended to intra- or inter-chip connections. Finally, non-computing applications such as miniature remote sensing are further possibilities. Ultimately, we have demonstrated the first large-scale 3D-printed photonic circuit board.
The findings reported here are based on the first demonstrations of several complex 3D photonic circuits, and both performance and topologies offer significant potential for further improvement. Beyond losses, it is in particular the asymmetric splitting ratios which deserve further attention, even though such imbalance can, to a certain degree, be compensated for by using phase-tunable topologies [7]. Couplers with an even splitting ratio (such as 1 × 4) promise potentially better homogeneity.
Most importantly, we have addressed the non-scalability of parallel and integrated interconnects for the first time. In order to fully benefit from this new substrate, its functionalization is essential. External control over a waveguide section's phase delay would enable unitary optical transformations on a scalable substrate [7]. An extension by active or nonlinear photonic elements will establish a new type of photonic device. Crucially, small-scale, low-bandwidth 3D-printed polymer circuits are actively considered in electronics, for example for wearables [36].

[…]tion programme under the Marie Sklodowska-Curie grant agreement No. 713694 (MULTIPLY).

FIG. 1. (a) Topology of a deep neural network. Links between layers of neurons correspond to large-scale interconnects. (b) Crossbar arrays link input and output channels (IO-channels, black dots) in parallel in 2D; IO-channels are arranged along a line. (c) In three dimensions, IO-channels can be arranged in an array, while connections are implemented in the third dimension. (d) The number of IO-channels of a parallel interconnect scales linearly with size in 3D, whereas in 2D the scalability is significantly worse.

FIG. 2. (a) Design principle of an optical coupler with a fractal geometry. Numerous layers of branching connections can be cascaded, and the distances from one layer to the next scale with √b, where b is the branching ratio. (b) 3D illustration of a small network hosting simple couplers. Chirality of the connections avoids the intersection of individual waveguides between the input and output ports.

Figure 3(a) shows an SEM image of a 3D fractal coupler array hosting $N_I = 81$ inputs and $N_O = 121$ outputs, each coupler with $L = 1$ and $b = 9$.

FIG. 4. (a) Optical transmission through a single-bifurcation-layer 1 × 9 coupler, with intensity color-coded on a logarithmic scale. Histograms of the relative output intensity distribution for the 1 × 9 (b) and 1 × 81 (c) couplers. Statistics were obtained from three couplers each.

FIG. 5. Schematic illustration of the input-output mapping of 9 Haar filters (F1-F9) with kernel width and stride 3. A 3D-printed waveguide architecture realizing all 9 filters in parallel is shown on the left, with filter F2 highlighted in orange. The highlighted sub-structure implements

FIG. 6. (a) SEM micrograph (10 kV, 40°) of a single Haar filter. (b) Full micrograph (5 kV, 0°) of a large array hosting spatial filtering for connecting layers of a convolutional neural network. (c) Optical characterization of the filter's connection topology, injecting at the output port and recording the input ports' emission.