8x8 Reconfigurable Quantum Photonic Processor based on Silicon Nitride Waveguides

We demonstrate a reconfigurable 8x8 integrated linear optical network based on silicon nitride waveguides for quantum information processing. Our processor implements a novel optical architecture enabling any arbitrary linear transformation and constitutes the largest programmable circuit reported so far on this platform. We validate a variety of photonic quantum information processing primitives, in the form of Hong-Ou-Mandel interference, bosonic coalescence/anti-coalescence and high-dimensional single-photon quantum gates exploiting the entire functional area of the processor. We achieve fidelities that clearly demonstrate the promising future for large-scale photonic quantum information processing using low-loss silicon nitride.

In fact, doped silica and silicon-on-insulator technologies host, so far, the majority of on-chip universal linear optical networks for QIP 33 . While the doped silica platform offers low propagation loss only with a larger footprint, i.e., low component density, silicon-on-insulator allows for dense optical circuits at the expense of higher loss. As large scale photonic circuits for QIP require both low propagation losses and high component density, a platform that combines these properties is very beneficial for the development of QIP. This combination is what silicon nitride offers.
Here, we demonstrate, for the first time, an 8×8-mode linear optical network implemented in a photonic processor based on silicon nitride waveguides. The photonic processor is fully reconfigurable over its entire functional area and contains the highest density of components per targeted loss to date 1,10,34 (see Supplement). The 8×8-mode photonic processor, the largest realized so far in silicon nitride, includes 128 reconfigurable elements: 64 tunable beam splitters, constructed as Mach-Zehnder interferometers with internal thermo-optic phase shifters, and 64 additional phase shifters, arranged in a novel linear optical architecture. The processor contains the optical realization of a Blass matrix 35,36 , a well-known architecture for beamforming networks in microwave engineering, where it is used for directional transmission of radio frequency signals to and from antenna arrays. Translated from microwave engineering to optical QIP, the Blass matrix supports the realization of any arbitrary linear transformation, both unitary 37,38 and non-unitary 39-41 . We show that our processor preserves the coherence of quantum states by programming the processor to implement quantum interference. As a proof-of-principle demonstration of the architecture's capability to implement non-unitary transformations, we show anti-coalescence of bosons 42-44 on a 2×2 Blass matrix. Finally, we realize high-dimensional single-photon quantum gates exploiting the whole spatial mode structure of the processor.

Figure 1 Artist's impression of a silicon nitride waveguide programmable linear optical network. The high index contrast enables a dense waveguide arrangement, uniquely combined with ultra-low straight-propagation loss. The wide spectral transparency range makes silicon nitride suitable for quantum light sources from the visible to the mid-infrared.

Figure 2 shows a schematic of the experimental setup. The photonic processor (Fig. 2(a)) consists of 64 unit cells, each composed of a phase shifter (in red) and a tunable beam splitter (in blue), in an arrangement that enables any arbitrary 8×8 transformation. Each tunable beam splitter is constructed as a Mach-Zehnder interferometer, based on two directional couplers and an internal thermo-optic phase shifter. The 128 thermally tunable elements are remotely controllable and are designed to each allow a phase shift. The tuning range can be extended with longer tunable elements or stronger current supplies. The photonic processor is based on stoichiometric silicon nitride waveguides, grown with low-pressure chemical vapor deposition, with a double-stripe cross-section 25 . The waveguides exhibit a propagation loss of 0.2 dB/cm, giving a total on-chip transmission of at least 60%, a value that corresponds to the longest optical path. The coupling losses to a single-mode optical fiber are about 2.9 dB/facet, which can be greatly reduced (to 0.5 dB/facet 25 ) by waveguide tapering.

Experimental setup
Single photons for the experiments are provided by two parametric down-conversion sources (Fig. 2(b)). Frequency-doubled light from a mode-locked fiber laser, with a center wavelength of 775 nm and a spectral width of 2 nm, is divided into two paths, one containing an adjustable delay line, and focused into two 10-mm-long periodically-poled KTiOPO₄ (PPKTP) waveguides 45 . Each PPKTP waveguide generates, via type-II down-conversion, orthogonally-polarized spectrally-separable photon pairs at telecom wavelengths (signal and idler at 1547 nm and 1553 nm, respectively). After removal of the pump wavelength (F, filter for pump rejection), the signal and idler photons are separated using a polarizing beam splitter (PBS) and collected by single-mode fibers. The signal photons are used to herald the idler photons with a heralding efficiency of 30%. The two idler photons are coupled into the photonic processor using polarization-maintaining fibers. For photodetection, we use a set of fiber-coupled superconducting single-photon detectors (efficiency 85% 46 ).

Figure 2 (a) The photonic processor is composed of 64 unit cells, each comprising a phase shifter (red vertical line) and a tunable beam splitter (blue horizontal line) implemented as a Mach-Zehnder interferometer. The rhomboidal shape of the processor's schematic has been chosen for a better overview. In the real processor the elements are arranged on a square mesh. (b) Photon pairs are generated via type-II parametric down-conversion in PPKTP waveguides pumped with a mode-locked laser at 775 nm and injected into the photonic processor.

On-chip quantum interference
To demonstrate the suitability of the photonic processor for QIP, we first observe Hong-Ou-Mandel (HOM) interference 47 between two photons (bullets of the same color in Fig. 3(a)) at various positions (beam splitters) within the processor (colored disks in Fig. 3(a)). The processor is configured to route the two incident photons across the chip to a targeted beam splitter, which is programmed to a reflectivity of 50%, after which the photons are directed to two outputs. The coincidence count rates at the outputs are recorded versus the relative delay of the two single photons. In Fig. 3(b) we show, as an example, the HOM interference at one of the targeted tunable beam splitters at the center of the processor (red curve), in comparison with a reference measurement, i.e., an off-chip HOM experiment using a fiber beam splitter (blue curve). The two measurements are in good agreement. At a mean photon number of 0.01 we measure a reference HOM dip visibility of 81% between the idler photons of the two sources, with the visibility defined as $V = (C_d - C_i)/C_d$, where $C_d$ and $C_i$ are respectively the coincidence counts for temporally distinguishable and indistinguishable photons. We repeat the experiment at various positions, i.e., beam splitters, within the photonic processor and obtain similar results, with an average visibility of 76%. The consistency of the measured on-chip HOM dips, over the whole processor depth, with the reference shows that our photonic processor preserves the spectro-temporal similarity of the photons and confirms the suitability of the photonic processor for quantum information processing.
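As a brief illustration of how a dip visibility follows from coincidence data, the sketch below assumes the common definition V = (C_d − C_i)/C_d, with C_d and C_i the coincidence counts for distinguishable and indistinguishable photons. The function name and the count values are hypothetical and are not measured data from this work.

```python
def hom_visibility(c_dist: float, c_indist: float) -> float:
    """Dip visibility V = (C_d - C_i) / C_d (assumed definition).

    c_dist:   coincidences far from zero delay (distinguishable photons)
    c_indist: coincidences at zero delay (indistinguishable photons)
    """
    if c_dist <= 0:
        raise ValueError("c_dist must be positive")
    return (c_dist - c_indist) / c_dist


# Hypothetical counts, chosen only for illustration.
example_visibility = hom_visibility(c_dist=1000, c_indist=190)
```

With this definition a perfect dip (no coincidences at zero delay) gives V = 1, while a coincidence peak gives a negative value.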

Arbitrary linear transformations
Due to its architecture, the photonic processor can be configured to perform arbitrary linear transformations on its 8 modes, both unitary and non-unitary 39-41 , the latter implemented via ancillary modes. In QIP, non-unitary, lossy, transformations are typically considered detrimental. However, the additional freedom obtained by removing the restriction of unitarity allows for new transformations that exhibit exciting behavior such as a tunable quantum interference and an apparent nonlinear absorption 43,48,49 . Already the simple case of a balanced symmetric lossy beam splitter contains free parameters determining the relative phase of the transmission coefficients that enable the tuning of the well-known HOM-like dip, the signature of bosonic coalescence, into a HOM-peak for bosonic anti-coalescence.
To illustrate how the Blass matrix architecture allows the implementation of non-unitary transformations, we realize a balanced symmetric lossy beam splitter, involving four beam splitters and two phase shifters (see Fig. 3(c)), described by a 2×2 matrix T in which the relative phase of the transmission coefficients is a free parameter. The behavior observed in a quantum interference experiment between two single photons oscillates between coalescence and anti-coalescence of the photons depending on this phase 49 , with the well-known HOM-like coalescence at one particular phase setting. The photonic processor is programmed to perform such a non-unitary 2×2 transformation on a 2×2 Blass matrix. Fig. 3(d) shows the quantum interference between two single photons for two different non-unitary 2×2 transformations implemented on the chip, resulting in bosonic coalescence for one phase setting (red) and anti-coalescence for a phase of 0.52 rad (blue). The visibilities of the HOM-like dip and peak are 81% and 70%, respectively, as expected for these specific transformations.

Figure 3 (a) Two-photon interference at various locations of the processor (colored circles), also indicating the pairs of input waveguides used (bullets of the same color). (b) Coincidence probability versus delay. The two-photon interference measured at the tunable beam splitters on the processor (red data points) is in good agreement with the off-chip reference measurement (blue data points). All the investigated beam splitters show a similar visibility. The solid curves indicate Gaussian fits to the data. The error bars are given by the square root of the number of coincidences. (c) Implementation of a lossy beam splitter on a 2×2 Blass matrix. The black elements are set according to the desired T. (d) Coincidence versus delay for two different lossy transformations T. Measured two-photon bosonic coalescence (red curve) and anti-coalescence (blue curve) with a visibility of 81% and 70%, respectively. (e) Realization of an 8-dimensional X-gate and (f) its measured truth table. (g) Truth tables of integer powers of the X-gate reported above. The average fidelity is 94.6%. (h) Evolution of a coherent superposition input state (a single photon in an equal-weight superposition of two modes) through a 6-dimensional X-gate, giving a fidelity of 91.9%.
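The dependence of two-photon interference on the phase of a lossy beam splitter can be sketched numerically. The example below is a minimal illustration under stated assumptions: the matrix form T = t·[[1, e^{iφ}], [e^{iφ}, 1]] with t = 0.5 and the function names are hypothetical choices, and the phase convention may differ from the one used for the transformations in the text. The coincidence probability for indistinguishable photons is taken from the permanent of T, and for distinguishable photons the two paths are added incoherently.

```python
import cmath
import math


def coincidence_probabilities(T):
    """Return (P_indist, P_dist) for one photon in each input of a 2x2 matrix T."""
    (t11, t12), (t21, t22) = T
    # Indistinguishable photons: amplitudes of the two paths add (matrix permanent).
    p_indist = abs(t11 * t22 + t12 * t21) ** 2
    # Distinguishable photons: probabilities of the two paths add.
    p_dist = abs(t11 * t22) ** 2 + abs(t12 * t21) ** 2
    return p_indist, p_dist


def lossy_beam_splitter(phi, amplitude=0.5):
    """Assumed parametrization: amplitude * [[1, e^{i phi}], [e^{i phi}, 1]]."""
    e = cmath.exp(1j * phi)
    return [[amplitude, amplitude * e], [amplitude * e, amplitude]]


def coincidence_ratio(phi):
    """Ratio P_indist / P_dist: below 1 is a dip, above 1 is a peak."""
    p_indist, p_dist = coincidence_probabilities(lossy_beam_splitter(phi))
    return p_indist / p_dist
```

In this assumed convention the ratio equals 2·cos²(φ), so it vanishes at φ = π/2 (coalescence) and reaches 2 at φ = 0 (anti-coalescence).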

High-dimensional quantum logic gates
High-dimensional quantum states, i.e., qudits, are of importance for large-alphabet quantum communication protocols 50 and cryptography 51 . In optics, qudits can be implemented using a modal degree of freedom, spatial or temporal, of the single photon to encode information. When encoding in the spatial degrees of freedom, large unitary linear optical networks can be exploited to implement high-dimensional quantum logic gates for the control and manipulation of such qudits 52,53 . As shown in 52 , providing all the integer powers of a d-dimensional X-gate, i.e., $X, X^2, \ldots, X^{d-1}$, and of a d-dimensional Z-gate, enables any unitary operation in a d-dimensional state space, with d=8 in our case, where the action of a d-dimensional X-gate can be described as $X|j\rangle = |(j+1) \bmod d\rangle$. Here we demonstrate the realization of an 8-dimensional X-gate (see Fig. 3(e)) and all its integer powers, i.e., $X, X^2, \ldots, X^7$, in an 8-dimensional-rail encoding, thus exploiting the whole mode structure of the processor. Figure 3(f) shows the measured truth table for the X-gate, obtained by injecting single photons into each of the 8 inputs. The results for the integer powers of the X-gate are summarized in Fig. 3(g). The average fidelity of these gates is about $F$ = 94.6%, where the fidelity of each gate is calculated as the average state fidelity $F = \frac{1}{d}\sum_i \mathbf{p}_i^{\mathrm{th}} \cdot \mathbf{p}_i^{\mathrm{exp}}$, with $\mathbf{p}_i^{\mathrm{th}}$ and $\mathbf{p}_i^{\mathrm{exp}}$ being the theoretical and experimental probabilities for each computational input $i$, respectively. Finally, we measure the transformation of a single photon in a coherent superposition state of two modes through a 6-dimensional X-gate, with a measured gate fidelity of $F$ = 96.2%. Figure 3(h) shows the action of the 6-dimensional X-gate on the coherent superposition input state, showing that the gate preserves the relative phase of the state.
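The action of the d-dimensional X-gate and a truth-table fidelity can be sketched as follows. This is an illustration only: it assumes the cyclic-shift convention X|j⟩ = |(j+1) mod d⟩ and a fidelity computed as the average overlap between ideal and measured output distributions, and all function names are hypothetical.

```python
def x_gate(d, power=1):
    """d x d permutation matrix of X**power, with X|j> = |(j + 1) mod d>.

    Columns index the input mode, rows index the output mode.
    """
    return [[1 if row == (col + power) % d else 0 for col in range(d)]
            for row in range(d)]


def apply(matrix, state):
    """Multiply a matrix with a state vector given as a list of amplitudes."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


def truth_table_fidelity(ideal, measured):
    """Average over inputs of the overlap of ideal and measured output probabilities."""
    d = len(ideal)
    total = 0.0
    for col in range(d):  # each column corresponds to one computational input
        total += sum(ideal[row][col] * measured[row][col] for row in range(d))
    return total / d
```

For example, `apply(x_gate(8), state)` moves a photon from mode 7 back to mode 0, and `x_gate(8, 8)` is the identity, which is why only the powers 1 to 7 are distinct non-trivial gates for d = 8.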

Conclusions
We report the realization of a fully programmable and remotely controllable 8×8-mode photonic processor, which is the largest universal linear optical network realized on Si₃N₄. We have demonstrated a variety of QIP primitives such as on-chip HOM interference, bosonic anti-coalescence on a 2×2 Blass matrix and high-dimensional single-photon quantum gates. The obtained results show that our processor retains the indistinguishability of the photons, limited only by the off-chip single-photon source, and enables any arbitrary linear transformation. Our findings demonstrate the promising future of the Si₃N₄ platform for the development of large reconfigurable universal linear optical quantum circuits.

S1. Functional complexity
In the variety of on-chip universal linear optical networks presented so far, two conflicting developments can be recognized. To achieve higher degrees of functionality, i.e., higher functional complexity, an increasing number of functional elements, e.g., tunable or switchable, is required. Since the size of integrated optical chips is constrained to a chip or wafer by fabrication technology, the density of components on photonic chips can ultimately only be increased by reducing the size of components and making sharper bends in the waveguides. On the other hand, it remains a central requirement to maintain the lowest propagation loss also with a growing number of components and therefore growing optical path lengths, particularly for quantum processing schemes. Since tunability, component size and optical loss are intrinsically coupled properties of the optical materials and fabrication technology used, the future of photonic processors depends critically on which material platform will enable the greatest functional complexity for a given level of acceptable loss. An impressive variety of quantum photonic processors has been demonstrated with different material platforms. A most prominent representative for semiconductor materials is silicon-on-insulator (SOI) as employed for demonstration of, e.g., bosonic transport simulation 1 . The advantage of the SOI platform is that it supports extremely dense photonic circuits, thanks to its high index contrast between the waveguide core and cladding (∆n~40% 2 ), because this allows for small feature size via tight bending radii without much radiation loss. Also, due to the relatively small bandgap, highly responsive tuning elements, with typically π phase shift within less than 100 µm propagation length, can be realized using carrier injection 3 .
On the other hand, surface scattering in combination with crystallinity and the high index contrast enhances the optical propagation loss in SOI to levels of several dB/cm, both in straight and curved waveguide sections. Additional loss occurs as the drawback of high responsivity in tuning, due to free-carrier absorption 4 .
In contrast to semiconductors, amorphous dielectric materials with large bandgap are known for the lowest propagation loss, the most well-known representative being doped silica. This platform has been used, for instance, for demonstrating universal linear optical circuits 5 . The typical propagation loss is at least an order of magnitude lower than in SOI, at the level of 0.1 dB/cm 2 , and thermal tuning can be applied without inducing noticeable additional loss. However, the disadvantage is the low index contrast which is inherent to waveguide cores based on doping (∆n~0.5% 2 ). The low contrast leads to weak guiding, and severe radiation loss occurs at smaller radii of curvature, which makes dense and thus complex waveguide circuits unfeasible.
The work we present here makes use of an advanced dielectric waveguide platform involving waveguide cores made from stoichiometric silicon nitride, embedded in a cladding made from stoichiometric silicon oxide. Based on slow deposition at high temperatures (low-pressure chemical vapor deposition), these waveguides offer a unique combination of high index contrast (∆n~18% 2 ) and ultralow propagation loss (> 0.0004 dB/cm 6 ). The platform thus offers ideal preconditions for the realization of dense and low-loss photonic circuits, where tuning-induced loss can be neglected.
In the following, we inspect the different degrees of complexity achievable with the named photonic platforms. For a quantitative comparison, we estimate the maximum achievable complexity with SOI, doped silica and Si₃N₄ in terms of a new figure of merit, $C$. We define this figure as the maximum number of functional unit cells, $n^2$, that can be arranged in a 2D square mesh before the intensity of the light has dropped to a fraction $f$. For convenience and definiteness, we proceed with a specific value for this fraction, $f = e^{-1}$; however, any other value can be selected as desired.
Next, for comparing universal photonic processors, we define a prototype waveguide circuit as unit cell, with many unit cells forming the processor. As has been shown, all unitary transformations can be implemented using a network of unit cells that can perform two essential functions, tunable beam splitting followed by tunable phase shifting 7 .
The most basic functional design of such a unit cell in the form of a 2D waveguide circuit is displayed in Fig. S1. We consider the beam splitter realized with two directional couplers in the form of a Mach-Zehnder interferometer (MZI) where the splitting ratio can be tuned via a phase shifter in one of the interferometer arms. An additional phase shifter in one of the output waveguides allows tuning of the relative phase between the two outputs of the MZI. The optical loss caused by a single unit cell depends on some basic geometrical parameters, specifically, the propagation length through straight waveguide sections where phase tuning is provided, $L_t$, and the propagation length through sections that are bent with a certain radius R, $L_b = 4\pi R$. The optical loss associated with the various waveguide sections depends on the chosen waveguide platform as described above and on the chosen core cross section of the waveguides. The latter dependence means that the intrinsically lowest loss of a specified circuit fabricated with a given material platform can only be approached by varying the core cross section along the propagation direction. Specifically, for each waveguide section, depending on its curvature, a different cross section has to be chosen that minimizes the propagation loss. In section S5 we describe how we have obtained for each of the three platforms an empirical relation between the waveguide bending radius and the minimum achievable power loss constant, $\alpha(R)$, specified in units of dB/m. Having defined a waveguide circuit for the unit cell of a photonic processor in terms of waveguide lengths and curvatures then allows us to estimate the number of unit cells to be passed before the specified loss fraction, f, is reached.
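The transfer matrix of such a unit cell can be sketched numerically. The example below is a minimal illustration, assuming ideal lossless 50:50 directional couplers, the internal phase θ on the upper interferometer arm and the external phase φ on the upper output; these placement choices, the symbols θ and φ, and the function names are assumptions, not taken from the chip design.

```python
import cmath
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def unit_cell(theta, phi):
    """MZI with internal phase theta, followed by an external phase phi."""
    s = 1 / math.sqrt(2)
    coupler = [[s, 1j * s], [1j * s, s]]              # ideal 50:50 directional coupler
    internal = [[cmath.exp(1j * theta), 0], [0, 1]]   # phase shifter in one MZI arm
    external = [[cmath.exp(1j * phi), 0], [0, 1]]     # phase shifter on one output
    mzi = matmul2(coupler, matmul2(internal, coupler))
    return matmul2(external, mzi)


def bar_transmission(theta):
    """Power fraction staying in the same waveguide (input 0 to output 0)."""
    return abs(unit_cell(theta, 0.0)[0][0]) ** 2
```

In this convention the bar transmission equals sin²(θ/2), so θ = 0 gives the cross state and θ = π gives the bar state; the external phase φ does not change the splitting ratio.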
When analyzing the generic unit cell shown in Fig. S1, its path length, over which light propagates, is $L = L_b + 2L_t$, i.e., the length of the bent sections plus twice the length of a tunable section. Here we have assumed for simplicity that the circuit makes use of 90º-bends and that the length of the directional couplers can be neglected compared to the length of tunable and other straight sections.
The number of unit cells, n, that can be coupled in series until the power transmission is reduced to $e^{-1}$ is found by solving the following equation for n: $T = \exp(-n \cdot \alpha \cdot L \cdot \ln 10 / 10) = e^{-1}$. In this expression, $\alpha$ stands for the power loss coefficients of the sections with straight propagation, with phase tuning and with bent waveguides, $\alpha_s$, $\alpha_t$ and $\alpha_b$, respectively, and $L_s$, $L_t$ and $L_b$ are the corresponding lengths of the sections. The total propagation losses are thus given by $\alpha \cdot L = \alpha_s \cdot L_s + \alpha_b \cdot L_b + \alpha_t \cdot 2L_t$. The number of unit cells that can be arranged in a 2D square mesh obeying the same loss condition, i.e., the figure of merit for maximum functional complexity, becomes $C = n^2 = \left(10/(\alpha \cdot L \cdot \ln 10)\right)^2$.

Figure S1 Schematic of a universal linear optical network. The unit cell (dashed frame) comprises a tunable beam splitter in the form of a tunable Mach-Zehnder interferometer with two directional couplers and a phase shifter in one arm of length $L_t$ (red), followed by an external phase shifter ($L_t$, red). The path length through the unit cell is determined by the length of the straight-waveguide tunable elements and the length of curved waveguides (radius of curvature R).
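The figure of merit can be evaluated with a few lines of code. The sketch below assumes the loss fraction f = 1/e and a single path-averaged loss coefficient per unit cell; the function names and the example cell length of 5 mm are hypothetical and serve only to show the arithmetic.

```python
import math


def max_cells_in_series(alpha_db_per_m, cell_length_m):
    """n solving exp(-n * alpha * L * ln(10) / 10) = 1/e, with alpha in dB/m."""
    return 10.0 / (alpha_db_per_m * cell_length_m * math.log(10))


def functional_complexity(alpha_db_per_m, cell_length_m):
    """C = n**2, the number of unit cells in a 2D square mesh."""
    return max_cells_in_series(alpha_db_per_m, cell_length_m) ** 2


def transmission(n_cells, alpha_db_per_m, cell_length_m):
    """Power transmission after n_cells unit cells in series."""
    return 10 ** (-n_cells * alpha_db_per_m * cell_length_m / 10)


# Hypothetical example: 20 dB/m (0.2 dB/cm) averaged over a 5 mm unit cell.
n_example = max_cells_in_series(20.0, 0.005)
c_example = functional_complexity(20.0, 0.005)
```

Because C scales with the inverse square of the loss per unit cell, halving either the loss coefficient or the cell length increases the achievable complexity by a factor of four.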
In Fig. S2 we plot the functional complexity for a 2D mesh versus the bending radius, calculated for different material platforms (different colors). The values used as loss coefficients, $\alpha_s$, $\alpha_t$ and $\alpha_b$, and the length of the tunable element are summarized in Table S1 (the coefficients summarize previous experimental data as described in section S5).
It can be seen that silicon nitride provides a functional complexity that is almost four orders of magnitude higher than that of SOI and 1.5 to 3 orders higher than that of silica, depending on the bending radii considered. The much higher propagation loss of SOI is mainly introduced by the tuning via carrier injection 3 , even if $L_t$ can be held much shorter than in Si₃N₄. With regard to doped silica, the higher functional complexity of silicon nitride is due to its substantially lower bending loss. In order to identify and quantify possible room for improvements we compare three recently published realizations, i.e., 1,5,12 and this work, all of them describing on-chip linear optical networks (see data points in Fig. S2). The highest complexity of current SOI processors is close to the maximum possible and that of silica can be improved by up to two orders of magnitude with smaller radii of curvature. The largest improvement is expected for Si₃N₄. Although the complexity of Si₃N₄ is leading already by one order of magnitude, more than three orders of magnitude seem possible.

Fig. S2 Functional complexity calculated for three different platforms: silicon nitride (red curve), SOI (blue) and doped silica (green). The highest functional complexity achieved in previous work is indicated as data points, i.e., [1][12] realized in SOI, [5] in doped silica and this work realized with silicon nitride.

Table S1 Parameters used for the functional complexity calculation.
*The dependence of bending loss vs. waveguide curvature radius is obtained as described in section S5.

S2. Photonic processor
To ensure propagation losses of 0.2 dB/cm, a small footprint and single-mode propagation at telecom C-band wavelengths (around 1550 nm), a double-stripe waveguide cross section is chosen 13 . The total on-chip propagation losses are less than 40%, a value that corresponds to the longest-possible on-chip geometrical path length (about 10 cm, along 15 tunable beam splitters). An array of polarization-maintaining fibers is bonded to the chip to give optical access to the waveguides. The processor is temperature-stabilized by a Peltier element and independent thermal tuning of the 128 phase shifters is accomplished via USB-controlled drivers. The entire assembly, comprising the chip, fiber arrays and electronics, is packaged into a single portable case with a USB connection and power socket at its back, and with 16 FC/PC fiber connectors at the front panel. After transportation of the box from Twente to Oxford, there was no need for recalibrating the tuning elements on chip, showing that the assembly is robust against vibrations and insensitive to fluctuations in the environment temperature. After plugging in the single-photon source, the experiments were carried out straightforwardly via computer control of the USB input.
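The quoted loss figures can be cross-checked with a short dB-to-transmission conversion. The sketch below uses the stated 0.2 dB/cm over a path of about 10 cm and the fiber coupling loss of 2.9 dB/facet from the main text; the helper name is hypothetical.

```python
def db_to_transmission(loss_db):
    """Convert a loss in dB to a power transmission factor."""
    return 10 ** (-loss_db / 10)


# Longest on-chip path: 0.2 dB/cm over about 10 cm gives about 2 dB.
propagation_loss_db = 0.2 * 10
on_chip_transmission = db_to_transmission(propagation_loss_db)

# Fiber-to-chip coupling at 2.9 dB per facet (value from the main text).
facet_transmission = db_to_transmission(2.9)
```

A 2 dB propagation loss corresponds to a transmission of roughly 63%, i.e., a loss of roughly 37%, which is consistent with the stated on-chip loss of less than 40%.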

S3. Characterization of tunable elements
The calibration of the tunable elements requires measuring the phase shift induced by the applied heating voltage, U. This voltage dependence follows a square law, $\phi(U) = c + d \cdot U^2$, which was measured for each phase shifter by reading interference fringes in the bar mode photon count value of the respective MZI (see inset of Fig. S3 for the bar and cross definition). The theoretically expected response at the bar mode is a sinusoidal function, $f = a + b \cdot \cos\phi(U)$. The calibration parameters a, b, c and d are real numbers and are determined by a least-squares fit. Each tunable element is calibrated independently, as described in the Supplement of 5 , starting, in our case, from the tunable beam splitter at the bottom corner of the processor. Figure S3 shows, as an example, the transmission at the bar mode of the first accessible beam splitter (bottom corner in Fig. 2) versus the square of the applied heating voltage, as compared to the theoretically expected response f, which is fit to the data. The small residual deviations of the data from the sinusoidal curve indicate that the beam splitter can be tuned very precisely to any desired phase value within its tuning range. As can be seen, a full period of the sinusoidal curve is not achievable. This limitation can be removed both by making longer heaters and by allowing for higher currents.

Figure S3 Transmission of the first accessible tunable beam splitter. The diamonds are the experimental data and the blue dashed line is the sinusoidal fit. The visibility of the sinusoid gives the range of splitting ratio achievable and the period of the sinusoid is related to the phase shift induced by the tunable element.
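Once the calibration parameters are known, the square-law model can be evaluated and inverted to find the voltage for a target phase. The sketch below assumes the model φ(U) = c + d·U² with d > 0 and the sinusoidal bar response a + b·cos φ(U); the function names and all parameter values are hypothetical, and the least-squares fit itself is not shown.

```python
import math


def phase_from_voltage(u, c, d):
    """Square-law phase response phi(U) = c + d * U**2."""
    return c + d * u ** 2


def bar_response(u, a, b, c, d):
    """Expected bar-mode signal f = a + b * cos(phi(U))."""
    return a + b * math.cos(phase_from_voltage(u, c, d))


def voltage_for_phase(phi, c, d):
    """Invert phi(U) = c + d * U**2 for U >= 0 (assumes d > 0)."""
    x = (phi - c) / d
    if x < 0:
        raise ValueError("target phase lies below the zero-voltage offset c")
    return math.sqrt(x)


# Hypothetical calibration values, for illustration only.
u_target = voltage_for_phase(1.0, c=0.2, d=0.05)
```

Phases outside the reachable tuning range would have to be mapped into it modulo 2π before the inversion, which this sketch does not attempt.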
All of the 128 tunable elements show similar values for the splitting ratio and phase shift to the one reported above in the text.

S4. High-dimensional single-photon quantum gates
Figure S4 shows the schematic of how to implement any integer power of a d-dimensional Pauli X-gate with a d-dimensional linear optical network in Reck's scheme 7 using d-dimensional rail encoding. In the schematic of Fig. S4, each element of the linear optical network denotes the tunable beam splitter at a given row and column. A given power of a d-dimensional X-gate can be found by setting the reflectivity of all the beam splitters to 1 (bar mode) and then changing to 0 (cross mode) the reflectivity of the elements up to a column and a row that depend on the desired power.

S5. Derivation of loss coefficients
We recall that radiation loss occurs at all waveguide curvatures (bending loss). When keeping the waveguide cross section constant along the propagation coordinate, the bending loss coefficient, $\alpha(R)$, increases exponentially with the inverse radius of curvature, R 14 . Using short bending radii reduces the propagation length through bent waveguides, and thereby the length over which bending loss accumulates, and also reduces the area occupied by a circuit. For bent waveguides with increasing radius of curvature, the loss levels off to the minimum value, $\alpha_s$, given by the straight-propagation loss, which is usually given by material-intrinsic absorption and Rayleigh scattering. Both the bending loss and straight-propagation loss are strongly dependent on the index contrast between core and cladding as defined by the selected material platform, and on the chosen shape and size of the waveguide core. They also depend largely on the chosen fabrication process. Generally, tightly confining the light to the core with high-contrast waveguides and large, wavelength-sized cross sections reduces the bending loss but simultaneously increases the straight-propagation loss. Weak guiding on the other hand, as achieved with low index contrast or a small (sub-wavelength) core size in high-contrast materials, can yield very low straight-propagation loss; however, the bending loss becomes significant. In conclusion, the overall loss in a circuit is minimized if the waveguide cross section is continuously adjusted to the local curvature radius. This approach can be seen, e.g., in recent work with silicon waveguides 15,16 . In order to analyze which waveguide platform offers the highest component density at a targeted loss, we derive an empirical expression for the minimum value of $\alpha(R)$ vs the bending radius. Figure S5 displays experimentally determined loss constants as reported for SOI, Si₃N₄ and doped SiO₂ (silica) waveguides vs the fabricated bending radius.
Data published until 2014 are taken from 17 and more recent data from 15,16,18 and 1 . Although the loss values in the named references show a significant variation, selecting only data points with the lowest loss for each radius and platform yields clear trends. Specifically, the minimum reported loss data vs waveguide bending radius show that the lowest-loss values follow approximately an inverse power law. We note that this dependence was also found approximately in numerical calculations when adjusting the width and height of rectangular waveguides from silicon, Si₃N₄, and GeO₂-doped silica 17 . Fitting inverse power laws to the lowest-loss experimental data is thus consistent with the assumption that each lowest-loss experimental observation was based on choosing an optimum waveguide cross section for the corresponding bending radius. The coefficients extracted for each platform from the power-law fits are listed in Table S1.
As reported above, with increasing radius of curvature, the value of $\alpha(R)$ levels off to a constant offset value, which is the minimum loss for straight waveguides, $\alpha_s$. To indicate these levels, we have drawn in Fig. A1 horizontal dashed and solid lines that pass through the lowest reported experimental loss values (data points in the dashed frame taken from 6 for Si₃N₄, from 8 for SOI and from 10 for doped silica). Adding the corresponding loss constants, $\alpha_s$, to the inverse power law functions then yields a closed expression for the expected minimum loss vs bending radius (solid curves in Fig. A1).
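The procedure described above, an inverse power law fitted to the lowest-loss data plus a constant straight-propagation offset, can be sketched as follows. The model form loss(R) = α_s + A·R^(−B), the log-log least-squares fit, and all names and numbers are assumptions for illustration; they are not the coefficients of Table S1.

```python
import math


def fit_inverse_power_law(radii, losses):
    """Least-squares fit of loss = A * R**(-B) in log-log space; returns (A, B)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -B for an inverse power law
    intercept = mean_y - slope * mean_x    # equals ln(A)
    return math.exp(intercept), -slope


def minimum_loss(radius, a, b, alpha_straight):
    """Closed expression: straight-propagation offset plus inverse power law."""
    return alpha_straight + a * radius ** (-b)
```

On a log-log plot the power law appears as a straight line with slope −B, and the added offset makes the curve level off at the straight-propagation loss for large radii.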
Finally, for a complete description of losses in tunable and programmable photonic processors, the loss in the phase-tunable sections of a waveguide circuit, $\alpha_t$ (which we term tunability-induced loss), also has to be quantified. For thermo-optic tuning using large-bandgap dielectric materials, here doped silica and Si₃N₄, no additional losses are expected or have been reported. To implement highly effective tuning in semiconductors within short propagation lengths, carrier injection can be applied; however, this increases the propagation loss through free-carrier absorption 3 . For the waveguide cross sections to be used we assume standard values, i.e., 220 nm thickness for SOI waveguides 8 , and weakly guiding high-aspect-ratio waveguides for Si₃N₄ 6 and doped silica 19 .

Figure A1 Overview of planar waveguide propagation loss versus bending radius as in [17] with more recent works [1,15,16,18]. The loss value for large radius tends to $\alpha_s$.