Hardware error correction for programmable photonics

Programmable photonic circuits of reconfigurable interferometers can be used to implement arbitrary operations on optical modes, facilitating a flexible platform for accelerating tasks in quantum simulation, signal processing, and artificial intelligence. A major obstacle to scaling up these systems is static fabrication error, where small component errors within each device accrue to produce significant errors within the circuit computation. Mitigating this error usually requires numerical optimization dependent on real-time feedback from the circuit, which can greatly limit the scalability of the hardware. Here we present a deterministic approach to correcting circuit errors by locally correcting hardware errors within individual optical gates. We apply our approach to simulations of large scale optical neural networks and infinite impulse response filters implemented in programmable photonics, finding that they remain resilient to component error well beyond modern day process tolerances. Our results highlight a new avenue for scaling up programmable photonics to hundreds of modes within current day fabrication processes.


INTRODUCTION
Integrated photonics is a key technology for optical communications and is advancing rapidly for applications in sensing, metrology, signal processing, and computation. Programmable photonic circuits of optical interferometers, which can implement arbitrary filters and passively compute matrix operations on optical modes, are the optical analogue to field programmable gate arrays (FPGAs) and enable photonic circuits to be flexibly reconfigured post-fabrication by software [1,2]. Experimental demonstrations of these circuits have already shown working systems operating on up to tens of optical modes, which have been used to accelerate tasks in quantum simulation [3][4][5][6][7], mode unscrambling [8][9][10][11][12], signal processing [13,14], combinatorial optimization [15], and artificial intelligence [16].
While scaling up these systems to hundreds or thousands of modes would be immensely beneficial, doing so will require precise fabrication of tens of thousands of optical interferometers. Unfortunately, static component errors induced by process variation introduce errors that rapidly accrue for larger systems, limiting their usefulness for many applications. This is because the decomposition [17,18] and optimization techniques used to program these circuits assume that all of the components are ideal; thus, any component errors result in a programming of the wrong operation. Component imprecision therefore has serious implications for the future of these systems; for example, beamsplitter variation as small as 2%, which is a typical wafer-level variance [19], has been shown to degrade accuracy by nearly 50% for feedforward circuits used to implement classifiers for the MNIST image recognition task [20]. Alternative programmable architectures, such as recirculating waveguide meshes consisting of triangular or hexagonal MZI lattices [13,21,22], are similarly susceptible to component-induced error; device variation within these circuits introduces errors that will alter the response of phase-sensitive filters [23]. These systems' degree of sensitivity to component variation makes their control challenging when scaling up to large numbers of modes.
Hardware errors are usually compensated for with numerical optimization. A number of global optimization approaches have been proposed in the past, including nonlinear optimization [24][25][26][27][28], gradient descent [29], and in-situ backpropagation and training for neural networks [30]. These strategies, however, are time-consuming and can scale poorly with circuit size. Moreover, it is often inefficient to retrain hardware settings for each individual chip. For many tasks, such as machine learning, model training is energy intensive; if the same model parameters are broadcast to thousands of chips within a data center, retraining the model for each chip with a unique set of component imprecisions will be very costly. One can instead employ progressive algorithms making use of local feedback [31,32]; however, these algorithms, which iteratively optimize the settings of one device at a time, require O(N 2 ) tap photodiodes to monitor the optical power within each individual interferometer. This requirement greatly increases the number of electrical lines and overall power consumption of the system. This focus on in-situ approaches reveals a critical roadblock for programmable photonics compared to electronic FPGAs. An FPGA does not optimize hardware settings in real time off readings taken directly from the chip; rather, control software takes it for granted that the logic gates are ideal and maps the requested function into a netlist that can be placed and routed within the chip. A similar capability for programmable photonics would  on the silicon-on-insulator platform is composed of two 50-50 splitters, implemented with directional couplers, and an external phase shifter φ and internal phase shifter θ implemented with thermo-optic phase shifters. These devices act as electrically-controlled 2 × 2 optical gates in programmable photonics. b) Arbitrary higherdimensional matrix operations can be implemented by connecting N(N − 1)/2 MZIs in a rectangular (top) or triangular (bottom) configuration. The Reck (triangular) and Clements (rectangular) decompositions [17,18] describe a procedure for computing the phase settings for each MZI, but they assume the components are ideal. c) A realistic MZI implemented on a photonics platform will have splitting errors α, β for the two directional couplers within the interferometer. The effect of these hardware errors is to leftand right-multiply each programmable 2 × 2 unitary T ij (θ, φ) implemented by an MZI by error matrices β ij , α ij (φ). Applying the standard decomposition for ideal components to these imperfect optical gates will not produce the correct gate operation.
greatly improve the scalability of these systems; if this were the case, a desired optical function could be trained once on an idealized software model and ported over to many chips. The challenge for programmable photonics is that unlike FPGAs, photonic circuits are analog systems that are far more sensitive to errors within individual components. Enabling this level of scalability will therefore require the ability to deterministically correct hardware errors in photonic chips.
If a unitary operation is realizable by an imperfect photonic circuit, it should not require optimization to deduce the required settings; rather, a small perturbation in the device behavior due to component deviation should translate directly to a small perturbation in the interferometer's phase settings to recover the original unitary. This insight has led us to consider a local approach that corrects hardware errors one at a time within each optical gate composing the circuit. In this article, we present an approach to directly correct hardware errors for a programmable photonic circuit. Our algorithm outperforms previous approaches in several key respects: 1) it is flexible, requiring only a one time device calibration to directly compute the hardware settings for any given unitary; 2) for sufficiently low hardware errors the computed settings yield the exact unitary desired; and 3) our approach requires minimal overhead and does not make use of additional interferometers or internal detectors within every device. Our analysis is focused on feedforward programmable circuits that implement arbitrary unitary matrices, as these systems have the most demanding requirements for fabrication precision. However, our approach is a local error correction strategy that individually corrects each 2 × 2 optical gate within the circuit. It therefore does not assume any particular structure to the circuit and can be generalized to any programmable architecture making use of interferometers, including feedforward circuits with redundant devices and recirculating waveguide meshes.

HARDWARE ERROR CORRECTION
Local error correction requires characterization of each phase shifter and passive splitter in the photonic circuit. The calibration is performed once with the results stored in a lookup table; any arbitrary function can then be programmed by computing the settings for an ideal set of MZIs and converting them, one by one, to the corresponding settings for an imperfect device. In the Supplementary Information (Sec. I.), we describe how to calibrate these parameters using detectors only at the circuit outputs; assuming these errors are known, we can proceed with error correction as follows.
The fundamental optical gate of a programmable photonic circuit is a 2 × 2 Mach-Zehnder interferometer (MZI) composed of an external phase shifter on one input, two 50-50 beamsplitters, and an internal phase shifter on one of the modes between the splitters (Fig. 1a). This device is an electrically programmable beamsplitter capable of performing a 2 × 2 unitary operation T ij (θ, φ) on optical modes i, j parameterized by the external phase shift φ and the internal phase shift θ.
Higher dimensional matrix operations can be implemented with this unit cell by applying the Clements [18] and Reck [17] decompositions (Fig. 1b). These algorithms decompose an arbitrary N-dimensional unitary U into a product of N(N − 1)/2 two-dimensional unitaries computed by interference between nearest-neighbor optical modes, followed by phase shifts on the output modes corresponding to a diagonal matrix D: We now analyze the impact of fabrication error. If the MZI has imperfect splitters with errors α, β, the operation of the MZI must now be parameterized with four variables T ij (θ, φ, α, β) ( Fig. 1c): In the limit α, β → 0, the second term of each entry in the matrix T ij (θ, φ, α, β) drops out and we recover the expected transformation for an ideal device. Naturally, implementing the usual decomposition on these imperfect devices will not yield the desired unitary: To program into an imperfect circuit a desired unitary U = ∏ T ij (θ, φ), we apply local corrections θ → θ , φ → φ to each device such that T ij (θ , φ , α, β) = T ij (θ, φ).
Figure 2a illustrates our approach. We begin by finding θ such that the magnitudes of the entries of T ij (θ , φ , α, β) equal those of T ij (θ, φ). This condition produces the following expression for θ (Supplementary Info., Sec. III.A.): Component errors restrict the range over which θ is physically realizable. The above expression has a solution only if sin 2 (θ/2) > sin 2 (α + β) and if sin 2 (θ/2) < cos 2 (α − β). This restricts θ to the range: If the matrix decomposition requires θ outside this range, the error is minimized by setting Assuming we can physically implement the required value of θ , the magnitudes of the elements of T ij (θ , φ , α, β) and T ij (θ, φ) are now the same, but each element of T ij will have an undesired extraneous phase ξ a , ξ b , ξ c , ξ d relative to the corresponding term in T ij that must be corrected. We can therefore rewrite T ij (θ , φ , α, β) as where the simplification in the second line originates from unitarity requiring that ξ a + ξ d = ξ b + ξ c . We correct the phase errors in T ij by setting φ = φ + ξ b − ξ a and by applying additional phases ψ 1 = −ξ b + (θ − θ )/2, ψ 2 = −ξ d + (θ − θ )/2 to the top and bottom output modes, respectively. Applying these corrections will set T ij (θ , φ , α, β) exactly equal to T ij (θ, φ). Expressions for the phase errors ξ a , ξ b , ξ d can be constructed by setting the complex arguments of the elements of T ij equal to those of T ij (θ , φ , α, β). From this, we find that: The errors θ − θ , φ − φ, ψ 1 , ψ 2 as a function of θ for an example MZI with two 52-48 (α = β = 0.02) splitters are shown in Figure 2b. While the corrections to θ and ψ 1 are small (∼ 0.1 rad), the errors for φ and ψ 2 are quite substantial. In particular, for low device reflectivities (θ ≈ 0), the phase corrections required can exceed 1 rad.
Generally, we cannot apply the auxiliary phases ψ 1 , ψ 2 locally to the device being corrected, since the output modes do not have phase shifters. In most cases, one of the two can be Fig. 2. a) Fabrication-induced errors within each MZI can be corrected by applying local corrections θ → θ , φ → φ to the device. We first correct θ to set the magnitudes of the elements of T ij equal to T ij . Once the amplitude terms are set correctly, we apply phase corrections to the input and outputs of the device to correct phase errors between T ij and T ij . b) The corrections φ − φ, θ − θ , ψ 1 , ψ 2 applied to an MZI with two 52-48 beamsplitters (α = β = 0.02). The arrows on the plot indicate which vertical axis each curve corresponds to. c) The procedure for programming a unitary with hardware errors on a 4 × 4 rectangular unitary circuit. We first program each MZI to the (θ, φ) setting obtained with the standard decomposition in [18]. Each MZI is then converted T ij → T ij to the settings for an imperfect device one column at a time. At each step we propagate the output phase shifts ψ 1 , ψ 2 forward in the circuit until the entire network is corrected. incorporated into the external phase shifter setting of an MZI in the subsequent column. The other phase can be applied by observing that: Using this fact, we can propagate the auxiliary phases forward, through all of the columns of the network, out to the phase shifters D located on the output modes of the circuit. This procedure, illustrated in Figure 2c, produces a modified output phase screen D such that: Depending on the component imperfections and the required value of θ, we may also be able to program θ such that |T ij (θ , φ , α, β)| = |T ij (θ, φ)| if the condition in equation (9) is satisfied. If every MZI in the circuit satisfies the condition in equation (9), we can recover the exact unitary desired. However, if some MZIs in the circuit cannot realize the required splitting, that exact unitary is not physically realizable by the device. In this case, correcting the phases φ , ψ 1 , ψ 2 and setting θ as close to the required value as possible minimizes the gate error T ij − T ij .
We can summarize the algorithm for programming of a matrix U as follows: 1. Calculate the required values for θ, φ assuming ideal components, using the procedure described by Reck [17] or Clements [18].
We have illustrated this procedure for the example of feedforward unitary circuits, but the same principles apply for other architectures. Each optical gate within any programmable circuit can be corrected to the required 2 × 2 unitary operation T ij with the aforementioned procedure. The expressions provided assume a specific form for the MZI (Fig. 1), but they can be easily modified to apply to other designs, such as the dual-drive tunable basic unit (TBU) used in recirculating architectures [33].

A. Hardware Performance
We analyzed the performance of error correction through numerical simulations of programmable photonic circuits with fabrication imperfections. Results were obtained with a custom simulation package written using NumPy [34]. Further details are included in the Supplementary Information. Figure 3a shows the matrix error (relative error per entry) = (∑ ij |U hardware,ij − U ij | 2 /N) 1/2 for 100 Haar random unitaries implemented on 100 randomly generated N = 32mode unitary circuits with mean beamsplitter transmission η = (50 ± σ BS )%. The beamsplitter errors are independently sampled from a Gaussian distribution; for large N, the distribution shape will not greatly affect the results. We find that error correction reduces significantly, sometimes by more than an order of magnitude. This improvement is larger for circuits with small splitting errors, as they are more likely to satisfy equation (9) and program the required θ for all devices within the circuit. However, even for circuits with large σ BS , where many MZIs may not be programmable to the required θ, the improvement in is substantial as all errors in φ, ψ 1 , ψ 2 can always be corrected.

Fig. 4. a)
We simulated the effect of component error on twolayer optical neural networks for the MNIST task. Matrixvector products are calculated optically in the photonic circuit, and modReLU-like activation functions are implemented electro-optically [37,38]. b) Median accuracy for 300 unitary circuits as a function of σ BS with and without correction for a photonic image classifier for the MNIST task with N = {36, 64, 144, 256} neurons. Error correction significantly improves the fabrication tolerance of the neural network to beyond current-day process tolerances, even for systems with hundreds of modes. As the inset shows, even circuits with 4% splitter error preserve the baseline performance within 1%.
In Figure 3b, we show with and without error correction for circuit sizes N = {64, 128, 256}. For these simulations, we chose a beamsplitter variation of σ BS = 2%, which is a typical waferlevel variance [19]. While the improvement in diminishes for larger N, we still find substantial improvement gained in our approach for up to 256 modes. For large unitary circuits most MZIs need to be programmed to reflectivities close to θ ≈ 0 [35]; the increasing fraction of devices that cannot be programmed to the required splitting accounts for the increase in with N. Nevertheless, there is always some improvement in , as any phase errors introduced by the components can be corrected. Our results suggest that substantial performance improvements can still be achieved by error correction for circuits with hundreds of modes, which is well beyond the size of the current state-of-the-art (N = 64) in programmable photonics [36].

B. Application: Optical Neural Networks on Feedforward Programmable Circuits
To further benchmark the performance of our error correction protocol, we applied this approach to simulations of a programmable photonic system, namely a two-layer neural network conducting inference with a feedforward programmable photonic circuit. The architecture of the neural network is similar to that studied in [16,20,32], where forward inference is optically computed through passive interference within a unitary photonic circuit coupled with an electrical or electro-optic nonlinearity [38]. Optical machine learning is a key application area for photonic error correction, as model training is both timeconsuming and energy intensive, making it impractical to retrain on each individual piece of hardware with a unique set of fabrication errors. Preferably, a model would be highly optimized once in software, after which corrections are applied within the hardware to restore the original software-trained model from any fabrication-induced errors.
The neural networks we benchmark are based on the architecture described in [32]. Using the Neurophox package, we trained two-layer neural networks with N = {36, 64, 144, 256} neurons to recognize low-frequency Fourier features of handwritten digits from the MNIST task. The activation function between layers was assumed to be a modReLU function implemented using an electro-optic nonlinearity [37,38]. Further details on the network architecture and training are included in the Supplementary Information. Figure 4 shows the median classification accuracy for 300 randomly generated circuits as a function of the beamsplitter statistics η = (50 ± σ BS )%. The smaller circuits (N = 36, 64) exhibit roughly 95 − 96% accuracy after training, while the larger circuits (N = 144, 256) exhibit a slightly higher model accuracy of ∼ 97%. The larger circuits, however, are less resilient to errors; without error correction classification accuracy drops to below 90% for all circuit sizes at a splitter variation as low as ∼ 3%.
Hardware error correction extends this cutoff to more than 6%, which is well beyond modern-day process tolerances [19]. Moreover, without correction classification accuracy drops significantly at even typical wafer-level variances (2%). However, with error correction there is almost no drop in accuracy at these variances and less than 1% accuracy loss for beamsplitter variations as high as 4%. We expect this margin for fabrication error will prove important as optical neural networks scale up. These results suggest that error correction in programmable photonics can enable high-accuracy neural networks of up to hundreds of modes within current-day process tolerances.

C. Application: Tunable Dispersion Compensators on Recirculating Waveguide Meshes
While our analysis has focused on feedforward programmable photonic meshes, our results can also be applied to recirculating architectures useful in RF and optical signal processing. These recirculating meshes, which are usually configured in hexagonal or triangular lattices, enable implementation of finite impulse response (FIR) and infinite impulse response (IIR) filters by configuring waveguides into asymmetric MZIs and ring resonators, respectively [13,21,22]. Unlike the feedforward architectures, the programming of these structures usually cannot be determined analytically and must be found through optimization [26][27][28]. Since optimization can be time-consuming for complex systems, error correction can enable optimizing these circuit parameters on idealized models and then porting them over to hardware without retraining. As an example, we simulated the performance of an IIR filter functioning as a tunable dispersion compensator (TDC) on a hexagonal waveguide lattice [22]. TDC modules are of interest for numerous applications, including compensating chromatic dispersion in optical communication links [39] and enabling high-dimensional quantum key distribution (QKD) with temporal modes [40]. We implemented the TDC using an architecture similar to the tunable-coupling ring array described in [14]. Programmable Top: Simulations of a tunable dispersion compensator (TDC) implemented on a recirculating waveguide mesh with 15 tunable-coupling ring resonators coupled serially to one another. Bottom: After training the mesh parameters to implement a fixed linear group delay dispersion on an ideal model, small beamsplitter errors will introduce variations in the implemented group delay τ profile. Plotted are the group delay profiles for 500 randomly generated circuits before and after correction. Correcting the settings of each TBU restores the desired performance, eliminating the need to retrain on the hardware. Also displayed is the distribution of the group delay dispersion before and after correction. dispersion is achieved by individually tuning the coupling and resonance of each ring in a chain of 15 resonators coupled serially to one another. Each ring is implemented with a single MZI (often referred to as the tunable basic unit, or TBU) in a hexagonal mesh acting as the coupler, while five other TBUs are programmed to the bar state to implement feedback. For simplicity we do not simulate routing within the hexagonal mesh, but instead simulate the transfer function of each individual filter implemented using TBUs with fabrication imperfections. Using the constrained optimization by linear approximations (COBYLA) routine in SciPy [41,42], we trained the TBU parameters on an idealized model to implement a group delay dispersion of −85 ps/nm over the bandwidth of a 50 GHz ITU channel. Figure 5 shows the group delay τ profiles for 500 randomly generated TDC modules implemented using TBUs with σ BS = {2, 4}% before (top) and after (bottom) error correction. Similar to optical neural networks, precise implementation of a TDC requires accurate phase control throughout the circuit. Fabrica-tion errors introduce spurious phases at each resonance, which results in significant variation of the dispersion profile for even slight component errors. As our results show, correcting the parameters of each TBU locally is sufficient to restore the desired dispersion profile.
While we can correct the coupling and phase parameters for each ring, errors α, β also introduce some loss at each TBU programmed to the bar state, as the bar transmission is reduced to cos 2 (α − β). The remainder of the light is directed into unused couplers in the circuit, effectively incurring loss. This alters the critical coupling condition, which results in the slight variation in the dispersion profile observed for σ BS = 4%. Our simulations assume α, β are independent, Gaussian random variables; in practice, however, α, β for a single device are strongly correlated [43,44]. Therefore, our simulations likely overestimate the loss incurred at each TBU programmed to the bar state.

D. Scalability and Outlook
We have presented an approach for characterizing and correcting for hardware errors in programmable photonic circuits. To conclude, we analyze the expected improvement our technique enables and how it will perform as these circuits scale up.
For a unitary photonic circuit, applying the Reck or Clements decompositions produces an average matrix error of (Supplementary Info., Sec. II.A.): If we can correct all errors in θ, then corrected → 0. We can therefore estimate the expected corrected by computing the fraction of MZIs that cannot be programmed to the required splitting value, i.e. the condition in equation (9). The distribution of phase shifter settings for a unitary circuit can be related to the Haar measure on the unitary group [35]. The probability an MZI is programmed to a value θ < ξ is (Supplementary Info., Sec. II.C.): We disregard the probability an MZI is programmed to a splitting θ > π − 2|α − β|, which is negligibly small for large N [35]. Error correction cannot fix the splitting error if θ < 2|α + β|; therefore: We find that error correction effectively reduces the hardware error from to ≈ (1/ √ 6) 2 . The expected error improvement is: and corrected as a function of N are plotted in Figure 6a. We consider σ BS = 1.2%, which is the state-of-the-art reported in [19], as well as more relaxed tolerances σ BS = {2, 4}%. For σ BS as high as 4%, error correction produces at least a factor of two (and often more) improvement in the error for circuits as large as N = 500. We therefore expect our approach to have , corrected as a function of circuit size N for σ BS = {1.2, 2, 4}%. b) Average circuit error as a function of wavelength for N = {64, 128, 256} using the optimal directional coupler design in [19]. c) Redundant MZI for implementing perfect optical gates. One of the two beamsplitters is an MZI that can be tuned to implement an error α(θ α ) which compensates for the error β.
wide applicability in the near term as the size of programmable photonic circuits scale up.
Error correction also greatly improves the optical bandwidth of unitary circuits. Since directional couplers are highly wavelength sensitive, dense wavelength-division multiplexing (DWDM) requires re-fabricating the same circuit with components optimized at each wavelength channel. Our approach, however, enables the use of the same hardware across a wide wavelength range. In Figure 6b we show the expected hardware errors for large circuits across a 100 nm bandwidth using the optimal splitter (σ BS = 1.2%) design in [19]. We find that the corrected error for an N = 256 circuit across a 60 nm bandwidth (1520-1580 nm) will be lower than the uncorrected error at the design wavelength λ = 1550 nm. Even lower errors could be achieved using multimode interference (MMI) couplers; these devices have large bandwidths but often suffer from static splitting imbalances [45], i.e., α, β are invariant to wavelength, but α , β = 0. A circuit with large-bandwidth MMI couplers can thus use error correction to achieve a large instantaneous bandwidth, for instance to compute over many parallel wavelength channels.
The results in Figure 6a suggest a fundamental error bound achievable with local correction for unitary circuits. Our approach yields comparable results to those achieved with self-configuration procedures [9,32] but does not require a specific structure for the circuit or photodiodes within each device. If the condition in equation (9) is satisfied, local correction obtains corrected = 0 in O(1) time. If this condition is not satisfied, it is sometimes possible to achieve a larger reduction in error with a global optimization approach [24,29]. However, these approaches, which require photodiodes within each device or output measurements whose number scale nonlinearly with the number of modes, become increasingly inaccessible experimentally as N scales up. Local correction requires minimal overhead and can guarantee a minimum error given certain guarantees on the component performance, making it ideal for standardizing performance across large numbers of chips.
Moreover, this error bound applies only to feedforward, unitary circuits with no redundant devices. lower than this bound can be achieved by incorporating additional, redundant MZIs; for instance, one can implement "perfect" optical gates by incorporating an additional phase shifter into the MZI, as shown in Figure 6c. This device can be trained with optimization to implement any desired unitary T ij (θ, φ) perfectly [46,47]. The error correction formalism enables calculation of these settings analytically. One of the two constituent splitters is a passive component with error β, while the other splitter is an MZI that implements a tunable error α(θ α ). Any desired 2 × 2 unitary with a required splitting θ can then be implemented by setting θ α such that 2|α(θ α ) + β| < θ < 2|α(θ α ) − β| and correcting the resultant phase errors (Supplementary Info., Sec. III.B.).
Not all optical gates within the circuit necessarily need to incorporate redundancy. High accuracy unitary circuits have been demonstrated by incorporating only a few extra MZIs into the circuit, which can be trained using nonlinear optimization [24] or gradient descent [29]. Error correction serves an important purpose for these circuits, as one can optimize the hardware settings once on an ideal model and port the settings over to many devices. For recirculating meshes the phase shifter settings are not constrained by the Haar measure, and so the benefit gained from error correction is not expected to diminish with increasing N. We therefore expect error correction to play an important role in scaling up the size of these circuits as well.
The motivation for photonic error correction assumes the hardware is re-programmed infrequently, for instance to implement a weight matrix in a neural network. Other applications, such as mode unscrambling, require real-time configuration robust to device error. We have recently discussed error-resilient self-configuration approaches in [48,49].

CONCLUSION
In conclusion, we have presented a protocol to correct for hardware errors in programmable photonic circuits. Unlike optimization-based approaches, our protocol utilizes a one-time calibration procedure to flexibly implement any desired functionality up to the limits of the hardware. We find that applying our approach to key application areas of programmable photonics, such as optical neural networks and programmable coupledring systems, enables resilience to fabrication errors well beyond modern-day process tolerances. Error correction also greatly reduces the overhead for programmable photonics that require optimization to deduce the hardware settings, as it eliminates the need to retrain for each individual set of hardware with unknown fabrication errors. Current process tolerances suggest that our approach enables improved functionality for systems of up to hundreds of modes, providing a new avenue for scaling up programmable photonics.

HARDWARE CALIBRATION
Error correction requires a one-time calibration of each phase shifter and passive splitter in the photonic circuit. While characterization of the overall linear transformation U performed by a circuit is fairly straightforward [1], the lack of direct access to individual optical elements makes measurement of their characteristics quite challenging. In order to enable our approach to hardware error correction, we have developed a protocol to calibrate all components on chip with simple interference measurements and homodyne detection on the circuit outputs. Importantly, our approach yields the circuit parameters directly from the measurements and does not rely on detectors embedded within the circuit. In this section, we illustrate our protocol for a rectangular (Clements) unitary circuit [2]; however, our approach can be readily applied to any arbitrary network of MZIs.
A key consideration for our protocol is that the finite extinction ratio of the MZIs causes small amounts of spurious, unwanted light to scatter randomly into each device. A device is usually characterized by inputting an optical signal E 1 into one device port and measuring the transmission as θ, φ are varied. However, characterizing a device in the middle of the network requires routing the optical signal through other interferometers, each of which are programmed to cross (θ = 0) or bar (θ = π) configurations to route the light through wire paths within the circuit (Fig. S1a). If the routing MZIs are ideal, then cross or bar settings will direct the input probe signal E 1 into the desired output. Imperfect devices, however, are unable to realize ideal cross or bar configurations; as a result, a small amount of light sin(α ± β)E 1 exits the unwanted output wherever light interacts with a device in the optical path. These spurious signals scatter randomly throughout the network; as a result, any device being characterized with a signal E 1 into one input will also have a small, unwanted signal E 2 = ζE 1 (ζ 1) incident upon the other input. If unaccounted for, this extraneous light introduces an error O(ζ) in the calibration protocol which can be on the order of the beamsplitter errors being corrected.
One strategy for isolating the probe signal during characterization is to apply a low-frequency modulation to the devices the signal is traveling through [3], which enables spurious light traveling through unwanted devices to be eliminated in the Fourier transform of the output. In this section, we present an alternate strategy applicable to devices where the relative phase of the two inputs are controllable with a phase shifter.

A. Device Calibration
The procedure for calibrating a single MZI is sketched in Figure S1b. Calibration does not require internal detectors at each MZI, but does assume coherent detection at the outputs capable of reconstructing the field amplitude and phase.
We start with devices connected directly to the detectors at the circuit output, which in a rectangular unitary circuit corresponds to the devices in the last column. Light is input into the circuit and each MZI within the path is optimized to maximize the signal input into the device of interest. The precise amount of light incident upon the device being characterized is not important, only that enough light reaches the MZI to produce measurements above the detector noise floor. We assume the input signal vector to the MZI is of the form x = E[1, ζe iψ ], where E is not known and ζ 1 is an unknown scaling factor indicating the amount of spurious light entering the other input. θ calibration: The first step is to calibrate the internal phase shifter θ by sweeping the voltage applied to the phase shifter and measuring the output transmission. We input a strong signal into the top input of the device; due to device errors, a small, unknown input with relative phase ψ and amplitude |ζ| 1 will also be incident upon the bottom input.
The power exiting the top output port is proportional to: If |ζ| = 0, we can calibrate θ by observing that P top is minimized at θ = 0. However, if |ζ| = 0, there are contributions to P top dependent solely on φ and jointly on both θ, φ; as a result, optical power is minimized at: Simply optimizing P top with respect to θ would therefore produce a calibration error on the order of O(ζ).
We can get around this by observing that if we average P top over all values of φ, this measurement is always minimized at θ = 0: θ is therefore calibrated by constructing the two-dimensional transmission characteristic P top (θ, φ) and optimizing the average transmission over all settings for φ. The phase shifter setting for θ = π can similarly be obtained by maximizing this measurement, and arbitrary phase settings can be found by fitting this expression to the measured transmission.

α, β calibration:
We can now use this information to program with high fidelity the bar (θ = π) or cross (θ = 0) settings into the MZI. These settings correspond to the unitaries: For ideal devices α = β = 0 these unitaries reduce to identity and swap operations, respectively.
The beamsplitter calibration is now performed by sending roughly equal amounts of light into both inputs, i.e. applying an input field vector x = E[1, ζe iψ ] where ζ ≈ 1 but once more the precise scaling factor is unknown. This can be achieved by either inputting coherent light into two inputs of the circuit, or by inputting light into one port and programming an MZI earlier along the wire path to operate as an approximate 50-50 beamsplitter.
We first set θ = 0 and measure the photocurrent I top , I bottom at both outputs as a function of the external heater φ: where R top , R bottom are the unknown responsivities of the photodetectors. This measurement produces a modulation of the photocurrent as the relative phase φ − ψ between inputs (controlled by φ) is varied. The interference visibilities ∆ = (I max − I min )/(I max + I min ) for the top and bottom outputs are: ∆ bottom,θ=0 = |ζ sin(2(α + β))| 1 + (|ζ| 2 − 1) sin 2 (α + β) (S9) Solving this system of equations will yield values for ζ and |α + β|. Fig. S1. a) Devices are characterized by programming "wire paths" into the circuit, where each MZI along the route is set to the cross or bar state. However, device errors along the path scatter spurious light throughout the circuit, introducing errors into calibration. b) Device calibration is conducted in four steps. First, an optical signal is sent into one port of the MZI. Optimizing the output optical power averaged over φ produces an accurate calibration for θ. The second step is to input optical signals into both ports and modulate the phase of one input while setting θ to the cross and bar states to measure |α + β|, |α − β|. This procedure has a sign ambiguity that is resolved in the third step, where the MZI is programmed to act as a 50-50 beamsplitter and output power vs. φ is optimized to set the two inputs into phase. Resetting to the cross/bar states and driving φ then allows us to deduce the signs of α, β. The final step is to set the two inputs into phase and use coherent detection to measure the difference in the phase of the output field between cross and bar states, which provides a calibration for φ. c) Circuit calibration is completed from the output working backwards to the input. For devices in the middle of the network homodyne detection can be used to infer the output fields at any device of interest.
This procedure characterizes how much the two input modes mix through interference when the MZI is set to the cross and bar states. In an ideal device, the bar and cross configurations implement identity and swap operations, inhibiting interference between the input modes. Any observed interference is therefore the product of beamsplitter errors within the MZI. Inputting roughly equal amounts of light into both inputs (ζ ≈ 1) maximizes the interference visibility, which has the advantage of being insensitive to detector responsivity and out-coupling loss.
The final step is to deduce the sign of |α + β| and |α − β|. To do this, we set θ = π/2 and tune φ to maximize power exiting the top port, which occurs when φ ≈ ψ. Having identified the external phase shifter setting corresponding to ψ, we can now reset back to the cross state. If I top increases (decreases) when the phase shifter voltage is increased, then α + β is negative (positive). The procedure is repeated for the bar state to determine the sign of α − β. These measurements provide sufficient information to compute α, β exactly. Fig. S2. The simplified calibration procedure for a 4 × 4 Reck (triangular) circuit. Since each device is located along a diagonal, we can guarantee ζ = 0 and extract α, β from direct extinction ratio measurements. Once the first diagonal U 1 is calibrated, we can program it to operate as an effective homodyne detector. Any other device in the circuit can be calibrated by interfering the output with U 1 .
φ calibration: Precise calibration of φ requires measurement of the output field phase with coherent detectors. We input a strong optical signal into both ports, program the MZI to the cross state, and tune φ to precisely set φ = ψ (equations (S6), (S7), after setting φ ≈ ψ using the procedure above). The phase of the signal exiting the top output is: Now set the MZI to the bar state and measure the output phase once more. We obtain: (S13) Solving this system of equations provides ψ; using this information, we can now program any arbitrary phase φ.

B. System Calibration
This procedure can be used to characterize θ, φ, α, β for all MZIs connected directly to photodetectors. We can therefore directly obtain the unitary U 1 corresponding to the last column of the circuit (Fig. S1c). With this information, we can now directly obtain the fields exiting an MZI in the preceding column U 2 by using homodyne detection to reconstruct the output field amplitude vector y; the fields exiting an MZI in the penultimate column can be back-calculated to be U † 1 y. The characterization therefore proceeds one column at a time, starting from the output side and working backwards towards the input. Homodyne detection allows direct measurement of fields exiting any MZI in the network; for an MZI in column k, the fields exiting that column will be ∏ 1 k−1 U † i y. Each device is calibrated as before; however, instead of directly measuring photocurrents I 1 , I 2 , the output fields at each device are inferred with homodyne measurements of y.
While we have illustrated the calibration protocol with a rectangular mesh, our approach can be applied to any arbitrary network of MZIs. The generalized procedure is to first characterize all devices directly connected to the output detectors. Using this information, MZIs one device removed from the outputs can then be characterized. This enables calibration of MZIs two devices removed from the outputs, and so on, until all devices within the circuit are characterized.

C. Triangular Network Calibration
This approach can be applied to any general network of interferometers. However, the symmetry of triangular (Reck) circuits enables a greatly simplified calibration procedure described here. In particular, only direct detection of intensities is required to calibrate the Reck circuit, rather than homodyne detection.
The simplified procedure is depicted in Figure S2. Each diagonal of the circuit can be divided into sub-blocks U 1 , U 2 , ..., U N which are characterized in order. For each block, the MZIs are characterized starting at the end of the chain and working backwards.
The first device T 11 is characterized by inputting light into the first port of the circuit. Both outputs are directly connected to detectors, and the triangular structure of the network ensures that no light scatters into the bottom input, i.e. ζ = 0. This simplifies the procedure in several aspects: • The phase shifter θ can be calibrated by directly optimizing transmission vs. θ, rather than having to first average transmission over φ.
• Sweeping transmission vs. θ and computing the extinction ratio for the bar and cross ports gives the following expressions, which can be directly solved to find |α ± β|: The signs of α, β can be determined interferometrically with the same approach as used in the generalized protocol.
The second MZI characterized T 12 has the top port directly connected to a detector, while the output fields of the bottom port are determined by undoing the known operation T † 11 . For the third device T 13 , the fields exiting the bottom port are computed using T † 11 , T † 12 , and so on for the first diagonal. Once the first diagonal is characterized, it can be programmed as a homodyne detector for the remainder of the circuit calibration. This is achieved by inputting a local oscillator field ae iψ into the first port and programming U 1 to distribute equal power to all of the MZIs. Suppose we wish to measure the fields x 2 , x 3 , ..., x N exiting U 2 . Upon programming U 1 , the fields exiting the circuit are U 1 (a + x) = U 1 ([ae iψ , 0, 0, ..., 0] T + [0, x 2 , x 3 , ..., x N ] T ). Since U 1 is programmed to distribute the LO signal equally to all outputs y i , i.e. U 1 a = (ae iψ / √ N)[1, 1, ..., 1] T , the field intensity |y i | 2 at any port i will be: Taking measurements at ψ = 0, π/2 will extract the in-phase and quadrature components of U 1 x. This approach enables measurement of field amplitudes anywhere within the circuit; using it, we can characterize the remainder of the circuit U 2 , U 3 , ..., U N with intensity measurements only.

A. Beamsplitter Errors
The hardware error between a desired unitary matrix U and the implemented matrix U hardware can be quantified by the Frobenius norm: This metric, which is bounded ∈ [0, 2], can be interpreted as an average relative error per entry of the matrix U; for example, in a neural network would correspond to the average relative error per weight.
Unitary circuits decompose arbitrary matrices into a product of unitary matrices T ij (θ, φ, α, β): The matrix error induced by a single beamsplitter error α can be computed as: The Frobenius norm is unitarily invariant, which originates from the cylic property of the trace; thus, only the unitary matrix corresponding to the beamsplitter error needs to be considered in the calculation of : Repeating this calculation for β yields the same result.
In a unitary circuit with N(N − 1)/2 interferometers, the average error is therefore: Figure S3a shows the expression in Equation (S29) plotted against simulation results; they show excellent agreement with the derived expression.

B. Phase Errors
Our analysis in the main text focuses primarily on beamsplitter errors. There can also be errors in the phase shifter settings; however, the primary source of these errors is a static error originating from microscopic changes in waveguide geometry between the interferometer arms [4]. This static error is calibrated out in the first step of the characterization protocol.
This calibration cannot account for dynamic errors, however. Potential sources of dynamic phase errors include thermal drift, thermal crosstalk between phase shifters, and quantization error. In this section, we show that the contribution of these effects to the hardware error is significantly smaller than the static errors considered in the main text.
To start, we find that any error ∆ induced in a single phase setting by these effects can be computed to be: We now consider the error induced by each of these effects.
a temperature gradient of < 0.01 • C therefore induces a phase error of 2π(dn/dT)(∆T)L/λ ≈ 1.5 × 10 −3 at λ = 1550 nm, which is an order of magnitude smaller than the expected beamsplitter error.
Thermal crosstalk: Thermal crosstalk is largely deterministic and dominated by the nearest-neighbor crosstalk, which can be accounted for in the phase shifter characterization. Additionally, crosstalk can be suppressed by spacing interferometers sufficiently apart on the chip [7]; a spacing of 135 µm, for instance, has been measured to generate a crosstalk with the neighboring MZI of less than 0.02 rad/rad [8]. Since thermal crosstalk decays with increasing separation, we expect with careful design this effect should not dominate hardware error.
Quantization error: Quantization error originates from the digital-to-analog converters (DACs) used to program voltages into the phase shifters. Consider an N-bit DAC whose 2 N codewords range from zero voltage to the voltage V 2π required for a 2π phase shift. Programming the M-th (0 ≤ M ≤ 2 N − 1) codeword will produce a voltage sampled uniformly over the distribution: In a thermo-optic phase shifter, relative phase is a function of the voltage squared; the phase setting for the M-th codeword is therefore: The uncertainty in φ is maximum at M = 2 N − 1, where the phase setting is: which is one fewer bit of accuracy than for the voltage setting.
The square-law dependence of phase on voltage therefore results in an N-bit DAC setting the phase to roughly N − 1 bits of accuracy. A 12-bit DAC will suppress worst-case quantization error per phase shifter to ≈ 9 × 10 −4 , and 16 bits are sufficient to suppress error to below 6 × 10 −5 .
In Figure S3b we plot the relative error contributions of these effects compared to static beamsplitter error. These estimates suggest that uncorrected component imprecision dominates the hardware error in programmable photonic circuits. However, once component errors are corrected, dynamic effects play a more significant role in the total hardware error.

C. Calculating the corrected error
As discussed in the main text, if θ , φ are realizable for all devices in a circuit, then corrected = 0. For large circuit sizes N, however, some devices will require a splitting θ outside the range of realizable values 2|α + β| < θ < π − 2|α − β|.
Consider a device for which we can correct φ, ψ 1 , ψ 2 , but are unable to correct θ. Any unitary U can be decomposed into a product of matrices U = D ∏ T ij , where D is diagonal and T ij is a N × N block matrix with non-trivial entries: An error θ → θ + ∆ produces a contribution to corrected of: On average, given θ cannot be realized, ∆ 2 = 2( α 2 + β 2 ) = 4σ 2 BS and the error per device will be 2 (∆) = 2σ 2 BS /N. The total error for the circuit is therefore: where P(θ < 2|α + β|) is the probability that a device in the circuit needs to be programmed to a splitting that cannot be realized.
The distribution of internal phase shifter settings θ for a unitary circuit can be determined from the Haar measure. For a given MZI, ref. [9] shows that: where n ∈ [2, N], i ∈ [1, N − n + 1] are indices denoting the position of the MZI in the network (see [9] for the mapping). The distribution of θ over the entire circuit can therefore be written as (Fig. S3c): Integrating this expression yields the fraction of beamsplitters with a required splitting below ξ: For small device errors, equation (S45) can be Taylor expanded to: On the other hand, the probability that θ > π − 2|α − β| is: For moderately large N, this quantity is order of magnitudes smaller than P(θ < 2|α + β|); we can therefore disregard it when estimating the average corrected error (Fig. S3d).
The average corrected error is therefore: This expression is plotted in Figure S3a and also shows excellent agreement with simulation results.

A. Correcting the internal phase shifter
In this section, we derive equation (9) in the main text providing the correction to the internal phase shifter θ for an imperfect device. Programming an MZI with phase settings (θ, φ) produces the unitary:   e iφ sin(θ/2) cos(θ/2) However, an MZI with splitting errors α, β implements the unitary T ij (θ, φ, α, β): The correction θ → θ can be derived by requiring that the magnitude of the upper left entry of T ij (θ , φ , α, β) equal that of T ij (θ, φ). For a 2 × 2 unitary matrix U, the unitarity condition UU † = I implies that setting the magnitudes of one term in both matrices to be equal is sufficient to set the magnitudes of all terms in the matrices to be equal. This condition produces an expression relating θ to θ: Solving for θ , we find that: Since α, β are small, the denominator of the expression for θ will always be positive. This expression therefore has a solution only when the numerator is positive, i.e. sin 2 (θ/2) > sin 2 (α + β), and when the argument in the arcsin function is less than 1, i.e. sin 2 θ/2 − sin 2 (α + β) < cos 2 (α − β) − sin 2 (α + β). These conditions yield the range over which θ is physically realizable:

B. Perfect optical gates with redundant devices
Device imperfections limit the range of realizable θ values. For unitary circuits this results in a net increase of with N, even with error correction, as more MZIs cannot be programmed to the required splitting. For recirculating waveguide meshes these errors will degrade the fidelity of the bar and cross configurations used to route signals, which induces unwanted crosstalk between systems.
An ideal optical gate can be realized with the redundant MZI shown in Figure S4 [10,11]. One beamsplitter is a passive splitter with error β, while the other is an MZI that implements a tunable error α(θ α ). The tunable splitter is assumed to consist of passive splitters with error a 1 , a 2 .
By setting α(θ α ) = −β, we can implement any arbitrary splitting 0 ≤ θ ≤ π − 4β. Alternatively, we can set α(θ α ) = β to implement any desired splitting 4β ≤ θ ≤ π. We can then correct for any phase errors using the usual procedure; thus, this interferometer can implement any arbitrary 2 × 2 unitary and a unitary circuit composed of these devices can always achieve corrected = 0.

METHODS
The results presented in this paper were produced using a custom simulation package written in Python. The package simulates photonic circuits with the transfer matrix method and relies primarily on efficient array calculations with NumPy [15] and optimization routines included in SciPy [16]. In this section, we provide further details on the application results presented in the main text.
Optical neural networks: The optical neural networks discussed in the main text were trained using the Neurophox package. The neural network architecture, shown in Figure 4 of the main text, is based on the architecture described in [13] with some modifications. Images of handwritten digits from the MNIST task are pre-processed with a Fourier transform and truncated to a √ N × √ N center window for a dimension N unitary circuit. We assume a fixed amount of optical power is available to the circuit; each input vector corresponding to an image is normalized to unit length, so that all images are encoded into the neural network with the same amount of optical power. This normalization can be realized optically with a diagonal line of MZIs, as depicted in Figure S5a.
The activation function is realized electro-optically with a tap photodiode coupled to a Mach-Zehnder modulator [14] ( Fig. S5b). The activation function taps off 10% of the input power to the photodiode, while the remainder is directed to the modulator. The photocurrent drives the modulator through a transimpedance amplifier (TIA), resulting in a nonlinear modulation of the electric field.
The nonlinearity implements the activation function [14]: f (E) = ( √ 1 − α)e −i(g|E| 2 /2+φ/2−π/2) cos(g|E| 2 /2 + φ/2)E (S74) where α = 0.1 is the fractional power tapped off to the photodiode and g = π/20 is the modulator phase induced when 1 mW is incident upon the nonlinearity (prior to the tap). For typical electro-optic modulator drive voltages of < 8 V [17,18] and a photodiode responsivity of 1 A/W [19], the required TIA gain for these parameters is roughly 36 dBΩ. The modulator is biased so that no transmission occurs when E = 0; as shown in Figure S5c, for optical powers < 20 mW f (E) approximates a modReLU function [20]. . S5. a) The MNIST data set was pre-processed with a Fourier transform and truncated to a √ N × √ N center window for a N-mode unitary circuit [13]. The inputs were normalized to unit length, which can be realized optically with a diagonal line of MZIs. b) The activation function architecture as described in [14]. A small fraction α of the input signal is tapped off to a photodiode driving a Mach-Zehnder modulator. c) The activation function f (E) for the parameters used in the simulation. Since the hidden layers operate on electric field amplitudes, we plot the square root of the optical power in units √ mW. Technically, f (E) is non-monotonic for high optical powers, as the Mach-Zehnder interferometer will produce a cos(|E| 2 ) modulation. However, the input optical powers in our simulations are chosen to ensure the activation function operates only in the modReLU-like region.

Fig
As the network size N increases, the average power within a waveguide drops as 1/N; for this reason, we assumed the total optical power input into the circuit increased commensurately to ensure the activation function could still be triggered. The N = {36, 64} networks were trained with 20 mW of optical power, the N = 144 network was trained with 40 mW, and the N = 256 network was trained with 60 mW of optical power. All of the neural networks were trained to minimize the mean squared error between the L 2 normalized output power and the one hot encoding of the correct image.

Tunable dispersion compensators:
The tunable dispersion compensator was modeled with 15 serially coupled optical ring resonators implemented within a hexagonal waveguide mesh ( Fig. 5; main text).
The transfer function T i (ω) for each ring was individually computed and multiplied to yield the overall system response T(ω) = ∏ i T i (ω). From this result we found the group delay of the system τ(ω) = −d/dω[arg T(ω)]. The group delay dispersion was calculated with a least squares linear fit to the group delay profile.
The phase shifter settings were trained by minimizing the mean squared error between the realized and desired group delay profiles using the COBYLA optimization routine in SciPy [16,23].

Unitary circuit bandwidth:
The wavelength dependence of the directional coupler design used in Figure 6b of the main text was computed with the MIT Photonic Bands (MPB) package [24]. Using MPB, we calculated the effective index difference ∆n(λ) between the first even and odd supermodes as a function of wavelength (Fig. S7). Assuming the directional coupler is designed to operate as a 50-50 splitter at λ 0 , the wavelength-dependent cross coupling is Fig. S6. Model for tunable coupling ring. The ring coupling is set by an MZI with errors α, β and internal phase θ, and the resonance is set with a phase setting φ. The coupler is assumed to be lossless, and the feedback loop is assumed to have a round-trip transmission a.   S7. Wavelength vs. cross coupling for the optimally tolerant directional coupler design (w = 400 nm; g = 400 nm; t = 150 nm; h = 220 nm) in [26].