Linear programmable nanophotonic processors

NICHOLAS C. HARRIS, JACQUES CAROLAN, DARIUS BUNANDAR, MIHIKA PRABHU, MICHAEL HOCHBERG, TOM BAEHR-JONES, MICHAEL L. FANTO, A. MATTHEW SMITH, CHRISTOPHER C. TISON, PAUL M. ALSING, AND DIRK ENGLUND*
Lightmatter, 61 Chatham St 5th floor, Boston, Massachusetts 02109, USA
Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, Massachusetts 02139, USA
Elenion Technologies, New York, New York 10016, USA
Air Force Research Laboratory, Information Directorate, 525 Brooks Road, Rome, New York 13441, USA
*Corresponding author: englund@mit.edu


INTRODUCTION
Photonic integrated circuits (PICs) have become increasingly important in classical communications applications over the past decades, including as transmitters and receivers in long-haul, metro, and data center interconnects. Many of the attributes that make PICs attractive for these applications (compactness, high bandwidth, and the ability to control large numbers of optical modes with high phase stability) also make them appealing for entirely new applications, such as hardware accelerators based on emerging classical and quantum computing concepts. However, these emerging applications come with highly demanding device and scaling requirements. For example, proposed optical matrix processors will likely require the control of at least hundreds of spatial modes to be useful as neural network hardware accelerators [1][2][3], optical quantum computing protocols may require similar numbers of optical modes for each logical quantum bit (qubit) [4][5][6], and quantum computing schemes based on atomic memories will also require high-performance control over large numbers of optical spatial modes [7][8][9]. In addition, many of these emerging applications will require new types of devices, such as extremely low-loss modulators, and may need to function at wavelengths outside the standard telecommunications band. Whereas these challenges may have appeared daunting a decade or two ago, rapid advances in PICs have recently enabled proof-of-concept demonstrations. Silicon-on-insulator (SOI), silicon nitride, and indium phosphide (InP) technologies have, in many areas, led the way thanks in large part to the availability of mature fabrication processes and multi-project-wafer (MPW) services [10][11][12]. Recently, SOI PIC systems that can coherently control tens of optical modes have been demonstrated [1].
Crucially, it was shown that even though all individual photonic components are imperfect, nearly perfect mode transformations become possible in sufficiently large reconfigurable optical devices [13,14], indicating that scaling optical systems while mitigating errors is feasible. Among reconfigurable optical systems, there has been much progress towards "universal linear optics" devices: photonic circuits that can be programmed to perform all possible linear optical transformations on a given set of input modes [15,16]. This paper will review progress towards such general-purpose "programmable nanophotonic processors" (PNPs) and emerging applications to problems including machine learning and quantum information processing. The PNPs considered here implement linear optical transformations by one-way propagation; we assume no resonators or other feedback loops, which are important for a number of applications, including RF filtering [17][18][19][20].

PROGRAMMABLE NANOPHOTONIC PROCESSORS
The most popular methods for constructing a programmable mode transformer from N input to N output modes break the problem up into a mesh of 2 × 2 mode transformers consisting of Mach-Zehnder interferometers (MZIs) [13,15,21], as shown in Figs. 1(a) and 1(b). Each MZI consists of two 50% beam splitters and two phase shifters parameterized by θ, ϕ, as shown to the right of Fig. 1(b). In integrated photonics platforms, beam splitters are commonly realized by directional couplers that convert input modes $a_1$, $a_2$ into output modes $b_{1,2}$ with amplitude weights of $1/\sqrt{2}$. The MZI then implements the transformation

$$U(2) = \begin{pmatrix} e^{i\phi}\sin(\theta/2) & e^{i\phi}\cos(\theta/2) \\ \cos(\theta/2) & -\sin(\theta/2) \end{pmatrix},$$

up to a global phase. Here we assume the unit cell is lossless; accounting for losses requires each MZI to be described by a 4 × 4 matrix, rather than the 2 × 2 matrix considered here. Losses can be modeled by "virtual" beam splitters coupling the original mode and a "vacuum" mode. If such virtual beam splitters are included, the overall transformation can still be represented as a unitary U(M), where M > N accounts for the additional loss channels. The N × N transformation that applies to our input and output waveguide modes then comprises a nonunitary submatrix of U(M). In instances where the loss in each component is identical, it is possible to represent the PNP transformation by the unitary U(N) and to account for loss as a global parameter α ≤ 1 that can be factored out as V(N) = αU(N). Experimentally, waveguide losses have been shown to be relatively uniform, so that losses can likely be assumed to be uniformly distributed [10]. In this case, the scattering statistics for q identical, single photons passing through the PNP are described by U(N), and the probabilities of all photons arriving at the output will scale as $\alpha^q$. For a universal unitary transformation, each of the N input modes must be coupled to each of the N output modes. Figure 1(a) shows an arrangement of MZIs connecting N = 6 modes.
To allow connections between all modes, one requires $\sum_{n=1}^{N-1} n = N(N-1)/2$ (N choose 2) MZIs, that is, 15 MZIs for this example. The triangular arrangement of Fig. 1(a) was first proposed by Reck et al. [22]. Figure 1(b) shows a more compact arrangement, described by Clements et al. [21], that accomplishes the same U(N) transformation; it also requires 15 MZIs for N = 6 modes. Both the "Reck" and "Clements" decomposition algorithms terminate with a matrix that implements U(N) up to a diagonal phase screen. The phase screen can be implemented using phase shifters at each input mode, as shown in Figs. 1(a) and 1(b). A cascaded binary tree structure [23] that can implement arbitrary unitary transformations has also been proposed.
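The counting argument and the "up to a diagonal phase screen" statement can be illustrated numerically. The following is a minimal sketch, not the Reck or Clements algorithm itself: it nulls the sub-diagonal elements of a unitary with generic 2 × 2 rotations acting on adjacent modes (each standing in for one MZI), which is an assumed simplification. The function names `random_unitary`, `decompose`, and `reconstruct` are hypothetical, and the mapping from each rotation to physical phase settings (θ, ϕ) is not attempted.

```python
import numpy as np

def random_unitary(n, seed=0):
    """Draw an n x n unitary from the QR factorization of a complex Gaussian matrix."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

def decompose(u):
    """Null every sub-diagonal element of u with rotations on adjacent modes.

    Returns (rotations, phases): each rotation is (top_mode_index, 2x2 unitary),
    and phases is the residual diagonal phase screen.
    """
    a = np.array(u, dtype=complex)
    n = a.shape[0]
    rotations = []
    for col in range(n - 1):
        for row in range(n - 1, col, -1):
            x, y = a[row - 1, col], a[row, col]
            r = np.hypot(abs(x), abs(y))
            if r < 1e-15:
                g = np.eye(2, dtype=complex)   # nothing to null
            else:
                # Unitary rotation that maps (x, y) to (r, 0)
                g = np.array([[np.conj(x), np.conj(y)], [-y, x]]) / r
            a[row - 1:row + 1, :] = g @ a[row - 1:row + 1, :]
            rotations.append((row - 1, g))
    # A triangular unitary is diagonal, so only the diagonal is kept
    return rotations, np.diag(a).copy()

def reconstruct(rotations, phases):
    """Rebuild the unitary by undoing the rotations in reverse order."""
    u = np.diag(phases).astype(complex)
    for top, g in reversed(rotations):
        u[top:top + 2, :] = g.conj().T @ u[top:top + 2, :]
    return u

u = random_unitary(6)
rotations, phases = decompose(u)
n_elements = len(rotations)            # N(N-1)/2 two-mode elements for N = 6
u_rebuilt = reconstruct(rotations, phases)
```

For N = 6 the sketch uses N(N-1)/2 two-mode rotations, and the leftover diagonal has unit-modulus entries, which plays the role of the phase screen mentioned in the text.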
The network shown in Fig. 1(c) was originally proposed by Miller as a method for realizing any linear transformation on a set of spatial modes [16]. This network uses a physical instantiation of the singular value decomposition, which is a factorization of any matrix M as $M = U \Sigma V^\dagger$, where U is an m × m unitary matrix; Σ is an m × n diagonal, rectangular matrix of nonnegative real numbers; and V is an n × n unitary matrix. Here, two universal unitary circuits (U, V†) are connected by a column of single MZIs that are used as variable attenuators implementing Σ.
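As a numerical illustration of this factorization, the sketch below decomposes an arbitrary 4 × 4 complex matrix with NumPy and rescales the singular values so that each one can be read as the amplitude transmission of a passive attenuator. The global rescaling by the largest singular value and the mapping θ = 2 arcsin(t) (taken from a sin(θ/2) amplitude transmission) are assumptions made for illustration, not prescriptions from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
# Arbitrary (nonunitary) target matrix
m = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

# m = u @ diag(s) @ vh, with u and vh unitary and s real, nonnegative
u, s, vh = np.linalg.svd(m)

# Assumption: passive attenuators cannot provide gain, so factor out a global scale
scale = s.max()
attenuation = np.clip(s / scale, 0.0, 1.0)   # amplitude transmission in [0, 1]

# Assumption: an MZI attenuator with amplitude transmission sin(theta / 2)
thetas = 2 * np.arcsin(attenuation)

# The product of the three stages reproduces the target up to the global scale
m_rebuilt = scale * (u @ np.diag(attenuation) @ vh)
```

In this picture `vh` and `u` would each be programmed into a universal unitary mesh, and `thetas` into the column of single MZIs between them.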
In the original implementation of the "Miller" network, each MZI is implemented using two internal phase shifters, with the differential phase between the two phase shifters serving as one parameter and the global phase imparted by the two phase shifters as the other [13,16,23-25]. The "Miller" MZI configuration can be more compact than the standard configuration, since the overall unit cell length is reduced by the length of one phase shifter.
PNPs have been demonstrated in a number of material platforms, some of which are summarized in Fig. 2. The SOI platform offers an especially high index contrast of 3.4:1.5, which enables low-loss waveguide bends with radii as small as 2 μm [28]. The resulting high component densities are especially important for large PNPs, which already can have up to 88 MZIs connecting 26 optical modes [1], as shown in Fig. 2(a), and applications are demanding much larger devices. Figure 2(b) shows a silicon photonics-based U(4) PNP that was used for separating a multimode channel into individual single-mode waveguides. A U(6) PNP was realized in germanium-doped glass with thermal modulators, illustrated in Fig. 2(c), and enabled the demonstration of linear optical quantum gates and boson sampling schemes [15]. Figure 2(d) shows a silicon photonics-based U(4) PNP used to demonstrate a universal coupler [26].

Review Article
Phase shifter technology in MZIs is of central importance, and a number of phase shifter technologies are being advanced. Lossless phase shifting mechanisms in silicon include the thermo-optic effect (3 dB bandwidth up to a few hundred kilohertz) [29], mechanical effects (∼MHz bandwidth) [30,31], and electric-field-induced electro-optic effects (∼GHz bandwidth) [32]. Recent work [33] has investigated the integration of III-V materials with silicon photonics for compact, low-power phase shifting based on metal-oxide semiconductor capacitors. The possibility of monolithically integrated silicon transistor control circuits [34] and photonic components bolsters the case for large-scale PNPs in silicon. Phase modulation mechanisms that introduce dynamic loss, such as the plasma dispersion effect, are not ideal for realizing PNPs since they complicate the description of the MZI unit cell and give rise to nonunitary transformations. A number of avenues exist to further increase component density. One example is to shrink the directional couplers. Inverse design methods are particularly promising for producing wavelength-scale devices [35,36].

PNP PROGRAMMING
Configuring or programming N × N mode transformations in a PNP involves precise tuning of approximately $N^2$ phases. This can be a nontrivial problem, especially when considering MZI inhomogeneity and the potential for cross talk between modulators (especially relevant for thermal modulators). MZI phases are set by applying voltages or currents to each phase shifter, labeled here as i, j within the array. Figure 3(c) outlines the basic programming flow. Before considering possible routes towards programming an entire PNP, it is instructive to consider the behavior of a single, programmable MZI. Some single-MZI programming examples are shown in Table 1; here, we assume the differential phase between the two input modes to an MZI can be controlled and is described by some phase γ. Without an external phase shifter (ϕ), transformations are confined to the plane shown in Fig. 3(b). To access the full Poincaré sphere, an external phase shifter is required.
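The behavior of a single programmable MZI can be sketched in a few lines. The code below assumes the 2 × 2 transfer matrix [[e^{iϕ} sin(θ/2), e^{iϕ} cos(θ/2)], [cos(θ/2), −sin(θ/2)]], up to a global phase; this row-by-row reading of the unit-cell matrix is an assumption, and the helper names `mzi`, `apply`, and `theta_for_split` are hypothetical. Under that assumption, light entering input 1 leaves output 1 with power fraction sin²(θ/2), so a target splitting ratio fixes θ directly.

```python
import cmath
import math

def mzi(theta, phi):
    """Assumed 2x2 MZI transfer matrix (internal phase theta, external phase phi)."""
    s, c = math.sin(theta / 2), math.cos(theta / 2)
    e = cmath.exp(1j * phi)
    return [[e * s, e * c], [c, -s]]

def apply(matrix, vec):
    """Multiply a 2x2 matrix by a length-2 vector of mode amplitudes."""
    return [sum(matrix[i][j] * vec[j] for j in range(2)) for i in range(2)]

def theta_for_split(p_top):
    """Internal phase sending a power fraction p_top from input 1 to output 1."""
    return 2 * math.asin(math.sqrt(p_top))

# Program a 25:75 splitter and check the output powers for light in input 1
out = apply(mzi(theta_for_split(0.25), 0.3), [1, 0])
powers = [abs(x) ** 2 for x in out]

# theta = pi/2 with all other phases zero gives the 50:50 (Hadamard) setting
hadamard = mzi(math.pi / 2, 0.0)
```

The external phase ϕ does not change the output powers for a single input; it only sets the relative phase between the two outputs, which matters once MZIs are cascaded.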
A number of programming protocols have been developed, and they can broadly be grouped into one of three categories: (1) element-by-element, with phase shifter settings for each MZI considered individually; (2) compiled, with phase shifter settings for each MZI resulting from a matrix decomposition algorithm [16,21,22]; or (3) optimized, with phase shifter settings for each MZI resulting from the execution of an optimization protocol acting on the phase shifters [1,16].
PNPs acting as a switching matrix are generally programmed using a category (1) protocol. PNPs implementing matrices or quantum gates [2,15] (which can be specified as unitary matrices) are generally programmed using a category (2) protocol. A matrix is provided as input to a decomposition algorithm, which then returns the phase shifter settings required to realize the matrix transformation. PNPs used as black boxes that unscramble light [26] or scatter light to implement a specific output intensity pattern [1] are programmed using a category (3) protocol where the phase shifter settings are prescribed by an optimization algorithm.
To evaluate the accuracy of a program in a PNP, it is useful to characterize the unitary transformation it implements. Fortunately, efficient techniques exist [37] that use laser light to determine the amplitude elements ($|U_{i,j}|^2$) and interferometry to determine the phase arguments ($\arg U_{i,j}$) up to an unobservable input and output phase screen. Circuit fidelity is a metric that quantifies the "closeness" between two unitary matrices and is given by $F_C = \mathrm{Tr}\,|U^\dagger U_T|^2$, where Tr is the trace operator, U is the measured unitary, and $U_T$ is a target unitary.
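A trace-overlap fidelity of this kind is straightforward to evaluate numerically. The sketch below assumes the normalized form |Tr(U†U_T)/N|², so that identical unitaries give 1; this normalization is an assumption and may differ from the convention intended in the text. The sketch also does not optimize over the unobservable input and output phase screens, and the function name `circuit_fidelity` is hypothetical.

```python
import numpy as np

def circuit_fidelity(u_measured, u_target):
    """Assumed normalized overlap |Tr(U^dagger U_T) / N|^2 between two unitaries."""
    n = u_target.shape[0]
    overlap = np.trace(u_measured.conj().T @ u_target)
    return abs(overlap / n) ** 2

# Target: 50:50 splitter; measured: same circuit with a 0.1 rad phase error on one output
hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
measured = np.diag([1.0, np.exp(0.1j)]) @ hadamard

f_ideal = circuit_fidelity(hadamard, hadamard)
f_error = circuit_fidelity(measured, hadamard)
f_global_phase = circuit_fidelity(np.exp(0.7j) * hadamard, hadamard)
```

With this normalization the metric is insensitive to a global phase, while a relative phase error between outputs lowers it below 1.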
After fabrication, the initial state of the PNP is unknown due to static phase disorder within the waveguides. This effect has been studied in the context of silicon photonics and is parameterized by the static "phase coherence length" [38]; in silicon, this parameter is typically on the order of a few millimeters. To correct for this initial phase disorder, a PNP can be calibrated. There are several known algorithms for calibration, including self-configuring protocols [23] and progressive algorithms [39]. The ability to monitor the power at each MZI in a PNP enables dynamic, local measurements of the state of the system (at the cost of electronic control complexity); contactless integrated photonic probe (CLIPP) detectors avoid excess insertion loss by detecting light via bandgap defect states [26].

APPLICATIONS
We now discuss a subset of recent PNP applications: self-configuration and mode mixing, quantum transport and quantum gates, and machine learning.

A. Self-Configuration
As mentioned above, accurate configuration of the many degrees of freedom (phase settings) in the PNP can pose a challenge, especially when accounting for inhomogeneity in constituent devices. In 2013, Miller proposed a self-configuring solution for one particular PNP function: the coherent addition of light from N spatial input modes into one spatial output mode by canceling the fields in the remaining N − 1 output modes [16]. This concept is illustrated in Figs. 4(a) and 4(b), where the phase shifters of MZIs A-D are consecutively tuned to cancel the photocurrents on the corresponding output detectors. An important advantage of this approach is that each MZI can function without global knowledge of the other MZIs or photodetectors, and this independent self-configuration promises that such coherent, nearly lossless mode adders could be very fast. The coherent field adder only works if the optical modes are locally phase stable; for example, it would be impossible to add single-photon excitations (which have no fixed relative phase) over the input modes. Instead, arbitrary linear optical mode converters require an N × N mesh. Using an extension of his previous work, Miller proposed such a self-configuring N × N mesh that uses detectors on each MZI [26]. Using SOI PIC platforms, a 4 × 4 universal PNP with power monitoring taps was demonstrated in 2016 [27] [see Fig. 2(d)]. A 4 × 4 dynamically self-configuring mode adder was demonstrated in 2017 [26]. As shown in Fig. 4(c), the authors used a 980 nm laser to generate a dynamic input state to the 4 × 4 mesh and used CLIPP detectors to actively track and undo mode mixing. As they scale in numbers of modes, self-configuring circuits could enable a range of applications [40], from spatial multiplexing/demultiplexing (for example, in multimode fiber communications) to beam tracking and quantum circuits.
The "Clements" architecture cannot be self-configured in this way, though a scheme has been proposed to allow progressive configuration of such networks [24].
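The coherent mode adder lends itself to a short simulation. The sketch below is a toy model under stated assumptions, not the published protocol: each MZI is modeled as a phase shifter on one input followed by a tunable splitter, the "detector" is simply the computed power on the drop port, and the feedback loop is replaced by a grid scan. The names `mzi_ports`, `scan`, and `self_configure` are hypothetical.

```python
import cmath
import math

def mzi_ports(a, b, theta, phi):
    """Return (through, drop) amplitudes for input amplitudes a and b.

    Assumed model: phase shifter phi on input a, then a tunable splitter theta.
    """
    s, c = math.sin(theta / 2), math.cos(theta / 2)
    a = a * cmath.exp(1j * phi)
    return s * a + c * b, c * a - s * b

def scan(power_at, lo, hi, steps=4000):
    """Return the grid setting in [lo, hi] that minimizes the detector power."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=power_at)

def self_configure(inputs):
    """Tune one MZI after another, each using only its own drop-port detector."""
    running = inputs[0]
    settings = []
    for b in inputs[1:]:
        theta = math.pi / 2   # start at 50:50 so the phase scan has contrast
        phi = scan(lambda p: abs(mzi_ports(running, b, theta, p)[1]) ** 2, 0, 2 * math.pi)
        theta = scan(lambda t: abs(mzi_ports(running, b, t, phi)[1]) ** 2, 0, math.pi)
        running, _ = mzi_ports(running, b, theta, phi)
        settings.append((theta, phi))
    return running, settings

inputs = [0.5, 0.3j, -0.4 + 0.2j, 0.1, 0.6 - 0.1j]
total_power = sum(abs(x) ** 2 for x in inputs)
combined, settings = self_configure(inputs)
collected = abs(combined) ** 2 / total_power   # fraction of input power in the output mode
```

Each stage needs only its local detector reading, which mirrors the point that no global knowledge of the other MZIs is required; the residual loss here comes solely from the finite scan resolution.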

B. Quantum Information Processing
Photons are appealing as a carrier of quantum information due to their ability to propagate with low loss over long distances, phase stability, and their amenability to control even at room temperature in PICs [41]. Perhaps the greatest challenge lies in producing controlled interactions between photonic quantum states: deterministic two-photon gates require many ancilla photons together with measurement and fast active feedforward [9,42,43], or atom-mediated interactions [44,45] translated to PIC-compatible platforms [46][47][48][49]. As both approaches require phase-stable control of large numbers of optical modes with high precision, PNPs are emerging as important platforms. In contrast to custom-built static PICs, PNPs also provide a platform for rapid prototyping of photonic quantum information processing protocols, including quantum computing protocols [15], quantum transport [1], and quantum simulation [6,50]. In the following, we briefly discuss some of these demonstrations.

[Table note: By setting θ = π/2 and all other phases to zero, the Hadamard matrix (or 50:50 splitter) is realized.]

Quantum Transport
A number of interesting problems, from coherent effects in biological processes [51] to quantum computing [52,53] and quantum search [54], involve the transport of quantum particles along chains of coupled quantum systems. One experimental approach relies on a photonic quantum walk along discrete lattice sites, which can be represented as the waveguides of the PNP. While nonlinear interactions between photons give rise to particularly rich phenomena and applications, even linear quantum walks of single or multiple photons have a number of applications [1,50,55,56] and have been proposed to be computationally hard on classical computers for large-enough problems [57]. An input state of photons enters from the left and undergoes a quantum walk on a 1D chain as it propagates in time toward the right. By programming the splitting ratios of the sites (via the internal phase shift), it is possible to explore discrete-site quantum transport on a number of graphs. In a recent experiment, Harris et al. [1] explored a single photonic quantum walker under static and dynamic phase disorder. Each of the MZIs was set to implement a 50:50 splitting ratio, but the external phase shifters were programmed to have either a static phase variation [illustrated in Fig. 5(c)], a dynamically changing phase [illustrated in Fig. 5(d)], or any combination of static and dynamic phase variations. In this configuration, the PNP implements a balanced coin quantum walk on a discrete-time, 1D graph. A sufficiently large static-only phase variation can confine photons to a local vicinity (as in Anderson localization), whereas a strong dynamic phase variation causes a ballistic diffusion in time (due to dephasing between the sites).
An optimal trade-off between static and dynamic disorder (which rises with effective system temperature) had been predicted to facilitate environment-assisted exciton transport in photosynthetic complexes [51]. In this regime, dynamic disorder prevents a particle from becoming "stuck" in one site. The programmability of the PNP made it possible to carefully study this quantum transport across 64,400 unique settings of static and dynamic disorder, and demonstrate this environment-assisted quantum transport experimentally.
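A toy version of such a disordered quantum walk can be simulated directly. The sketch below is an illustrative model under stated assumptions, not a reproduction of the experiment: 50:50 couplers are arranged in a brickwork pattern between neighboring modes, static disorder is a random phase per mode that is fixed in time, dynamic disorder is redrawn at every time step, and the mode count, step count, and disorder scales are arbitrary choices. The names `walk` and `mean_distribution` are hypothetical.

```python
import numpy as np

def walk(n_modes, n_steps, static_strength, dynamic_strength, rng):
    """Propagate one photon injected in the central mode; return output probabilities."""
    h = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # 50:50 coupler
    # Static disorder: one random phase per mode, identical at every time step
    static = static_strength * rng.uniform(-np.pi, np.pi, n_modes)
    psi = np.zeros(n_modes, dtype=complex)
    psi[n_modes // 2] = 1.0
    for t in range(n_steps):
        # Alternate between even and odd neighbor pairs (brickwork layout)
        for k in range(t % 2, n_modes - 1, 2):
            psi[k:k + 2] = h @ psi[k:k + 2]
        # Dynamic disorder: fresh random phases at every time step
        dynamic = dynamic_strength * rng.uniform(-np.pi, np.pi, n_modes)
        psi *= np.exp(1j * (static + dynamic))
    return np.abs(psi) ** 2

def mean_distribution(static_strength, dynamic_strength, trials=200, seed=0):
    """Average the output distribution over independent disorder realizations."""
    rng = np.random.default_rng(seed)
    runs = [walk(26, 13, static_strength, dynamic_strength, rng) for _ in range(trials)]
    return np.mean(runs, axis=0)

static_only = mean_distribution(1.0, 0.0)
mixed = mean_distribution(1.0, 0.2)
```

Comparing the width of `static_only` and `mixed` around the injection site is one way to explore, in this toy setting, how adding dynamic disorder changes the spreading of a walker that static disorder alone tends to confine.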

Quantum Gates
Universal quantum computers follow two predominant frameworks: the circuit model [58], where single qubit and multiqubit gates are performed sequentially on qubits, and the cluster state model [59,60], where a large entangled resource state is first created, and then single qubit gates are performed, which encode the computation. In linear optics photonic quantum computing, two-qubit processes are realized probabilistically. It is therefore critical that the successful operation of a gate be "heralded" by ancillary photons. Carolan et al. [15] used a six-mode PNP alongside an off-chip multiphoton source to implement a variety of heralded gates in both the circuit and cluster state model. Figures 6(a) and 6(b) show the symbol and photonic circuit for a heralded controlled-NOT (CNOT) operation, which uses two path-encoded computational photons and two ancillary photons. Given a detection in the ancillary modes, the CNOT logic is guaranteed to have taken place on the computational photons [see Fig. 6(c)]. Technologically, the low coupling loss of 0.4 dB between silica waveguides and input/output fibers was key to enabling multiphoton experiments of up to six photons. While SOI PNPs have so far been limited to coupling losses of 3 dB, losses as low as 0.4 dB have been demonstrated in silicon photonics [61], pointing the way towards large-scale SOI PNPs suitable for multiphoton quantum information.

C. Machine Learning
Artificial neural network (ANN) algorithms have dramatically improved natural language processing, image recognition, object detection, and more [62]. ANNs rely heavily on matrix-vector products and require frequent memory access during training and inference. Recent work has focused on developing tailored electronic architectures for ANNs that take advantage of the limited requirements on computational precision, large matrix sparsity, and other features to achieve improved computational rates and energy efficiency [63][64][65][66][67][68]. However, the computational speed and power efficiency achieved with these hardware architectures are still bound by underlying transistor device physics, including switching energies and electronic clock rates, two quantities that are closely linked. Some machine learning algorithms, including neural networks, appear suited for analog computing architectures, including analog complementary metal-oxide semiconductor (CMOS) circuits [69], memristor arrays [70,71], photonic networks [2], and magnetic devices [72]. Photonic methods may simultaneously enable low latency, high energy efficiency, and high throughput [2]. While bulk-optical implementations of optical neural networks (ONNs) have been suggested in the past [73], it has only recently become possible to implement large-scale, phase-stable, and programmable linear transformations. Recent work has focused on implementing hybrid optical-electronic systems that implement spike processing [74] and reservoir computing [75][76][77]. Augmented with optical nonlinearities, PNPs promise high-speed and low-power implementations of neural networks fully in the optical domain.
As shown by Shen and Harris et al. [2], it is possible to directly map the mathematical description of a multilayer perceptron, the most basic form of deep neural network, onto arrays of PNPs connected by nonlinear optical components. In each layer of a multilayer perceptron, a matrix-vector product is evaluated, and then each entry of the resultant output vector is passed through a nonlinear "activation function." A schematic representation of an ONN is shown in Fig. 7(a), and a zoom into a single layer is shown in Fig. 7(b). Matrix-vector products are evaluated using optical interference units in the "Miller" encoding [PNPs implementing arbitrary, nonunitary matrices as shown in Fig. 1(c)] [16], and activation functions are realized with an optical nonlinearity unit (ONU). Vectors are encoded in the intensity and phase distribution of optical signals incident at the left of the ONN. These optical signals propagate through the set of layers comprising the ONN and are finally converted into electrical current using detectors, shown at the right of Fig. 7(a). An ONU could be implemented using saturable absorbers [78,79] or devices that exhibit bistability [80][81][82]; both kinds of nonlinear optical devices have been demonstrated in integrated photonic systems, but challenges remain in realizing an array of such nonlinear devices in a single system.
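The layer structure described above can be written as a short forward-pass sketch. Everything specific here is an assumption made for illustration: the weights are random rather than trained, the saturable-absorber response is a simple hypothetical model in which transmission rises with intensity, and no rescaling of singular values above 1 is applied. The names `optical_layer`, `saturable_absorber`, and `onn_forward` are hypothetical.

```python
import numpy as np

def optical_layer(weights, field):
    """Apply a square weight matrix as U @ Sigma @ V^dagger to optical amplitudes."""
    u, s, vh = np.linalg.svd(weights)
    return u @ (s * (vh @ field))   # unitary mesh, attenuator column, unitary mesh

def saturable_absorber(field, i_sat=1.0):
    """Hypothetical activation: intensity-dependent transmission I / (I + i_sat)."""
    intensity = np.abs(field) ** 2
    transmission = intensity / (intensity + i_sat)
    return np.sqrt(transmission) * field

def onn_forward(layers, x):
    """Propagate an input vector through all layers, then detect intensities."""
    field = np.asarray(x, dtype=complex)
    for w in layers:
        field = saturable_absorber(optical_layer(w, field))
    return np.abs(field) ** 2       # photodetectors record optical power

rng = np.random.default_rng(2)
layers = [rng.normal(size=(4, 4)) for _ in range(2)]   # two layers, four neurons each
x = np.array([0.2, 0.9, 0.4, 0.1])
out = onn_forward(layers, x)
```

Splitting each matrix into `u`, `s`, and `vh` makes explicit which parts would be programmed into the two unitary meshes and which into the attenuators of each layer.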

Existing neural network training algorithms, such as backpropagation [83,84], executed on electronic computers can be used to determine the set of matrices to be programmed into the ONN. After training, a set of weights in each layer that minimizes an error metric is determined. These weight matrices can be decomposed into PNP phase shifter settings at each layer. After programming, the ONN can be used as an inference machine, classifying vectors that are not part of the training data set.
This adaptation of deep neural networks to integrated photonics was tested on a simple vowel recognition problem [2]. A two-layer neural network with four neurons per layer and a saturable absorber nonlinear activation function was trained on a 64-bit computer against a set of four-dimensional input vectors that represent recordings of people speaking one of four vowels. The data set contained 360 vectors; 180 were used for training and 180 were exclusively used for testing. After training, the ONN was able to correctly classify 138/180 spoken vowels (76.7%), compared to 165/180 (91.7%) for a 64-bit digital computer. Advances in PNP programming fidelity and improved readout (including optical fiber packaging techniques) may reduce the performance gap between the ONN and the digitally simulated one.

DISCUSSION
PNPs are already finding applications in proof-of-concept demonstrations including classical computing systems [1][2][3], quantum computing systems [15], self-calibrating mode mixers [26], and matrix processors [2,15,27]. For real-world applications, it is still necessary to address some important challenges, including (1) the development of more compact, low-power phase shifters with ultralow loss and, for many applications, programmability at rates of MHz and higher; (2) operation outside the near-infrared spectrum, especially at shorter wavelengths; (3) precise electronic control over tens of thousands of phase shifters; and (4) more compact ultralow loss passive components, which may be developed by computational design [35,85].
While there are many challenges towards scaling PNPs, significant progress is being made on multiple fronts. Optoelectronic systems with over 1000 active elements and the circuits that control them have been monolithically integrated in CMOS processes [86]; MEMS and NEMS switches show promise for low-power switch arrays [27,31]; and a growing range of materials is becoming available, including SOI, silicon nitride, and InP. These developments point to a new era in photonics design and applications in which high-volume manufacturing will make general purpose PNPs containing an abundance of components cost-effective over custom-designed PICs in many applications. As field programmable gate arrays (FPGAs) have enabled a new paradigm for electronics, PNPs, or "optical FPGAs," will enable unforeseen applications and advances for optical processing.