Designing Silicon Photonic Devices using Artificial Neural Networks

We develop and experimentally validate a novel neural network design framework for silicon photonics devices that is both practical and intuitive. The framework is applicable to nearly all known integrated photonics devices, but as case studies we consider simple waveguides and chirped Bragg Gratings. By using artificial neural networks, we decrease the computational cost relative to traditional design methodologies by more than 4 orders of magnitude. We also demonstrate the abstraction of the device models to a few simple input and output parameters relevant to designers. We then apply the results to various design problems and experimentally compare fabricated devices to the neural network's predictions.


Introduction
Silicon photonics has become a viable technology for integrating a large number of optical components in a chip-scale format. Driven primarily by telecommunications applications, a growing number of CMOS fabrication facilities dedicated to silicon photonics are now in operation and available for researchers and engineers to submit photonic integrated circuit (PIC) designs to be fabricated [1].
Designing the silicon photonics components and circuits, however, remains a major bottleneck. Current design flows are complicated by computational tractability and the need for designers with extensive experience [2]. Unlike their electronic counterparts, photonic integrated circuits require computationally expensive simulation routines to accurately predict their optical response functions. The typical time to design integrated photonic devices now often exceeds the time to manufacture and test them.
To address this challenge, we propose and experimentally validate a new design paradigm for silicon photonics that leverages artificial neural networks (ANN) in an intuitive way and is at least four orders of magnitude faster than traditional simulation methods. Our design paradigm only requires a small number of input and output neurons corresponding to descriptive parameters relevant to the designer.
This new approach provides benefits such as rapid prototyping, inverse design, and efficient optimization, but requires a more sophisticated ANN than previous approaches. We demonstrate practical confidence in our method's accuracy by fabricating and measuring devices that experimentally validate the ANN's predictions. As illustrative examples, we design, fabricate, and test chirped integrated Bragg gratings. Their large parameter spaces and nonlinear responses are typical for devices that are computationally prohibitive using other techniques. The experimental results show remarkable agreement with the ANN's predictions, and to the authors' knowledge represent the first experimental validation of photonic devices designed using ANN's.
This work builds on recent theoretical results showing that it is possible to model nanophotonic structures using ANNs. For example, Ferreira et al. [3] and Tahersima et al. [4] demonstrated that ANNs could assist with the numerical optimization of waveguide couplers and integrated photonic splitters respectively. In both cases the input parameter space was the entire 2D array of grid points, showing the power of ANNs in blind "black box" approach, though limiting the designer's ability to intuitively adjust input parameters. A related approach has also been applied to periodic photonic structures, which are often difficult to efficiently model [5,6].
Two recent theoretical papers by Zhang et al [7], and Peurifoy et al. [8] go beyond simply optimizing over a large parameter space, and used ANNs to calculate complicated spectra using a smaller number of intuitive, smoothly varying input parameters. Even though in both cases each wavelength point in the calculated spectra required its own ANN output neuron, the usefulness of using ANNs to model systems with intuitive input parameters was demonstrated.
To illustrate the power of our new design paradigm, we demonstrate both forward and inverse design tools that use a chirped Bragg grating ANN as a computational backend. The forward design tool is interactive, and was used to design our fabricated circuits. The inverse design tool quickly constructs a temporal pulse compressing chirped Bragg grating within specified design constraints -a task typically too computationally expensive for traditional methods.

Results
Overview To motivate our approach, we first describe a neural network that models the effective index of a silicon photonic strip waveguide with various widths and thicknesses. While waveguide simulation is already straightforward from a designer's perspective, the model illustrates the advantages of our approach and is a key building block for more advanced ANN models described below which are less straightforward using existing techniques. These advantages include the ANN's computational speedup of over 4 orders of magnitude, and the simplification and speedup of other complicated simulation routines that rely on effective index calculations. A more advanced example that is computationally intractable via traditional methods is then given, in which we demonstrate an ANN that models the complex relationship between a chirped silicon photonic Bragg grating's design parameters and its corresponding spectral response. Many designers leverage silicon photonic chirped Bragg gratings to equalize optical amplifier gain [9], compensate for semiconductor laser dispersion [10,11], and enable nonlinear temporal pulse compression [12,13]. Figure 1 illustrates the new design methodology. First, we iterated between generating an appropriate dataset and training the ANN until the model adequately characterized the device. Next, we used the ANN to simulate circuits and solve inverse design problems. Finally, we fabricated devices to validate the results.

Waveguide Neural Network
We first report on a simple waveguide neural network capable of estimating the effective index of any arbitrary silicon photonic waveguide geometry for a variety of modes. Specifically, we modeled the relationship between the waveguide's width, thickness, and operating wavelength and the effective index for the first two TE and TM modes. We note that including wavelength as an input parameter is a unique and enabling strategy not previously adopted (see Discussion section below). Figure 2 (f) compares the ANN's predicted effective index to a its corresponding simulation. The first TE and TM mode for any silicon photonic waveguide with a width between 350 nm and 1000 nm and a thickness between 150 nm and 350 nm are demonstrated. The network estimates a smooth response for both modes simultaneously, even for data points outside of its training set. The ANN's smooth output also produces smooth analytic derivatives, which are essential for calculating group index profiles and for gradient-based optimization routines.
We implemented various tests to validate the network's accuracy. First, we split the initial dataset into a training set and a validation set. While the network evaluated both sets after each epoch (i.e. training iteration) only the training set's results were used to update the network's weights. We monitored the validation set's results to assess overfitting. To better understand the network's performance after each iteration, we recorded each epoch's mean-square-error (MSE) and coefficient of determination (R 2 ). Figure  2   process for a strip waveguide and a chirped grating respectively. Often, the designer iterates between these two steps until an appropriate model is developed. Once the model is ready, several design applications, like circuit simulations and inverse design solutions, are available. The designs are then fabricated to validate the model's results. From here, the model can be shared and extended. the relative error for both the training and validation sets after the final epoch. Both the training set errors and validation set errors are similarly distributed and tightly bounded between −1% and 1%, once again indicating little to no overfitting.
With confidence in the waveguide neural network's prediction accuracy, we benchmarked its speed and found that a single neural network evaluation was 10 4 times faster on average than the corresponding finite difference eigenmode simulation. Figure 2 (d) compares the computation speed for the ANN to the eigenmode solver, Meep Photonic Bands (MPB). This significant speedup enables many simulation techniques, like the layered dielectric media transfer matrix method (LDMTMM) [14] or the eigenmode expansion method (EMM) [15], where photonic components are discretized into individual waveguides. Using the ANN, a transfer matrix for each waveguide can be quickly generated and cascaded to formulate a fairly accurate response for the device. In addition, modeling fabrication variations is now much quicker since existing Monte Carlo sampling routines can leverage the ANN's speed.

Bragg Grating Neural Network
Modeling the relationship between a Bragg grating's physical design parameters and its corresponding responses is difficult since no one-to-one mapping exists. Consequently, many designers resort to black-box optimization routines that strategically search the parameter space for viable design options. As a result, inverse design problems -where a simulation needs to run each iteration -become intractable for even modest size gratings. If a full 3D FDTD simulation is performed, for example, each optimization iteration can take between 8-12 hours on typical desktop computing systems. In addition, the optimization routines tend to inefficiently simulate redundant test scenarios for different design problems. We train and demonstrate a Bragg ANN, however, that can predict a grating's response on the order of milliseconds on the same system, enabling much faster solutions to more complex design problems. We fabricate various test devices and validate our neural network's predictions.
Using the waveguide neural network, we generated a dataset to train our Bragg grating neural network to predict the reflection spectrum and group delay response of a silicon photonic, sidewall-corrugated, linearly chirped Bragg grating, as illustrated in Figure 3 (d). We note that generating the training dataset was approximately 2 orders of magnitude faster using the waveguide ANN reported above rather than traditional methods. To smooth apodization dependent ringing, we pre-processed the training data. More information regarding this step is provided in the Supplementary Material. We parameterized the gratings by length of the first grating period (a 0 ), length of the last grating period (a 1 ), number of grating periods N G, and grating corrugation width difference( ∆w = w 1 − w 0 ). We designed the network to receive these four parameters along with a single wavelength point as inputs. The network has two outputs: reflected optical power and group delay.
Similar to the waveguide network, we divided the dataset into a training set and validation set. We tracked both the MSE and the R 2 metrics after each epoch. The Bragg training set was much larger than the waveguide training set, owing to the larger parameter space. Consequently, the MSE converged within the first few epochs and we stopped training after just five epochs to prevent overfitting. The final MSE for the training and validaton sets was 1.845 × 10 −4 and 1.677 × 10 −4 respectively. The final R 2 was 0.9975 and 0.9977 respectively. Once again, the MSE and R 2 evolution for both the training set and validation set converge well, indicating little to no overfitting. Figure 3 (a-b) illustrates the network's MSE and R 2 evolution. Figure 3 (c) illustrates the absolute error for both the training sets and validation sets. We calculated the absolute error because several training samples were at or near zero and skewed the relative error.
We note that calculating Bragg grating response with the ANN is much more computationally efficient than previously demonstrated methods. This is because the Bragg ANN linearly increases in computation complexity with added grating parameters, while LDMTMM and all other methods known to these authors increase at least quadratically.
To validate the Bragg ANN, we fabricated and measured several silicon photonic Bragg gratings with different chirping patterns and compared their transmission, reflection, and group delay spectra to the neural network's predictions. The gratings were arranged in one of two configurations: (1) a simple circuit that only measured the Bragg grating's transmission spectra and (2) a more complicated interrogation circuit capable of measuring the reflection, transmission, and group delay profiles from the same device simultaneously.   Figure 3 illustrates the interrogation circuit used to measure all three responses simultaneously. In both configurations, grating couplers were used to route light on and off the chip. While the simpler circuit required less de-embedding, the full interrogator circuit allowed for a more comprehensive device characterization.
The transmission-only gratings were designed with various grating period bandwidths from 5 nm to 20 nm, each with 600 periods and a corrugation width of 50 nm. The initial design parameters produced ANN predictions that match the measured data extremely well. Small discrepancies in the grating responses are largely attributed to the grating's apodization profile and detector noise. Figure 4 illustrates the comparison between the ANN's predictions and the measured data.
We designed the remaining gratings using a much smaller chirp bandwidth of 3 nm with 750 grating periods and a 30 nm corrugation width. We mirrored the orientation of half the gratings in order to measure both positive and negative sloped group delay profiles. Once measured, we normalized the data by de-embedding the responses from the various Y-branches, directional couplers, and grating couplers that complicate the measurement data. The process is explained in the Supplementary Material. Even with the rather complex transfer function, the transmission, reflection, and group delay profiles match the ANN's corresponding predictions well except for occasional resonant features caused by fabrication defects. These defects are expected since the narrow bandwidth devices have a grating pitch with a fine discretization that approaches the e-beam raster grid resolution. Small changes in grating pitch that don't align with the raster grid occasionally produce weak Fabry-Perot resonance conditions visible in the data. These rasterinduced defects also account for a small lateral shift (˜1 nm) in the responses. Even with these fabrication challenges, it is notable that the ANN successfuly predicts the transmission, reflection, and group delay profiles simultaneously. In fact, the ability to do so in noisy fabrication environments is one of the key advantages of the ANN and may allow for efficient parameter extraction where other methods fail.

Forward design
The neural network's speed and flexibility enable forward design exploration. For example, Figure 5 illustrates a graphical user interface (GUI) built with slider bars to adjust the Bragg grating's design parameters (i.e. corrugation widths, grating length, chirp pattern, etc). The plots dynamically update, calling the neural network every time the user modifies the input, and display the corresponding reflection and group delay profiles. Because wavelength is included as an input to the ANN rather than an output, arbitrary wavelength sampling within the domain is allowed. Computing these responses in real time is not possible using traditional techniques. This capability is valuable and allows even novice designers the ability to rapidly gain device intuition without necessarily understanding the underlying numerical techniques.

Inverse design
This new approach also enables an entirely new set of inverse design problems. For example, we used the neural network in conjunction with a truncated Newtonian optimization algorithm to design a temporal pulse compressor. Designers often rely on dispersive Bragg gratings to generate short, optical pulses for high-capacity communications [12]. In this particular case study, we assumed an arbitrary source generates a 20 ps wide chirped pulse with 4 nm of bandwidth. Figure 6 illustrates the optimization routine's evolution, the resulting grating response, and the pulse shape before and after the Bragg grating. Such optimization algorithms run much quicker than previously known methods, owing to the accelerated cost function. The agnostic nature of the neural network interface works well with typical optimization routines, especially since any arbitrary wavelength sampling is allowed. Depending on the cost function formulation, gradient-based methods could directly evaluate the Jacobian and Hessian tensors from the ANN without any extra sampling or discretization.

Discussion
Our method demonstrates a new, viable platform for silicon photonic circuit design. With a single global parameter fit, we successfully modeled silicon photonic waveguides and silicon photonic chirped Bragg gratings with arbitrary bandwidths, chirping patterns, lengths, and corrugation widths and allowed arbitrary wavelength sampling. Future work could explore new network architectures (e.g. different activation functions, Figure 4: Fabrication data compared to corresponding ANN predictions. (a1-a4) Measured transmission responses for gratings with a period chirp of 5 nm (a1), 10 nm (a2), 15 nm (a3), and 20 nm (a4). (b1-b2) Transmission and reflection responses for two different Bragg gratings. Both gratings share the same design parameters, and have an identical but opposite linear chirp. The result of the mirrored chirping is seen in both the normalized MZI interference patterns (c1 , c2) and the extracted group delay responses (d1 , c2).  (a1 and a2). Any time the user adjusts these parameters, the program calls the ANN and reproduces the expected reflection and group delay profiles for that particular grating. Due to the ANN's speed, the program is extremely responsive. Figure 6: ANN-assisted design of a monolithic temporal pulse compressor using a silicon photonic chirped Bragg grating. A truncated Newton algorithm was tasked with constructing a grating that compressed an arbitrary chirped pulse by a factor of 2. After 340 grating simulations, the optimizer sufficiently minimized a cost function (right) that compared the new pulse's width to the old pulse. The resulting grating is demonstrated below and the input, output, and desired pulses for iterations 1, 140, and 340 are demonstrated on the left. layer connections, etc.) and training algorithms. Several other devices, like ring resonators [16], can also be modeled. One could subsequently cascade several ANNs that model the scattering parameters of different devices, opening the door to large-scale optimization problems.
An important feature of this work is the choice to model the wavelength as a continuous input parameter rather than fix each output neuron at a specific wavelength point as done in all previous work known to the authors. The waveguide ANN, for example, outputs effective index values and the Bragg ANN outputs reflection and group delay values across the entire input spectrum. This approach, while more difficult to train, is more convenient for the designer. For example, an optimization routine tasked with designing a Bragg filter can focus more on parameters like the bandwidth and shape, rather than an arbitrarily sampled wavelength profile. Furthermore, this method doesn't require the training spectra to have the same sampling. Training sets for structures like ring resonators, whose features may require finer wavelength resolution than other devices, can now be strategically simulated to highlight these features. Assuming the network is trained correctly across a suitable domain, the ANN will seamlessly interpolate between both design parameters and wavelength points without any additional routines.
Unlike traditional simulation methods, training an arbitrary device ANN requires large datasets that are too time-intensive for most typical computers. With the growing availability of vast cloud-based computational resources, however, several million training simulations can now be run in hours or days. [17]. Once trained, a neural network can reliably interpolate between training data, is compact and easily shared with the community, and can even continue to learn on new datasets via transfer learning [18]. Thus, the computational complexity inherent in designing integrated photonic devices can be moved to the front end of the design process, allowing individual designers to work with abstracted components whose optical response can be rapidly calculated.
As with all deep learning applications, the network's utility is limited by biases introduced in the training set, the network architecture, or even the training process itself [19]. Fortunately, we can anticipate these biases by extracting the model's prediction uncertainty without modifying our network architecture. Dropout inference techniques leverage models that rely on dropout layers to mitigate over-fitting (a form of network bias) [20]. Even pre-trained networks can use dropout inference to extract prediction uncertainties without any modifications to the network. This particular network design methodology opens the door to many more applications, like training on fabricated device data. Foundry's that develop process design kits (PDKs), for example, can use this technique to model their fabrication processes while preserving their trade secrets.

Training data generation and preprocessing
We generated our waveguide neural network's training set on a high performance computing cluster (HPC) using MEEP Photonic Bands (MPB) [21], a finite difference eigenmode solver. The solver simulated 31 different waveguide widths from 350 nm to 1500 nm and 31 waveguide thicknesses ranging from 150 nm to 400 nm resulting in 961 different geometries. The solver simulated 200 distinct wavelength points in the range of 1400 nm to 1700 nm. The total number of training samples fed into the neural network was 192,200. 70% of the data set was used as training samples and the remaining 30 % was used as validation samples. Each sample had three inputs (width, thickness, and wavelength) and four corresponding outputs (effective indices for the first two TE and TM modes). No postprocessing was performed on the waveguide training data.
On the same HPC, we generated our Bragg training set by simulating 104,131 different gratings with the layered dielectric media transfer matrix method (LDMTMM) [14]. The LDMTMM method models each individual section of the Bragg grating as an ideal waveguide and cascades each sections's corresponding transfer matrix to estimate the grating's response for each wavelength point of interest. We calculated each individual waveguide's effective index using the waveguide neural network. Our simulations swept through 10 different corrugation widths from 10 nm to 100 nm, 11 different grating lengths from 100 periods to 2000 periods, and 32 different chirping patterns.
Once the grating spectra were generated, we fit the results to a generalized skewed Gaussian (see Supplementary Material) to reduce ringing and to generalize the grating's response to arbitrary apodization profiles. We found that without fitting, the resulting oscillations significantly complicate the training process and restrict the network's domain to a single apodization. We fit both the reflection spectrum and group delay responses to generalized Gaussians and resampled the results with 250 wavelength points from 1.45 µm to 1.65 µm. Since the nonlinear fitting routine occasionally failed, not all of the simulated gratings were appropriate for testing. After filtering through the results, we generated a database of 26,032,750 training samples.

Neural network design and training
Both neural networks were trained on the same HPC cluster using Keras [22] and Tensorflow [23]. Several hundred different architectures were tested. To gauge the effectiveness of each architecture, the meansquared-error and coefficient of determination (R 2 ) metrics were used. The waveguide neural network that worked best had 4 hidden layers with 128 neurons, 64, neurons, 32 neurons, and 16 neurons. Each neuron used a hyperbolic tangent activation function. The Bragg grating neural network was designed with 10 hidden layers and 128 neurons each. RELU activation functions were used. Both networks were trained with 16 sample batch sizes. While the waveguide neural network was trained with 100 epochs, the Bragg grating neural network only needed about 5 epochs to reach sufficient results, primarily due to the large training set. The Bragg training set was normalized to improve the network's expressive capabilities.

Simulation benchmarks
We performed all benchmarks using a quad-core Intel(R) i5-2400 CPU clocked at 3.10 GHz with 12 GB of RAM. To evaluate the waveguide ANN's speed, we simulated various waveguide parameters in serial using both the ANN and MPB. To evaluate the BG ANN's speed, we simulated various grating's in serial using both the ANN and the LDMTMM. We linearly fit each method's results and compared the slopes to examine the speedup.

Device fabrication
The silicon photonic Bragg gratings were fabricated by Applied Nanotools Inc (Edmonton, Canada) using a direct-write 100 keV electron beam lithographic process. Silicon-on-Insulator wafers with 220 nm device thickness and 2 µm thick insulator layer were used. The devices were patterned with a raster step of 5 µm and etched with a ICP-RIE etch process. A 2.2 µm oxide cladding was deposited using a plasma-enhanced chemical vapour deposition (PECVD) process.

Device measurement
Each device was measured using an automated process at the University of British Colombia (UBC). An Agilent 81600B tunable laser was used as the input source and Agilent 81635A optical power sensors as the output detectors. The wavelength was swept from 1500 to 1600 nm in 10 pm steps. A polarization maintaining (PM) fibre was used to maintain the polarization state of the light, to couple the TE polarization in and out of the grating couplers. Several dembedded test structures were used to normalize out the coupler profiles.

Data Availability
The data that support the plots within this paper and other findings of this study are available from the corresponding authors upon reasonable request.