Reinforcement Learning for Photonic Component Design

We present a new fab-in-the-loop reinforcement learning algorithm for the design of nano-photonic components that accounts for the imperfections present in nanofabrication processes. As a demonstration of the potential of this technique, we apply it to the design of photonic crystal grating couplers fabricated on an air clad 220 nm silicon on insulator single etch platform. This fab-in-the-loop algorithm improves the insertion loss from 8.8 to 3.24 dB. The widest bandwidth designs produced using our fab-in-the-loop algorithm can cover a 150 nm bandwidth with less than 10.2 dB of loss at their lowest point.


I. INTRODUCTION
The design of photonic components by traditional methods is a complex, time- and labor-intensive process. This process starts with a theoretical model. Simulations are then run to verify this model and obtain more accurate performance estimates. These simulations are computationally intensive, and thus, it takes a long time to obtain accurate results. Once simulations are complete and a final design is chosen, the components must then be manually optimized for the fabrication process. The challenge in doing so is that despite efforts to model the effects of lithography, 1-3 the models only consider process bias and smoothing, thus resulting in a mismatch between the simulated geometry and the fabricated one. The complexity of this design process has limited the wide-scale adoption of integrated photonics, as it requires domain-specific knowledge, limiting the pool of potential component designers.
Machine learning has shown that computers are able to analyze data and come up with valuable insights in many fields. Deep learning is a subset of machine learning where an artificial neural network is used to learn from a dataset and provide these insights. Deep learning has been applied to many fields with great success, such as image classification, [4][5][6] anomaly detection, 7 medical imaging, 8,9 and photonics. 10 Reinforcement learning (RL) is a subset of deep learning that uses feedback in the form of a reward or score from an environment to generate new, improved actions. [18] Prior machine learning approaches to component design rely on simulations to obtain their datasets, resulting in the same two disadvantages found in traditional design methods. First, the simulation of photonic components takes an extremely long time to obtain accurate results. Second, simulations, even corrected with lithography models, do not account for all imperfections present in every nanofabrication process.
In this paper, we show that we can overcome the challenge of achieving optimal device performance with a new approach: fab-in-the-loop reinforcement learning. This approach incorporates feedback from measurements of prior fabricated designs to produce new, improved designs using reinforcement learning. We demonstrate that by applying fab-in-the-loop reinforcement learning to photonic component design, we are able to produce components with better performance than we are able to produce using traditional design methods. These best performing designs are largely unintuitive and would be unlikely to be suggested by an expert designer. Without detailed knowledge of the fabrication deviations, it would be fortuitous if a human-generated design were to outperform our best performing designs.
Our approach is also efficient. Obtaining a new dataset using fab-in-the-loop takes approximately one week with our in-house fabrication and measurement capabilities: training and running the fab-in-the-loop RL algorithm takes a day on an older Mac laptop with an 8-core Intel i9 processor, fabrication of the new chip takes 1.5 days, and automated measurement takes 3 days. If the data contained on one chip were to be obtained using traditional simulation approaches and available simulation hardware, it would take approximately 2 years using 3D Finite-Difference Time-Domain (FDTD) simulations on a server equipped with dual Intel Xeon Gold 5118 processors and 96 GB of RAM, without accounting for fabrication effects. Using RL to generate improved devices is, therefore, only possible with our fab-in-the-loop technique (Fig. 1).
To demonstrate the power of our technique, we apply fab-in-the-loop RL to grating coupler design. Optical coupling to and from an integrated photonic circuit with low loss and high bandwidth is critical for many applications. Grating couplers are a convenient method to achieve this as they allow for the rapid testing of photonic devices. They do so by surface coupling light into waveguides, which allows them to be placed anywhere within a design, in contrast to edge coupling. Used in conjunction with automated measurement setups, they allow for the rapid characterization of thousands of individual photonic circuits across a chip. Unfortunately, grating couplers are typically plagued by high insertion loss and/or narrow operating bandwidths. They are also sensitive to fabrication process variations. These issues have traditionally required knowledgeable individuals to design custom

FIG. 1. A comparison between traditional device optimization techniques and the fab-in-the-loop approach. a) In the traditional approach, the optimizer will produce a design based on simulation results. The user will then fabricate this design, measure it, and find the performance drastically different from that predicted by the simulation due to various fabrication effects. b) The user introduces a lithography model to correct for the process bias and smoothing, but this model will not account for other fabrication effects. c) In the fab-in-the-loop approach, the algorithm will automatically optimize the device to the fabrication process without additional user input based solely on the measured results. Fabrication imperfections labeled in the figure: process bias, smoothing, sidewall angle, sidewall roughness, and wafer thickness variation.
[21] Utilizing fab-in-the-loop RL, we obtained a grating coupler design with an insertion loss of 3.24 dB as compared with an insertion loss of 8.8 dB for a design that utilizes the traditional design methodology on our single etch air clad silicon on insulator (SOI) platform. The widest bandwidth designs produced with our technique cover the 150 nm bandwidth of our laser with less than 10.2 dB of loss at their lowest point. With these results, we can choose an optimal grating coupler design for each application.
In Sec. II, we present the parameterized grating coupler design used as a demonstration of our fab-in-the-loop RL. In Sec. III, we describe the fab-in-the-loop RL algorithm. In Sec. IV, we discuss the device optimization process. Finally, in Sec. V, we discuss the results.

II. THE PARAMETERIZED GRATING COUPLER
A parameterized grating coupler design was created as an initial starting point for the fab-in-the-loop RL. This is key to the process as a parameterized design reduces the search space required, resulting in faster convergence of the neural network results and, thus, requiring fewer training rounds. This design is illustrated in Fig. 2. It is parameterized by 12 geometric quantities with ranges given in Table I. This initial design was developed to conform to the limits imposed by our fabrication process as described below. The ranges of the 12 parameters given in Table I have been chosen to have a wide search space while still maintaining a compact device footprint.
Our fabrication process is based on a 525 nm thick positive electron beam resist, Zep520A. 22 This places some constraints on the design. The maximum aspect ratio of Zep520A is 5:1; this means that traditional sub-wavelength designs 23,24 are not possible with our process because the resist collapses when fabricating the sub-100 nm isolated features required for such designs. To overcome this challenge, a design based on sub-wavelength holes instead of sub-wavelength lines is used. The lattice design is based on the design from L. Liu et al. 25 To reduce the device area, focusing has been added to this traditional photonic crystal grating coupler (PhCGC) design. We also include focus-angle variation, and both horizontal and lattice apodization.
The purpose of the horizontal apodization is to improve the insertion loss and bandwidth, as has been shown by Y. Ding et al. 26 This apodization is defined from its starting point, with the hole diameter decreasing until it reaches the minimum hole diameter, at which point it remains fixed until the apodization endpoint. The vertical apodization improves the focusing of the light into the waveguide. This parameter allows the hole diameter to be increased or decreased moving out from the center. Two different values of this apodization can be used, as specified by its dividing point. This is useful for fine tuning the focus of the fiber spot into the waveguide. Both types of apodization are affected by another parameter, the hole diameter at which to switch to lattice apodization. Once the hole diameter is at or below this level, the lattice is expanded instead of the hole diameter being reduced further. This is done because both the spacing of the holes and the hole diameter affect the effective index, while the minimum hole diameter is limited by the fabrication process. By increasing the lattice constant, one can obtain a similar effect to a smaller hole size without running into the fabrication limit.
The angle is the focus angle of the grating coupler cone. The grating-start defines where, on the grating coupler, the holes start to be drawn, and the grating-end defines where the grating coupler ends. Table I enumerates all 12 parameters and their ranges. This parameterized design is then optimized by the fab-in-the-loop RL algorithm.

III. THE REINFORCEMENT LEARNING ALGORITHM
Fabricated devices are needed to provide data to our fab-in-the-loop RL algorithm. An initial design is input as the starting point to our fab-in-the-loop algorithm, which then generates a set of 1250 photonic components with a range of design parameters. In our case, the initial input is the best traditionally optimized design for each wavelength bin in the wavelength range of 1490-1640 nm, which is set by the available laser. In addition, four different control grating couplers with hole radii of 80, 100, 120, and 140 nm, respectively, are added to the chip for determination of the current process bias. We fabricate a chip with these components, measure their spectra, and generate a dataset consisting of the spectra and design parameters for each one. This dataset is then input back into our fab-in-the-loop RL algorithm, which generates new, better performing designs. In these later rounds, the initial input is the best measured design from the previous round for each wavelength bin in the wavelength range of 1490-1640 nm. A new chip with the new designs is made, and the process is repeated until an optimal design for the particular application is achieved.
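The cycle described above can be sketched as a short loop. This is only an illustrative outline, not the authors' implementation: the names `fab_in_the_loop`, `propose`, and `fabricate_and_measure`, and the dictionary layout of a measurement, are hypothetical stand-ins for the RL step and for the fabrication and measurement of a chip.

```python
from typing import Callable, Dict, List

Design = Dict[str, float]        # hypothetical layout: parameter name -> value
Measurement = Dict[str, object]  # hypothetical layout: {"design", "bin", "loss_db"}


def fab_in_the_loop(
    initial_best: Dict[int, Design],
    propose: Callable[[Dict[int, Design], List[Measurement]], List[Design]],
    fabricate_and_measure: Callable[[List[Design]], List[Measurement]],
    rounds: int,
) -> Dict[int, Design]:
    """Run `rounds` fabrication cycles; return the best measured design per bin."""
    best = dict(initial_best)
    # Assumption: the starting designs have no recorded loss, so the first
    # measured design in each bin replaces them.
    best_loss = {b: float("inf") for b in best}
    dataset: List[Measurement] = []
    for _ in range(rounds):
        designs = propose(best, dataset)           # stand-in for the RL step
        measured = fabricate_and_measure(designs)  # stand-in for chip + measurement
        dataset.extend(measured)                   # the measured dataset grows each round
        for m in measured:
            b, loss = m["bin"], m["loss_db"]
            if loss < best_loss.get(b, float("inf")):
                best_loss[b] = loss
                best[b] = m["design"]
    return best
```

The point of the sketch is the data flow: every round adds measured devices to the dataset, and the best measured design per wavelength bin seeds the next round.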
An issue with our approach is that the measured dataset will still be smaller than in traditional applications of RL. To overcome this limitation, our fab-in-the-loop RL algorithm consists of both a spectral predictor and a traditional deep deterministic policy gradient (DDPG) algorithm. 27,28 The spectral predictor generates estimated power spectra from photonic device design parameters after training on previously measured devices. The DDPG algorithm proposes new designs based on a score determined from the parameters and the spectra provided by the spectral predictor.
The spectral predictor consists of a neural network as described in Fig. 3. The spectral predictor is trained from measured data. First, a chip is measured after a fabrication round. The current process bias is determined from the optical spectra of the set of four control grating couplers with hole radii of 80, 100, 120, and 140 nm. This is done by calculating the correlation between the spectrum of the design with the same hole radius at varying bias steps and the spectrum of the same design with no bias from the initial dataset. The bias that results in the maximum correlation with the original unbiased result is taken to be the current process bias for that hole radius. This allows any changes in the fabrication process to be accounted for.
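One reading of this bias-determination step can be sketched as follows. The sketch assumes that the correlation is a Pearson correlation coefficient and that spectra are sampled on a common wavelength grid; neither detail is stated in the text, and the function names are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a flat spectrum carries no shape information
    return sxy / sqrt(sxx * syy)


def estimate_process_bias(
    reference: Sequence[float],
    candidates: Dict[float, Sequence[float]],
) -> float:
    """Return the bias (nm) whose spectrum correlates best with the reference.

    reference:  unbiased spectrum of a control design from the initial dataset.
    candidates: bias step (nm) -> spectrum of the same control design at that bias.
    """
    return max(candidates, key=lambda bias: pearson(reference, candidates[bias]))
```

In this reading, the function would be called once per control hole radius (80, 100, 120, and 140 nm), giving one bias estimate for each.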
The spectral data from the chip are then processed to remove any invalid measurements due to detector saturation, an incorrect number of data points, or other setup-related errors during the measurement process. Then the data for any devices with a minimum insertion loss greater than 40 dB are dropped. The reason for this is twofold. First, it is possible that the device was damaged during the fabrication or measurement process and could still potentially be a good design. It would negatively affect the algorithm if these designs were given falsely low scores due to a flawed measurement. Second, the initial rounds are likely to fabricate large numbers of poorly performing designs. Inclusion of a large number of poorly performing designs can cause the spectral predictor to produce poor quality predictions for all designs.
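A minimal sketch of this filtering step is given below. It assumes that each spectrum is stored as a list of insertion-loss values in dB (positive numbers) and that a valid measurement has a fixed number of points; the value 150 and the name `clean_dataset` are assumptions. Detector-saturation checks are setup specific and are not modeled here.

```python
from math import isfinite
from typing import Dict, List, Sequence

EXPECTED_POINTS = 150    # assumed number of wavelength samples per spectrum
MAX_MIN_LOSS_DB = 40.0   # threshold quoted in the text


def clean_dataset(
    spectra: Dict[str, Sequence[float]],
    expected_points: int = EXPECTED_POINTS,
    max_min_loss_db: float = MAX_MIN_LOSS_DB,
) -> Dict[str, List[float]]:
    """Drop invalid spectra and devices whose best-case loss exceeds the threshold."""
    kept: Dict[str, List[float]] = {}
    for device_id, loss_db in spectra.items():
        if len(loss_db) != expected_points:
            continue  # incorrect number of data points
        if not all(isfinite(v) for v in loss_db):
            continue  # stand-in for other setup-related errors
        if min(loss_db) > max_min_loss_db:
            continue  # possibly damaged device, or a very poor design
        kept[device_id] = list(loss_db)
    return kept
```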
A mean squared error loss function is utilized to train the spectral predictor with a learning rate of 0.0001 on 10 000 examples drawn at random from the above dataset. 29 Now that the spectral predictor is trained, the DDPG algorithm generates a new set of designs. The DDPG portion of the algorithm consists of identically dimensioned actor-critic networks with two layers of size 600 and 400, respectively. The input is 14 parameters, consisting of the 12 parameters of the grating coupler design and two parameters specifying the desired operating wavelength bin. The output is 12 improved grating coupler parameters. The DDPG proposes a new set of parameters, and the spectral predictor generates the corresponding spectrum.
A score is then computed, which is provided to the DDPG. The score is determined using the principle of reward shaping.
If all 12 device parameters are in range, then the score is calculated using the estimated power spectrum found by the spectral predictor. The average power is calculated over the target wavelength range, and this power is then normalized. The score S is given by Eq. (1), where np is the normalized power, r is the hole radius, and a is the lattice constant. The score penalizes designs in which the hole radius r is greater than the lattice constant a, as they generally perform poorly.
If one or more of the device parameters p_i is out of range, a parameter score is first calculated for each out-of-range parameter p_i using Eq. (2), where max(p_i) and min(p_i) are the maximum and minimum allowed values for that parameter. The score is then given by Eq. (3), S = 10 ..., where the sum is over the set of out-of-range parameters. Again, this score penalizes designs with out-of-range parameters. The score calculated in Eqs. (1)-(3) is used by DDPG 27,28 to generate new designs as seen in Fig. 4b.
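The two inputs to the scoring step, namely which parameters are out of range and the average predicted power over the target wavelength range, can be sketched as below. This deliberately does not reproduce the scoring equations themselves; the function names, the parameter names, and the example ranges used with them are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def out_of_range_parameters(
    params: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Return the names of parameters outside their allowed [min, max] range."""
    return [
        name
        for name, value in params.items()
        if not (ranges[name][0] <= value <= ranges[name][1])
    ]


def mean_power_in_band(
    wavelengths_nm: Sequence[float],
    power: Sequence[float],
    band_nm: Tuple[float, float],
) -> float:
    """Average the predicted power over the target wavelength range (inclusive)."""
    lo, hi = band_nm
    selected = [p for w, p in zip(wavelengths_nm, power) if lo <= w <= hi]
    if not selected:
        raise ValueError("no spectrum samples fall inside the target band")
    return sum(selected) / len(selected)
```

A design with an empty out-of-range list would go down the in-range branch of the score, with `mean_power_in_band` supplying the average power that is then normalized; any other design would go down the out-of-range branch.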
For training, the variables describing the training process are α = 0.000005, β = 0.0005, and τ = 0.0001, and a batch size of 32 was used. 27 A total of 10 000 training episodes were used for each fabrication run. A training episode is one round of the DDPG algorithm for each of the requested wavelength ranges. The length of the training episodes, that is, the number of attempts at improving the design, is determined in the following way: if all parameters of the device are in range, a length of 10 is used. Otherwise, after 3 out-of-range actions, the episode is ended early. New designs with all parameters in range are saved. The top 1250 designs are fabricated in the next cycle.
After fabrication, the new devices are measured, and a new dataset is generated. The new data are then used to train the spectral predictor and update the current best devices used to generate improved designs.
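The episode-length rule can be illustrated with a small sketch. It assumes that the three out-of-range actions are counted cumulatively within an episode and that an out-of-range proposal does not replace the current design; the text does not specify either point, and `run_episode`, `propose_action`, and `in_range` are hypothetical names.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_episode(
    start: T,
    propose_action: Callable[[T], T],
    in_range: Callable[[T], bool],
    max_steps: int = 10,
    max_out_of_range: int = 3,
) -> List[T]:
    """Run one episode and return the in-range designs it produced."""
    saved: List[T] = []
    current = start
    misses = 0
    for _ in range(max_steps):
        candidate = propose_action(current)  # stand-in for the DDPG actor
        if in_range(candidate):
            saved.append(candidate)          # in-range designs are saved
            current = candidate
        else:
            misses += 1
            if misses >= max_out_of_range:
                break                        # end the episode early
    return saved
```

In the paper's setting, the saved designs from all episodes would then be ranked by score, and the top 1250 sent to fabrication.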

IV. OPTIMIZATION
The fab-in-the-loop cycle is started. First, 10 000 episodes of the DDPG algorithm are run for 10 requested wavelength bins between 1490 and 1640 nm. These wavelength bins are kept constant after being selected so that the fab-in-the-loop RL algorithm returns improved designs for each wavelength. A new chip is then fabricated. This chip contains 1250 examples selected from the best scoring designs from the algorithm training run. For calibration and training purposes, the current top five fabricated designs in each wavelength bin, along with four copies of each of these designs with different random biases between -20 and +20 nm, are added to the chip. This is done to provide the spectral predictor with data on how process bias affects high scoring designs. In addition, the four control grating couplers with hole radii of 80, 100, 120, and 140 nm, respectively, are again added to the chip for determination of the current process bias. The fabricated devices are measured and the dataset is updated. This cycle was repeated six times.
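The wavelength bins and the random calibration biases can be sketched as follows. The sketch assumes 10 equal-width bins and biases drawn from a uniform distribution; the text states only the number of bins, the overall range, and the ±20 nm limits, so both are assumptions, as are the function names.

```python
import random
from typing import List, Optional, Tuple


def wavelength_bins(
    start_nm: float = 1490.0,
    stop_nm: float = 1640.0,
    n_bins: int = 10,
) -> List[Tuple[float, float]]:
    """Split [start_nm, stop_nm] into n_bins equal-width bins (assumed layout)."""
    width = (stop_nm - start_nm) / n_bins
    return [(start_nm + i * width, start_nm + (i + 1) * width) for i in range(n_bins)]


def calibration_biases(
    n: int = 4,
    max_abs_bias_nm: float = 20.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw n random biases in [-max_abs_bias_nm, +max_abs_bias_nm] (uniform, assumed)."""
    rng = rng or random.Random()
    return [rng.uniform(-max_abs_bias_nm, max_abs_bias_nm) for _ in range(n)]
```

Under these assumptions, each bin is 15 nm wide. With five top designs per bin and four biased copies of each, the calibration set would hold 10 × 5 × (1 + 4) = 250 devices, plus the four control couplers.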
An image of a small area of two chips for the first and last round is displayed in Fig. 6, which visually exhibits the convergence of the parameters.

V. RESULTS
Six rounds of the fab-in-the-loop RL algorithm were carried out and produced designs that are significantly better than those from traditional design methodology. The spectral predictor is key to producing these designs.
The importance of the spectral predictor can clearly be seen in Fig. 7. A programming error meant that the spectral predictor was not fully utilized to select new designs in the earlier fabrication runs. This error was corrected for the sixth fabrication run. The spectral predictor successfully eliminates poor designs before fabrication, resulting in a decrease in the mean insertion loss from 14.8 dB on chip 5 to 8.6 dB on chip 6. The mean insertion loss on chip 6 is an improvement over the insertion loss of our initial, traditional design, which has 8.8 dB of loss. Chip 6 has 1224 out of 1250 designs (98%) with less than 20 dB of insertion loss. Chip 5 has only 691 out of 1250 designs (55%) in the same category. Chip 6 has 766 devices with less than 8.8 dB of insertion loss; chip 5 has only 58. As these results highlight, the spectral predictor significantly refined the search space around designs that are likely to perform well. This significantly improves the chances of finding low loss devices that perform well above the mean.
The predicted spectrum from the spectral predictor also improved significantly with each fabrication run. Fig. 8 shows the spectra for one given device as generated by the spectral predictor trained in rounds 1-6. One sees that the predicted spectrum converges to the measured one. In Fig. 9, we give a box plot of the error between the spectrum predicted after each training round and the measured spectrum from round 6 for the entire set of round 6 devices. Again, it can be seen that the error improves significantly after only 3 fabrication runs.
The fab-in-the-loop RL algorithm produced devices optimized for operation at different wavelengths, as can be seen in Fig. 10.
FIG. 7. A histogram highlighting the importance of the spectral predictor. Chip 5 is in blue, chip 6 is in red, and the overlap between the two is checkered. Round 6 fully utilizes the spectral predictor. With the spectral predictor, the mean insertion loss has significantly improved from 14.8 dB on chip 5 to 8.6 dB on chip 6. Chip 6 has 766 devices, or 61% of the total number of devices, with an insertion loss less than 8.8 dB. Chip 5 has only 58 such devices, or 4.6% of the total.
FIG. 9. The convergence of the spectral predictor over multiple training rounds. This convergence is calculated using the measurement results from the last fabrication round. The parameters of these devices are fed into the spectral predictor as saved at various stages of training. The difference between the resulting predicted spectrum and the measured spectrum is taken, and the average error is calculated. One can see that after three training rounds, the error is significantly reduced.
The fab-in-the-loop RL algorithm also produced designs with an improved insertion loss of 3.24 dB per coupler as compared to an insertion loss of 8.8 dB for our traditionally optimized design. 22 An example of such a design is given in Fig. 11. This design involves the creation of a combination of lines formed from merged holes and of separated holes. This is interesting as it overcomes the previously mentioned limitation that our fabrication process places on the original sub-wavelength design. A human designer would be unlikely to attempt this design due to the assumption that the merged holes would not work. In addition, it would be difficult to draw in a fashion that could be successfully fabricated. In contrast, the fab-in-the-loop RL algorithm successfully produces several such designs. It also produced designs with significantly larger bandwidth than our traditional design. The power spectrum of one such design is given in Fig. 12. This design covers the 150 nm bandwidth of our laser with less than 10.2 dB of loss at its lowest point. Such a design is valuable in the study of other broadband components such as filters.
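To put the quoted insertion losses on a linear scale, the standard decibel relation T = 10^(-IL/10) can be applied. The short sketch below is only a unit conversion; the function name is hypothetical.

```python
def db_loss_to_fraction(loss_db: float) -> float:
    """Convert an insertion loss in dB to the fraction of power transmitted."""
    return 10.0 ** (-loss_db / 10.0)


# Per-coupler values quoted in the text.
traditional = db_loss_to_fraction(8.8)   # about 0.13
optimized = db_loss_to_fraction(3.24)    # about 0.47
```

On this scale, the improvement from 8.8 dB to 3.24 dB corresponds to roughly 3.6 times more power coupled per grating coupler.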

VI. GENERAL APPLICABILITY CONSIDERATIONS
Fab-in-the-loop RL can be applied to other components, such as contra-directional couplers, 30 splitters, 31 and cavities. 32,33 Fab-in-the-loop RL could also be applied to the design of other optical devices, such as lasers 34 and quantum dots 35 in III-V substrates, along with other similar platforms. To do this, one needs a parameterized design and must modify the spectral predictor to provide a prediction of the desired property for the relevant device, such as the quality factor, splitting ratio, or filter bandwidth.
For example, fab-in-the-loop RL could be used to optimize an H0 photonic crystal cavity. 36
The first step would be to create a parameterized cell for the cavity. For an H0 cavity, the parameters would include the radius, the lattice constant, the waveguide coupling distance, and the shifts in the x and y directions necessary to form the cavity. The next step would be to determine the desired parameters to optimize, including additional ones important in a fabrication process. For example, round holes end up slightly oblong after fabrication. Introducing additional parameters to describe the distortion of the holes would allow this to be taken into account. The range of these parameters would be set to ensure that the device performs as an H0 cavity.
Next, the predictor neural network needs to be implemented. For a cavity, it would be more efficient to predict the central wavelength and quality factor instead of predicting the entire spectrum. This could be done by training the neural network on these metrics. In the case of a photonic crystal, it is typical to experience 10 nm of variation in the central wavelength for the same design on the same chip due to fabrication variations. On the surface, this would be a challenge for fab-in-the-loop RL. To account for this variation, the central wavelength predictor neural network could be trained using 10 nm bins for the central wavelength. A set of current process biases would also be input into the predictor to allow it to account for this information. The fab-in-the-loop RL process could then be initialized with an H0 photonic crystal design and several rounds run to produce optimized designs.
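The 10 nm binning of the central wavelength suggested above can be sketched as a mapping from a measured central wavelength to a bin index, which a predictor could then use as its training target. The bin origin of 1490 nm is an assumption borrowed from the laser range used elsewhere in this paper, and the function name is hypothetical.

```python
from math import floor


def central_wavelength_bin(
    central_nm: float,
    start_nm: float = 1490.0,   # assumed bin origin
    bin_width_nm: float = 10.0,
) -> int:
    """Map a central wavelength to the index of its 10 nm bin."""
    return floor((central_nm - start_nm) / bin_width_nm)
```

With this mapping, two nominally identical cavities measured at 1552 nm and 1558 nm fall in the same bin, so variation within a bin does not appear as a conflicting training label. Cavities that straddle a bin edge would still be separated, which is a limitation of any fixed binning.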

VII. CONCLUSION
The fab-in-the-loop RL process allows for the optimization of nanophotonic components while accounting for the imperfections present in the nanofabrication process, and it is able to produce devices with better performance for a given fabrication process. In particular, when applied to grating coupler design, fab-in-the-loop RL produced grating couplers with a measured insertion loss of 3.24 dB per coupler as compared with a measured insertion loss of 8.8 dB for our traditionally optimized design. It also produces designs optimized for different wavelengths and bandwidths. The widest bandwidth designs produced using our fab-in-the-loop algorithm can cover a 150 nm bandwidth with less than 10.2 dB of loss at their lowest point. As our process utilizes data obtained from devices fabricated with our nanofabrication process, the resulting designs are virtually guaranteed to outperform those from conventional brute force, or even fab-uninformed, design methods, including other machine learning based techniques. By its nature, the fab-in-the-loop RL process learns from information specific to the fabrication process that fundamentally cannot be fully characterized by other means. We believe that it can be applied to other photonic devices to the same effect.

Source code is available on GitHub at https://github.com/Donald-Witt/Fab-in-the-loop. Other data supporting the findings are available from the corresponding author upon reasonable request.

FIG. 2. A schematic of the parameterized grating coupler. The start of the grating, end of the grating, angle, hole radius, and lattice constant are all adjustable. The horizontal apodization start and end are adjustable. The vertical apodization adjusts the hole radius as you move out from the center line. The vertical apodization dividing point allows for two different values of this parameter. Not shown here is the hole diameter at which the lattice constant is adjusted instead of the hole radius. In total, there are 12 adjustable parameters.

FIG. 4. A schematic of our fab-in-the-loop RL algorithm. a) The feedback from measurement data. The measurement data are used to both train the spectral predictor and update the current best design. b) The DDPG portion of the algorithm that produces new designs. A DDPG episode is started with the current best design. Then, the DDPG algorithm produces a new set of design parameters. These parameters are then scored using the scoring algorithm described in Eqs. (1)-(3). This score is combined with the parameters and fed back into the DDPG algorithm until the end of the training episode. This is repeated 10 000 times for each of the required wavelength ranges.

FIG. 5. An example of the output from the spectral predictor in red compared with a measured spectrum in blue. It can be seen that they match well in this case.

FIG. 6. a) An optical microscope image of an area of the chip from the first fabrication run. One can see there are a wide variety of designs with varying parameters. b) An optical microscope image of an area of the chip from the last fabrication run. One can see that the designs have converged around the optimal ones, leading to a more constrained parameter variation.

FIG. 8. The convergence of the spectral predictor over multiple training rounds for one device. It can be seen that the results are significantly improved after only three training rounds.

FIG. 10. a) A set of PhCGC spectra designed using our algorithm. The spectrum of a traditionally optimized design used as our starting point for the fab-in-the-loop RL algorithm is given by the dashed line. The result of an FDTD simulation of the same traditional design is given by the green dotted line. Note that the operating wavelength of the fabricated device is significantly shifted due to fabrication effects. The fab-in-the-loop RL algorithm is able to produce a design at the target wavelength of 1580 nm with an insertion loss of 3.24 dB per coupler. b) The same devices plotted in linear scale. This further highlights the importance of the improvement. With our traditional design, only 14% of photons make it into or out of the chip. With the optimized design, over 45% do so. This improvement is critical for quantum applications that require low photon loss.

FIG. 11. a) An AFM image of a traditional PhCGC. b) A top performing optimized design. This design has an insertion loss of 3.82 dB at 1625 nm. One can see that the design includes corrugations formed by the merging of the holes. The corrugations disappear as the hole size is reduced by the applied apodization.
FIG. 12. a) A wide bandwidth PhCGC spectrum designed using our algorithm. The spectrum of a traditionally optimized design used as our starting point for the fab-in-the-loop RL algorithm is given by the dashed line. b) The same devices plotted in linear scale.
VIII. ACKNOWLEDGMENTS
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the B.C. Knowledge Development Fund (BCKDF), the Canada Foundation for Innovation (CFI), and the SiEPICfab consortium. We would also like to thank Dr. Kashif Masud Awan for his initial work on the fabrication process development, Professor Edmond Cretu for the AFM access, and the staff of the Stewart Blusson Quantum Matter Institute's Advanced Nanofabrication Facility for their assistance. Finally, we would like to thank Professor Kristin Schleich for her encouragement to pursue the project and her helpful advice.

IX. AUTHOR CONTRIBUTIONS
Donald Witt: Conceptualization; Methodology; Investigation; Software; Validation; Data Curation; Writing - Original Draft; Writing - Review & Editing (Lead).
Jeff Young: Writing - Review & Editing (Support); Resources (Equal); Supervision (Equal); Funding acquisition (Support).
Lukas Chrostowski: Writing - Review & Editing (Support); Supervision (Equal); Resources (Equal); Funding acquisition (Lead).

TABLE I. PhCGC Parameters and Ranges.
FIG. 3. The spectral predictor, a neural network that estimates the device spectrum based on the input parameters. This network takes the 12 parameters of the grating coupler design and the current process bias for 80, 100, 120, and 140 nm holes and produces a power estimate for one wavelength. One hundred and fifty copies of this network are used to produce a power spectrum consisting of 150 wavelength values between 1490 and 1640 nm.
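The way a spectrum is assembled from the 150 single-wavelength networks described in the Fig. 3 caption can be sketched as follows. Each network is represented here by a plain callable that maps the 16 inputs (12 design parameters plus 4 process biases) to one power value. The network internals, the wavelength spacing (assumed uniform, about 1 nm), and the function names are assumptions.

```python
from typing import Callable, List, Sequence

N_DESIGN_PARAMS = 12   # grating coupler parameters
N_BIAS_INPUTS = 4      # process bias for 80, 100, 120, and 140 nm holes
N_WAVELENGTHS = 150    # one predictor per wavelength value

Predictor = Callable[[List[float]], float]  # stand-in for one trained network


def wavelength_grid(
    start_nm: float = 1490.0,
    stop_nm: float = 1640.0,
    n: int = N_WAVELENGTHS,
) -> List[float]:
    """Assumed uniform grid of n wavelengths starting at start_nm."""
    step = (stop_nm - start_nm) / n
    return [start_nm + i * step for i in range(n)]


def predict_spectrum(
    design_params: Sequence[float],
    process_biases: Sequence[float],
    predictors: Sequence[Predictor],
) -> List[float]:
    """Evaluate every single-wavelength predictor on the same 16 inputs."""
    if len(design_params) != N_DESIGN_PARAMS:
        raise ValueError("expected 12 design parameters")
    if len(process_biases) != N_BIAS_INPUTS:
        raise ValueError("expected 4 process-bias values")
    features = list(design_params) + list(process_biases)
    return [predict(features) for predict in predictors]
```

Because every predictor sees the same input vector, the 150 networks are independent of one another and could be trained or evaluated in parallel.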