Accurate and efficient prediction of photonic crystal waveguide bandstructures using neural networks

: We demonstrate the use of neural networks to predict the optical properties of photonic crystal waveguides (PhCWs) with high accuracy and significantly faster computation times compared to traditional simulation methods. Using 100,000 PhCW designs and their simulated bandstructures, we trained a neural network to achieve a test set relative error of 0.103% in predicting gap guided bands. We use pre-training to improve neural network performance, and numerical differentiation to accurately predict group index curves. Our approach allows for rapid, application-specific tailoring of PhCWs with a runtime of sub-milliseconds per design, a significant improvement over conventional simulation techniques.


Introduction
Slow light in photonic crystal waveguides (PhCW) has the potential to enhance optical delay and nonlinear optical interactions, and to reduce the size and power consumption of components such as optical switches and modulators [1,2].In recent years, PhCWs have been used for applications including beamsteering for LiDAR [3,4], autocorrelators [5], enhancing the sensitivity of optical interferometers [6] and studying photonic transitions [7], enabled by the development of dispersion and loss engineered PhCWs.Here, structural parameters such as the radii or position of individual rows of holes are modified to control the waveguide performance, e.g. the group index and available slow light bandwidth [2,[8][9][10] or the propagation loss [11,12].
Further uptake of slow light PhCWs is limited by the complexity of the device design.Compared to other integrated photonic devices, such as ring resonators or y-splitters, PhCWs can have more structural parameters (see Fig. 1) that together determine the device performance.Furthermore, there is no known analytical description linking the device performance to the device structure.
The standard approach to designing PhCWs then is to predict the device performance using numerical simulations.Here, the use of approximate methods, such as the Plane Wave expansion [13] or guided mode expansion methods [14], significantly reduces computation requirements compared to fully vectorial solvers such as finite difference time domain or finite elements methods without sacrificing accuracy.
Computational requirements can be reduced further through 2D approximations of the 3D structure, such as the effective index [15] or the more accurate effective period methods [16], at the cost of yielding approximate results which must be validated by 3D simulations before device fabrication.These advanced computation methods combine accuracy with reduced computation cost, e.g.dropping the time taken for a single simulation to the order of a minute.
Typical PhCW engineering methods shift the position (perpendicular or parallel to the waveguide) or radius of individual rows of holes and have often been restricted to only changing one of these parameters at a time (for typically two or three rows of holes).Wider design Common modifications include (middle) perpendicular displacements s i of the position of rows [9] and (bottom) variations in the radii of holes r i [8].PhCWs have reflective symmetry across the y-axis hence modes have odd or even y-parity.
spaces have been opened up by replacing the prohibitively slow manual analysis and assessment of device performance with automated optimization methods, e.g.Refs.[17][18][19].Even using these optimization algorithms, only minute subsets of the full parameter space can be explored.Classical optimization algorithms, restricted by the time taken to simulate each design parameter combination, run through 10,000 to 100,000 possible device realizations (using 2D approximations).If we assume a 7-dimensional parameter space (as we will outline later), with at least 100 possible values per parameter we achieve 100 7 = 1 × 10 14 combinations.To access this full design space a new simulation paradigm is needed, one that achieves the accuracy of 3D simulations, while simultaneously reducing the computation time by several orders of magnitude.
Deep Learning for the design of photonic devices in general rapidly gained popularity over the past few years [20][21][22].Example applications include designing integrated power splitters [23] and computing the band structures of 2D photonic crystals [24][25][26].In fact, neural networks have recently been applied to the forward problem of predicting the normalised delay bandwidth products of PhCWs [27] from various design parameters.Here, we focus on the more general task of predicting 6 full gap guided bands from a 7-dimensional design parameter space and achieve excellent accuracy -a mean test error of 0.103%.At the cost of a minimal reduction in accuracy compared to the plane-wave-expansion-based MIT Photonics Band solver (MPB), the NNs are 5 orders of magnitude faster than 2D simulations.
Furthermore, this task is ideally suited to approaches similar to transfer learning [28] which is a machine learning technique where knowledge of one problem is used to simplify another related problem.Transfer learning has previously been applied to photonic problems [28,29], e.g. to train a NN to predict the transmission spectra of 8-layer photonic film using a NN pre-trained on 10-layer films.Here we use a similar approach, first pre-training a NN on a large 2D MPB dataset before continuing training using a small 3D dataset, to reach a test set relative error of 0.296%.Compared to directly training a NN on the 3D dataset (relative error 1.22%), pre-training led to a 75% reduction in error.Thus we can achieve accurate prediction of the 3D-bandstructure (which represents the true device bandstructure, and not an approximation as produced by 2D simulations), without the need for a large 3D-dataset which would be computationally expensive to produce.
At inference time, the NN is 7 orders of magnitude faster with a runtime of 8.4 × 10 −5 seconds per design on a desktop computer compared to several hours for 3D MPB simulations.

Design space and band structures
Photonic crystal waveguides are created by introducing a line defect in a photonic crystal slab consisting of air holes in a silicon membrane (Fig. 1(top)).We parameterize the PhCW designs under consideration as a 7D vector where r i and s i refer to the radii (Fig. 1(bottom)) and perpendicular shifts (Fig. 1(middle)) of holes of the inner three rows, respectively, and r 0 denotes the radii of all other holes.This vector represents a commonly used design space, with several groups highlighting the importance of the first three rows for slow light PhCW design [10,11,30,31].To ensure sufficient coverage of the design space given limited computational resources, we restrict the design space to 7 dimensions by freezing out parallel shifts as these often lead to designs with very high propagation losses [30].We impose the following design constraints to account for fabrication limitations such as minimum feature sizes or the need for a connected membrane [9,11,32]: where a is the lattice constant, here fixed at 400 nm.The design space d was uniformly sampled subject to the above constraints to avoid biasing the NN.(The distribution of design parameters is shown in Supplement 1.)For PhCW applications, we are typically interested in the guided modes beneath the light line (Fig. 2), as these are intrinsically lossless.We also need to know if these bands are single mode at our desired operation frequency (which is application dependent) and especially if there are leaky modes -those above the light line [33].Hence, we chose as targets for our neural network several gap guided bands ω n,k x over the entire Brillouin zone, 0 ≤ k x a/2 ≤ 0.5.

Dataset generation
We simulated each design using the MIT Photonic Bands (MPB) eigensolver [13] to compute the lowest 30 frequencies ω at 101 evenly space values of k x .(A copy of the MPB control file, together with other underlying data is available [34].)The 2D effective period approximation [16] was used to reduce the simulation time by approximately two orders of magnitude.Since the PhCWs have reflective symmetry, bands can be classed by even and odd y-parity where the y-direction is in the plane of the PhCW but perpendicular to the light propagation.When bands of the same parity approach each other they anti-cross, a key factor in controlling the slow light behaviour of the bands [10], while bands of opposite parity cross.The parity of each ω, k x pair was obtained using an inbuilt MPB function so that each band structure could be split into even and odd bands.Bands of the same parity were sorted by frequency so that they can be uniquely labelled by ω n,k .We selected even bands ∈ [10,11,12] and odd bands ∈ [11,12,13] as these typically lie in the region of interest -in the band gap and below the light line ω = c k x (see Fig. 2).In total, we simulate 100,000 examples which, using 2D approximations, took 5000 cpu core hours (∼ 3 cpu core minutes/design) on a high-performance computing cluster.A second dataset was generated using 1000 3D simulations over 6500 cpu core hours (∼ 6.5 cpu core hours/design).The recommended min-max normalisation scheme [35] was used to rescale all features to the interval [0, 1] using the known parameter ranges over which the designs were generated.
We split both the 2D and 3D datasets into a standard training, validation and test set in the ratio 70:15:15.

Neural networks
The NN's task is to learn the forward problem: predicting guided bands from the design parameters of a PhCW.A fully connected network is appropriate for the input vector since there is no spatial ordering of the design parameters which fully describe a PhC waveguide.The architecture of the best-performing model is a stack of 5 dense layers of 500 nodes followed by two fully connected branches of a further 5 layers with 500 nodes each: one branch for the even parity bands and one for the odd parity bands, as outlined in Fig. 3.We use the LeakyReLU activation function (with default α = 0.3) because it is fast to evaluate and outperforms the common Rectified Linear Unit activation (ReLU) function [36].Each branch is finally connected to 303 output nodes which represent the 3 flattened bands of 101 k x points for each parity (see Fig. 3).The total number of trainable parameters is 3,814,606.We implemented the NN using the popular Keras machine learning library [37] with a TensorFlow backend [38].
We chose the mean squared error (MSE) for the loss function and the Nadam optimizer [39] with default hyperparameters except for the initial learning rate which was tuned to 1 × 10 −4 .To minimize overfitting and avoid tuning the number of epochs, early stopping [40] terminates training when the validation loss does not improve over 300 epochs and restores the NN's weights to the lowest validation loss state reached during training.Additionally, we reduce the learning rate on plateau [35] meaning the learning rate is halved when the validation loss does not improve for 200 epochs.We find that L2 regularisation [41] of 1 × 10 −9 is necessary to mitigate overfitting.
Training the final NN for 4000 epochs took 27 hours on a tesla v100 graphics card and the NN converged to a minimum validation loss of 9.43 × 10 −7 and training loss 8.31 × 10 −7 .The NN is available at [34].

Direct learning and pretraining
We trained a NN on the 3D dataset with randomly initialised weights ('direct learning').Other than the L2 regularisation value which was tuned to 1 × 10 −8 , the same hyperparameters were used as for the final 2D NN descibed in the previous section.
We used the NN from section 2.3 that was trained on the large and relatively inexpensive 2D dataset, as the pretrained NN for the small 3D datset.Concretely, we gradually unfroze layers for training with a low learning rate and L2 regularisation (following method from Ref. [35]).First, the weights of the two highest layers of each branch were unfrozen and trained with a learning rate of 1 × 10 −6 and L2 of 1 × 10 −7 until convergence at 5000 epochs.This was repeated for the four highest layers and then all layers but the first 4 dense layers with L2 of 1 × 10 −8 .Finally, we fine-tuned the entire model with a low learning rate of 1 × 10 −8 .Higher learning rates quickly led to overfitting especially when more layers were trainable, and we arrived at the method outlined here by trial and error.

Prediction of band structure
Our aim is to use the NN to predict the band structure of our PhC waveguides.Thus for an assessment of the accuracy of these predictions, we calculate the relative errors in the predicted frequency ωs,b,k for each band b at each k-value for every sample s in the test set.As shown in Fig. 4 most errors are small with 95% of errors falling in the range −0.322% to +0.326%.To evaluate the accuracy of a single band structure prediction we average over all k x points (N k ) and all odd and even bands (N b ) for a mean absolute percentage error (MAPE s ) defined as: the distribution of which is plotted in Fig. 5.While the MSE was used for the loss function, the MAPE s is a suitable metric for the evaluation of a band structure prediction as the deviation of all 606 data points is weighted equally.For a qualitative comparison, four predicted and actual band structures are plotted in Fig. 6 corresponding to the median, 95 th and 99.7 th percentiles as well as the prediction with the highest MAPE s in the test set.
The MAPE over all 15,000 samples in the 2D test set is 0.103% indicating the NN is highly accurate and can generalize very well to unseen samples.
Once trained, it takes the NN 8.4 • 10 −4 s per design per cpu-core to predict band structures.This is a factor of 214,000 times faster than 2D simulations on the same hardware.

Group index calculation
Since PhCWs are typically used for their slow light enhancement of light-matter interactions, the key figures of merit not only include the bandstructure (i.e.operating frequency), but also the group index and group index distribution, typically quantified through the group index bandwidth product (GBP) [10].Therefore once both the group index and frequency are known as a function of k x , we have the key elements required from PhC waveguide simulations.The group index, n g is related to the gradient of the bandstructure via: and we can obtain the group index distribution through numerical differentiation of each band.An example group index plot is shown in Fig. 7 for the band structure with the greatest MAPE s of the entire test set.Even in this largest error case, the two sets of group index curves are visually in good agreement, indicating that we can use the output from our neural network to predict the group index of PhCWs.
Due to the divergence of the group index where the bands are flat, the MAPE is not a meaningful metric, as small changes in the exact divergence point can quickly lead to extremely large errors, localized in a very small number of k-points.A similar effect occurs at stationary points (e.g.inflection points) within the band.Additional exemplary group index plots are shown in the Supplement 1.

3D dataset: direct learning and pretraining
We trained a NN on the 3D dataset with randomly initialised weights ('direct learning') which achieved a MAPE of ∼ 1.22% on the 3D test set containing 150 elements.
The pretrained NN reached a 3D test set MAPE of ∼ 0.296%.Pretraining therefore reduced the test error by 75% compared to direct learning, indicating highly accurate predictions of PhC waveguide bands using a NN, with an accuracy matching 3D simulations.To assess the impact of training set size on the error, we trained 5 NN on subsets of the 2D training and validation sets.The dataset sizes and corresponding relative errors are plotted in Fig. 8.To achieve the same accuracy with direct learning a prohibitively large 3D dataset would have been required.

Discussion
Our initial NN was trained on a dataset generated using 2D simulations and the MAPE used to evaluate the predictive accuracy was calculated over a test dataset of 15,000 elements.The mean relative error (0.103%), together with a small spread of the MAPE s shown in Fig. 5 (max (MAPE s ) = 0.949%) indicates that the NN generalises well to unseen designs.A percentage error of 0.1% corresponds to a 1.55 nm error in the operating wavelength for a waveguide operated near λ = 1550 nm.A similar shift is associated with a 0.5 nm variation in the hole diameter and slab thickness [42], thus the predictions from our NN fall well within the variation of state-of-the-art fabrication processes.
Although our 2D NN is capable of accurately predicting the band structure of PhC waveguides, 2D simulations produce biased band structures and the associated group index curves are typically 10 to 20% out [16] compared to those obtained using 3D simulations, which generally match experimental results [42].The creation of a 100,000 element 3D training set is prohibitively computationally expensive.As a pilot, we generated a 1000 element 3D dataset to directly train a NN.The resulting 3D test set predictions had an unsatisfactory MAPE of 1.22%.Motivated by recent applications of transfer learning in photonics [28], we used the pre-trained 2D NN as a starting point for training on the 3D dataset.The resulting error on the 3D dataset was 0.296% which was 75% lower than by direct learning highlighting the utility of this method for problems where only small datasets are computationally feasible.
Our NN models the forward problem of predicting 6 bands near the area of the expected band gap given a 7D vector of structural PhCW parameters.NNs to predict a single Figure of Merit (FoM), the Normalised Delay Bandwidth Product, were used in an alternative machine learning approach to evaluate PhCW performance [27].In comparison, the key drawback of our approach is the increased complexity of predicting 606 outputs, which necessitates a larger training set and a NN with more parameters than for the direct prediction of a single FoM.We believe this is outweighed by the additional knowledge that can be gained from observing the full bands, which can answer important questions.For example: is the waveguide single mode or multi-mode at a given frequency?Do these additional modes lie below or above the light line?These factors strongly impact propagation losses [33] and should be considered when deciding which designs can be used for a given application.
The NN can also be used to gain insight into the behaviour of PhCWs, e.g. by plotting the evolution of bands as design parameters are varied.
For the group index prediction, we note that the NN predicted curves are 'noisy' compared to those calculated from MPB simulations (Fig. 7).The accuracy of band predictions (and further numerical processing) could be improved by smoothing bands for example by applying a moving average over the bands and/or group index curves, similar to methods used to remove noise from experimental group index measurements [43].Alternatively, the smoothing function itself can be learned by appending trainable convolutional layers to the NN [44].
We note there are design parameter combinations for which all or most bands are in fact bulk modes, however, this is not a feature of the NN.Instead, it is a consequence of our training data generation pipeline, where we selected specific bands by band index, rather than identifying a bandgap and then selecting the bands that are within the bandgap.

Conclusion
We present a neural network that offers a fast and highly accurate alternative to conventional MPB simulations of photonic crystal waveguides.Our NN achieves an improvement in simulation time of 5 orders of magnitude compared to 2D simulations, and of 7 orders compared to 3D simulations, with a sub-millisecond CPU time per design.
We compare the NN band structure predictions with a test set of 2D-MPB simulations and show that our NN has a mean absolute percentage error ∼ 0.1% and also predicts the group index behaviour of the designs under consideration.We use this NN as a pretrained NN for a small dataset of full 3D simulations.Comparison with a 3D test set yields a MAPE ∼ 0.3%, however, this time with respect to 3D simulations that closely match experimental results [42].This indicates that after pretraining our NN is in fact more accurate than 2D simulations.
This represents a new paradigm in PhC waveguide design.The reduction in computation time means that optimization algorithms or other machine learning-based selection mechanisms can now be used to search a vastly larger range of the full PhC waveguide parameter space, without increase in the computation cost.Similarly, the existence of a forward NN allows the use of machine-learning based approaches to inverse design [45]; our NN could act as a pre-trained part of an inverse NN architecture [46].This opens the door for application-specific PhCW designs and hence the realization of the full potential of slow light in PhCWs.

Fig. 1 .
Fig. 1. (top) A W1 row defect PhCW with lattice spacing a and base hole radius r 0 .Common modifications include (middle) perpendicular displacements s i of the position of rows[9] and (bottom) variations in the radii of holes r i[8].PhCWs have reflective symmetry across the y-axis hence modes have odd or even y-parity.

Fig. 2 .
Fig. 2. Band diagram of a PhCW showing the bands under consideration, which are centred on the band-gap, labelled by symmetry.

Fig. 3 .
Fig. 3.The input layer takes a 7D design vector d and the NN outputs two 303 dimensional vectors -one for each parity branch -composed of 3 bands of 101 k x points each.Arrows show dense connections all of which (except the last) are followed by LeakyReLU activations.

Fig. 6 .
Fig. 6.Predicted (dotted lines) and actual (solid lines) band structures from the selected boundaries shown in Fig. 5.

Fig. 7 .
Fig. 7.The group index as a function of k x (Eq.(3)) for the worst predicted band structure in the test set (Fig. 6(d)).The actual (solid lines) and predicted (dotted lines) bands are plotted qualitatively showing good agreement in the slow light region.

Fig. 8 .
Fig. 8. Learning curve showing the 2D test set relative error of NNs trained on subsets of the 2D training set of varying size.