Neural-Network-Enabled Forward and Inverse Design of Reconfigurable Metasurfaces

Nanophotonics has joined the application areas of deep neural networks (DNNs) in recent years. Various network architectures and learning approaches have been employed to design and simulate nanophotonic structures and devices. Design and simulation of reconfigurable metasurfaces is another promising application area for neural-network-enabled nanophotonic design. The tunable optical response of these metasurfaces relies on the phase transitions of phase-change materials, which correspond to significant changes in their dielectric permittivity. Consequently, simulation and design of these metasurfaces require the ability to model a diverse span of optical properties. In this work, to realize forward and inverse design of reconfigurable metasurfaces, we construct forward and inverse networks that model a wide range of optical characteristics, from lossless dielectrics to lossy plasmonic materials. As proof-of-concept demonstrations, we design a Ge2Sb2Te5 (GST) tunable resonator and a VO2 tunable absorber using our forward and inverse networks, respectively.


Methods
Full electromagnetic simulations based on the finite-difference time-domain (FDTD) method are performed using the commercial simulation package Lumerical FDTD Solutions to generate the dataset used in this work. A square unit cell geometry (same periodicity along both horizontal directions) is adopted, with the wall-to-wall distance between adjacent nanodisks fixed at 500 nm. Periodic boundary conditions are applied at all horizontal boundaries (x and y), and perfectly matched layers (PMLs) are employed at the vertical boundaries (z). The substrate thickness is effectively infinite, as the bottom PML boundary is placed inside the substrate, and the built-in sapphire (Al2O3, n ≈ 1.72) model is used for the substrate. A TM-polarized plane wave, located inside the substrate and illuminating toward the +z direction, is used as the light source. The optical response (complex reflection and transmission coefficients) is obtained from near-field monitors located above (transmission) and below (reflection) the unit cell structure. The operating spectral range spans wavelengths from 2 μm to 4 μm, and the optical response is sampled at 501 equally spaced wavelength points. The radius is varied from 250 nm to 1500 nm with a constant step size of 50 nm, and the height is tuned from 40 nm to 200 nm with a 20 nm step size. 30 complex refractive index values are chosen randomly between 2+0j and 10+10j, 5 of which are forced to have k = 0 (zero) to shift the dataset slightly toward the dielectric side (see Supplement 1 for the employed complex refractive indices). The resulting values are set as the material indices at the midpoint of the training spectral range (λ = 3 μm), and the dispersion over the rest of the spectrum is fitted by, and obtained from, the simulation program itself. These fitted n(λ)+jk(λ) values are used in the feature matrix. As a result, 30×26×9 = 7020 samples are created as the dataset. The simulations took approximately 2 days (48 hours).
1/10 of these samples are randomly separated as the validation set, and the rest are used in training. The total size of the entire dataset is 366.7 MB.
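The parameter sweep and the random train/validation split described above can be sketched as follows. This is a minimal illustration assuming the sweep values stated in the text; the array names and the random seed are not from the original code.

```python
# Sketch of the dataset layout: 30 materials x 26 radii x 9 heights = 7020
# samples, with a random 1/10 held out for validation.
import numpy as np

radii = np.arange(250, 1501, 50)    # 26 radius values (nm), 250 to 1500 in steps of 50
heights = np.arange(40, 201, 20)    # 9 height values (nm), 40 to 200 in steps of 20
n_materials = 30                    # 30 randomly chosen complex refractive indices

n_samples = n_materials * radii.size * heights.size   # 30 x 26 x 9 = 7020

rng = np.random.default_rng(0)      # seed is illustrative
val_idx = rng.choice(n_samples, size=n_samples // 10, replace=False)   # 702 samples
train_idx = np.setdiff1d(np.arange(n_samples), val_idx)                # 6318 samples
```

The split is disjoint by construction, since `setdiff1d` removes every validation index from the training pool.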
All complex-valued networks are implemented using the Keras-Complex library "complexnn" [1] and the Keras framework. We train all models using the Adam optimizer [2], mean squared error (MSE) as the loss function, and 'tanh' as the activation function. Figure S1 presents the complex refractive indices at the midpoint of the training spectral range (λ = 3 μm). 30 n-k pairs are randomly selected (from 1 to 10 for n and 0 to 10 for k), 5 of which were forced to have k = 0 in order to increase the networks' accuracy in predicting lossless dielectric materials. The complex refractive indices are set for λ = 3 μm and fitted over the entire training spectral range (from 2 μm to 4 μm) by the commercial simulation program (Lumerical FDTD Solutions). The fitted values are used in the training and validation processes. Figure S2 presents a comparison of complex-valued and real-valued neural networks (CVNNs and RVNNs). The networks share the same structure of 5 hidden layers with 20, 80, 120, 80, and 20 neurons, as indicated in Figure 1a of the main text.
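The layer structure above (5 hidden layers of 20, 80, 120, 80, and 20 neurons) can be sketched as a complex-valued forward pass in plain NumPy. This is only an illustration of the architecture and the complex arithmetic involved; the actual models use Keras with complexnn, and the input/output dimensions and the split-tanh activation here are assumptions, not taken from the original code.

```python
# Illustrative complex-valued forward pass through the stated layer sizes.
import numpy as np

rng = np.random.default_rng(1)

def complex_dense(z, w, b):
    # Complex matrix multiply, then tanh applied separately to the real and
    # imaginary parts (one common activation choice for CVNNs; assumed here).
    a = z @ w + b
    return np.tanh(a.real) + 1j * np.tanh(a.imag)

# 4 inputs (R, H, and the complex index n+jk as one complex feature plus
# padding) and 1 output are illustrative placeholders.
sizes = [4, 20, 80, 120, 80, 20, 1]
weights = [0.1 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n, dtype=complex) for n in sizes[1:]]

z = rng.standard_normal((1, 4)) + 0j   # one illustrative input sample
for w, b in zip(weights, biases):
    z = complex_dense(z, w, b)
# z is now a single complex-valued prediction of shape (1, 1)
```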

Comparison of Complex-Valued and Real-Valued Networks
Although the implementation of a CVNN appears similar to a real-valued NN with twice the number of channels at each step, the calculations performed within the complex-valued layers are not linearly independent [3]. The main difference can be understood by comparing the complex multiplication of a complex-valued feature x+jy with a complex-valued weight u+jv against the multiplication of the corresponding vector [x y] by an unconstrained 2×2 real-valued weight matrix. The former results in a complex-valued output with 2 degrees of freedom (real and imaginary parts); represented as phasors, this multiplication corresponds to scaling and rotating the complex-valued feature. The latter, on the other hand, involves a 2×2 matrix with 4 degrees of freedom, corresponding to reflecting, shearing, scaling, and rotating the initial complex-valued feature. Because of these additional degrees of freedom, real-valued networks lose the mathematical correlation between the real and imaginary components. As seen in Figure S2a,b, CVNNs learn faster (larger |Δ(MSE)| per epoch). Although the real-valued network closes the gap as the models converge, the complex-valued network maintains higher accuracy.
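The degrees-of-freedom argument above can be verified numerically: multiplying x+jy by u+jv is equivalent to applying a constrained (scale-and-rotate) 2×2 matrix to [x y], whereas a general real-valued layer may apply any 2×2 matrix. The specific numbers below are arbitrary examples.

```python
# Complex multiplication vs. the equivalent constrained 2x2 matrix.
import numpy as np

x, y = 1.5, -0.5     # complex-valued feature x + jy
u, v = 0.8, 0.6      # complex-valued weight  u + jv

# Complex multiplication: only 2 degrees of freedom (u and v).
out_complex = (x + 1j * y) * (u + 1j * v)

# Equivalent constrained matrix: scales by |u+jv| and rotates by its angle.
M_complex = np.array([[u, -v],
                      [v,  u]])
out_matrix = M_complex @ np.array([x, y])

# The two computations agree component-wise.
assert np.isclose(out_complex.real, out_matrix[0])
assert np.isclose(out_complex.imag, out_matrix[1])

# A general real-valued 2x2 weight has 4 free entries, so it can also shear
# and reflect, which breaks the coupling between real and imaginary parts.
```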
Another aspect improved by the use of complex-valued networks is consistency. The probability density function (PDF) and variance (σ²) of the MSE values are two important indicators of the consistency of a model's accuracy over the dataset. As seen in Figure S2c and Table SI, the CVNN has a narrower PDF and smaller σ² than the RVNN. Another important measure is the cumulative distribution function (CDF). As seen in Figure S2c,d, 95% of the sample sets (combinations of R, H, and n+jk) have MSE ≤ 2.0×10⁻⁴ and MSE ≤ 3.1×10⁻⁴, and the maximum errors are 1.2×10⁻³ and 2.4×10⁻³, for the complex-valued and real-valued networks, respectively. These results, overall, indicate higher consistency of the CVNN's predictions than that of the RVNN. An alternative way to observe the improvement provided by CVNNs is to examine the networks' performance with respect to the amount of training data, as shown in Figure S3, where we compare the overall training errors of the CVNN and RVNN. As seen in the figure, the CVNN noticeably outperforms the RVNN in the small-data regime, and the difference vanishes as the networks saturate with training data.
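The consistency metrics quoted above (variance, 95th-percentile MSE, and maximum MSE) can be computed from the per-sample MSE values as sketched below. The MSE values here are synthetic stand-ins; the reported figures come from the actual validation runs.

```python
# How the per-sample MSE distribution statistics are obtained.
import numpy as np

rng = np.random.default_rng(2)
# Illustrative per-sample MSEs for the 7020 samples (synthetic, log-normal).
mse = rng.lognormal(mean=-9.5, sigma=0.6, size=7020)

sigma2 = mse.var()              # variance of the MSE distribution (Table SI)
p95 = np.quantile(mse, 0.95)    # 95% of sample sets have MSE <= p95 (CDF reading)
mse_max = mse.max()             # worst-case sample error
```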
All in all, the results provided above indicate the advantage of preserving the mathematical correlation of complex values. However, as the problem here is low-dimensional and only 1 of the 4 input parameters is complex-valued, the improvements provided by complex-valued networks are small. Larger improvements are expected for more complex problems. Figure S4 shows the reflection response (complex reflection coefficient, reflectance, and phase shift) of examples from the validation set, whose transmission response is shown in the main text.

Fig. S4. a, c Comparison of predicted (real part: blue; imaginary part: red) and simulated reflection coefficients of exemplary samples from the validation set. a Plasmonic material (n ~ k), H = 140 nm, and R = 1500 nm. c Dielectric material (n constant, k = 0), H = 160 nm, and R = 600 nm. b, d Corresponding transmission and phase values of the unit cells indicated in a and c, respectively.

Spectral Generalizability
In this work, we applied our previously proposed wavelength normalization method [4] to CVNNs. In the previous study, we showed that wavelength scaling of geometric parameters provides spectral generalizability. However, this normalization causes a one-to-many mapping problem by creating identical input sets corresponding to different operation wavelengths. The problem can be solved by including the wavelength as an explicit feature, which, in turn, destroys spectral generalizability.
Here, to provide spectral generalizability without sacrificing the performance of our models, we replace the operation wavelength (λ) with a pseudo-feature, which we define as a feature that defines but does not discriminate the sample (the unit structure, in this case). Based on this definition, we use the wall-to-wall distance (d) as our pseudo-feature. It is a feature since it defines the unit structure, and it does not discriminate a sample from the others as it is constant over the dataset. It is also normalized to the operation wavelength (d/λ), similar to the other dimensional parameters. Using d/λ instead of λ, we re-train the CVNN that models the complex transmission coefficients of our unit structures. As indicated in Table SII, the overall performance is preserved in the training spectral range. Figure S5 exhibits the prediction capability of our model (with d/λ) outside the training spectral range for dielectric (S5a) and plasmonic (S5b) materials. As seen in the figure, the network accurately predicts both the amplitude and phase of the transmitted light outside the training range. Around resonances, it maintains high accuracy yet exhibits slight deviations. Note that the phase shift predictions are projected from the training range to the operation range with respect to the difference between the reference values (the phase shift introduced by the bare substrate) at each spectral range.
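The pseudo-feature construction described above can be sketched as follows: every dimensional parameter, including the constant wall-to-wall distance d, is divided by the operation wavelength, so the wavelength enters the features only implicitly. The function name, the exact feature ordering, and the constant n and k values are illustrative assumptions, not the original feature pipeline.

```python
# Sketch of wavelength-normalized features with d as a pseudo-feature.
import numpy as np

d = 500.0                                        # wall-to-wall distance (nm), constant
wavelengths = np.linspace(2000.0, 4000.0, 501)   # training spectral range (nm)

def normalized_features(R, H, n, k):
    # One row per wavelength: [R/lam, H/lam, d/lam, n, k].  Because d/lam
    # still varies with wavelength even though d is constant, two samples at
    # different wavelengths never collapse onto identical inputs, avoiding
    # the one-to-many mapping without an explicit wavelength feature.
    # (n and k are held constant here; in the real dataset they are the
    # fitted dispersive values n(lam) + jk(lam).)
    return np.stack([R / wavelengths,
                     H / wavelengths,
                     d / wavelengths,
                     np.full_like(wavelengths, n),
                     np.full_like(wavelengths, k)], axis=1)

X = normalized_features(R=600.0, H=160.0, n=3.2, k=0.0)   # shape (501, 5)
```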