Neural-adjoint method for the inverse design of all-dielectric metasurfaces

All-dielectric metasurfaces exhibit exotic electromagnetic responses, similar to those obtained with metal-based metamaterials. Research in all-dielectric metasurfaces currently uses relatively simple unit-cell designs, but increased geometrical complexity may yield even greater scattering states. Although machine learning has recently been applied to the design of metasurfaces with impressive results, the much more challenging task of finding a geometry that yields the desired spectra remains largely unsolved. We explore and adapt a recent deep learning approach -- termed neural-adjoint -- and find it is capable of accurately and efficiently estimating complex geometry needed to yield a targeted frequency-dependent scattering. We also show how the neural-adjoint method can intelligently grow the design search space to include designs that increasingly and accurately approximate the desired scattering response. The neural-adjoint method is not restricted to the case demonstrated and may be applied to plasmonics, photonic bandgap, and other structured material systems.


Introduction
Electromagnetic metamaterials have been used to realize many exotic scattering responses over the last two decades. Effects including negative refractive index and cloaking have generated significant interest and have served to drive the community [1][2][3][4][5]. A more applied, but still relevant metamaterials achievement, is that of graded designs [6,7]. It was shown early on that unit-cells which form metamaterials may be designed with a spatial dependence across a surface and / or volume, and various lensing effects were shown that utilize spatial degrees of freedom. A principle of gradient metasurfaces is that their scattering properties change slowly as a function of spatial coordinate. Broadband metamaterial absorbers [8,9], and metamaterial spatial light modulators [10], also make use of dissimilar neighboring unit-cells, however there is no such requirement to make neighbors as alike as possible, and designs are simply cobbled together to achieve the desired response. This fact highlights a design feature of sub-wavelength metal-based metamaterials, i.e. their scattering response is primarily connected to their geometry and -due to minimal neighbor interaction -the unit-cell largely governs the electromagnetic properties of the array. Conventional optimization techniques built-in to modern day electromagnetic mode solvers are sufficient to achieve a desired response, and designs of differing units may be assembled with little change to the overall response.
In principle, all-dielectric metasurface (ADM) unit-cells may also be used to tesselate a surface in an arbitrary fashion similar to that achieved with metal-based metamaterials. However, ADMs resonators are only marginally sub-wavelength, and modes utilized are often not confined within their physical bound -with an evanescent tail lying just outside their surface -resulting in significant neighbor interaction. Further, surface modes related to the periodicity are not restricted to nearest neighbors, and may persist over several spatial periods. Despite the success of ADMs [11,12], it is conceivable that still-richer electromagnetic scattering can be achieved if more complex geometries are employed. However, physical understanding for such metasurfaces is poor -simple functional relationships, or even heuristic guidance regarding super unit-cell geometry and final electromagnetic properties is unavailable. The only contemporary means to estimate such metasurface properties, given a candidate geometry, is simulation or fabrication. However, given the vast space of potential designs and the speed of conventional simulation and fabrication, it is completely infeasible to iterate over designs in order to achieve a desired response.

Neural-Adjoint Method
While deep learning enables substantially faster evaluation of candidates, given the vast number of possibilities for many problems (e.g., approx. 10 12 in our case), it is nonetheless difficult or impossible to iterate over all or most candidate designs. From a design perspective, what would be of greatest utility would be to instead specify a desired frequency dependent electromagnetic response , and have a model or solver compute a specific approximate geometryˆ, which yields (hereˆdenotes approximate). The solution used to search for such an ideal geometry may be cast as an inverse problem written asˆ=ˆ− 1 ( ). Hadamard described a solution method as well-posed if it met three criteria -1 : existence (at least one solution), 2 : uniqueness (only one solution), and 3 : stability (solution must depend continuously on ). A solution approach is then ill-posed if one of the 1 − 3 conditions fails [26]. Past work has shown that finding a specific ADM consisting of distinct neighboring resonators in a super unit-cell geometry for a desired scattering state is an ill-posed non-linear inverse problem and, in particular, conditions 1 and 2 are not met [13].
Here we explore a recently-proposed deep inverse modeling approach, called the neural-adjoint (NA) method [27], which outperformed many other methods for solving ill-posed inverse problems. Here we adapt the NA method to solve a challenging problem involving the design of ADMs geometries with 14 free geometrical parameters; a much larger space than most recent work (e.g., 5 − 10 free parameters in [14,16,18]). Furthermore, existing inverse methods often involve complex training procedures, and ultimately produce sub-optimal solutions. In contrast, the NA only requires that we train a single conventional feed-forward neural network, and -as we show -appears to find close approximations to the globally optimal solution within just one minute, even for our complex ADM design problem. Furthermore, we demonstrate how the NA method can be utilized to expand the design search space, essentially providing a form of active learning that is specifically tailored to solve inverse problems. We explore an example inverse problem, where a frequency dependent infrared absorptivity ( ( ) = ) is desired, which we would like to achieve using an ADM consisting of a square array of 2×2 resonators, each with an elliptical generatrix lying perpendicular to its directrix (line of length ℎ) -depicted in Fig. 1. Although past experience and single unit-cell simulations guide us to choose approximate metasurface dimensions which yield resonances in the desired spectral range, our inverse problem is ill-posed, since we do not know if a solution exists, i.e. condition 1 is not met. Further, we do not meet the condition of uniqueness, 2 , since many metasurface solutions (disparate geometries) may yield the same spectra -at least within the accuracy of our chosen loss metric, and numerical precision. The proposed ADMs consists of a geometry space of fourteen parameters: [ℎ, ,  the periodicity of the super cell, 1 -4 and 1 -4 are the x-axis and y-axis radii of four elliptical resonators respectively, and 1 -4 are the rotational angles with respect to the center axis of each elliptical resonator. All geometry values are randomly sampled from the data grid shown in Table S1, which is included in the Supplemental. The numerical simulations use SiC for the proposed ADMs, and the Supplemental contains more details.

Results
The DNN forward model is trained by pair of 14 geometrical parameters and 2000 frequency points from 100-500 THz. Results of the forward model trained on 60k { , } pairs (approximately 5.75 × 10 −8 of the entire geometry space) are shown in Fig. 2 (a)-(c), where red curves are numerical simulations, and blue curves are the DNN predictions. The absolute error between simulation and prediction is plotted as the gray curve on the right vertical axis of Fig. 2 (a)-(c). In Fig. 2 (d) we show the histogram of the mean-squared-error (MSE) for the entire validation set, and find that 95% have an MSE≤ 2.49 × 10 −3 (dashed gray line), and an average MSE of 1.2 × 10 −3 . Having an accurate trained forward model we next turn toward the inverse solution. The NA method [27] finds the optimal inverse solution by fixing all the weights and biases ofˆ, and computing the forward model's gradient solely with respect to the input to the network (i.e., the geometry), starting from randomly chosen values, denotedˆ0. It is important to highlight that is a closed-form differentiable expression, and thus calculation ofˆ/ is trivial. Further, we can estimate the gradient of the input geometry with respect to a loss function L that we are free to specify (e.g., mean squared error). Therefore, if is our desired spectrum, andˆis our current best estimate of the metasurface geometry, we can iteratively move along the loss surface to find a better solution using,

1)
where is the learning rate. Theˆcan then be evaluated iteratively until some convergence criteria is satisfied (e.g., L changes very little after each iteration). Because this is a gradientbased procedure, it will only converge to a locally optimal solution. As a result, the NA method prescribes that this search process be repeated times, each time starting from a different randomly chosen value ofˆ0. In practice we find that we may run greater than 10 4 initializations in parallel with no cost in speed -only limited by available memory. Therefore the NA method produces candidate designs, and we can choose the best design (or several designs, if desired) by passing each design back intoˆand evaluating their similarity to the target scattering properties. Notably, while the user can specify many different choices for L, the NA method prescribes that a so-called "boundary loss", L should be added to any user-chosen loss [27] and is given by , where rectified linear unit (ReLU) is the activation function, is the mean of the geometry training data, and is its range. This boundary loss punishes the inference process with increasing loss if the geometry search process steps out of the space of the training data, where the forward model may produce inaccurate estimates of scattering parameters. In our experiments we use the following total loss: L = ( −ˆ(ˆ)) 2 + L . As an initial test of the NA inverse method, we feed in frequency dependent absorptivities ( ) where we know apriori that a solution exists, i.e. is a numerical simulation from whichˆoriginates from. In Fig. 3 we show results of the NA inverse method and each sub-plot shows characteristic results of (a) the best results, (b) average results, and (c) and (d) below average results, all Absorptivity Absorptivity based on MSE. In all of the these examples, the NA method identifies a close approximation to the correct solution. This is impressive given the complexity of the spectra present in Fig.  3. We suspect the small remaining errors in the predicted design are due largely to the limited precision of gradient descent as it nears solutions (i.e., minima) in the error space; due to the non-zero learning rate in Eq. 1, it cannot converge to the exact minimum point. We note however that learning rate can be gradually reduced during the search process, at the cost of additional computation time, until a solution of desired precision is obtained. We next turn to the significantly more challenging task of applying the NA method to a spectra where we are unaware if a solution exists within the chosen geometrical parameter space, i.e. criterion 1 for inverse problems is violated. We chose the frequency dependent external quantum efficiency (EQE) of gallium antimonide (GaSb) as , shown as the gray curves of Fig.  4. The metasurface will operate at elevated temperature, and thus we consider the so-called graybody spectra -also termed the spectral exitance , ( ) -which is given by the blackbody radiation curve times the absorptivity. We keep the top 16,000 neural adjoint solutions (spectra) and determine , ( ) for each of these at 100 temperatures between 1500 and 2500 k -a total of 1.6 × 10 6 candidates. The shape of the EQE curve differs significantly from typical spectra we see in our geometry space (Fig. 3). None-the-less we find a best solution resulting from the NA method at a temperature of = 2100 that consists of a geometry of [ℎ = 0.566, = 1.440,   on the region of interest for energy harvesting purposes. To verify our neural adjoint results, we numerically simulate the predicted geometry and plot the resulting , ( ) in Fig. 5 (a) as the red curve -again compared to the EQE of GaSb (gray curve). As can be seen, the simulated curve has many relatively sharp peaks that are not present in Fig. 4(b). This is because, as noted earlier,ˆdoes not perfectly match the numerical simulator, and this will introduce errors in the design process. Thus since the NA method relies onˆto search for designs, it is also limited by the accuracy of the forward model estimate. We also found that sinceˆis trained from geometries constrained to a grid, the discrepancy between NA solutions and numerical simulation arises because NA solutions are not confined to the grid, where our model is most accurate. None-the-less we find that our simulated , ( ) achieves an MSE of 1.65×10 −2 , as shown in Fig. 5 (a).
Another major obstacle is that our design search space does not contain a geometry that can realize our target spectrum (i.e., Hadamard's criteria 1 ). This is suggested by the fact that our best NA solution, shown in Fig. 4(a), still does not match our target spectrum. However, we can use the NA output to identify where we should expand our search space so that it will include better designs. We can do this by visualizing the error of all inverse solutions returned by NA, and looking for trends e.g., we may find that all the best solutions are bunched up against some edge of our initial search space, suggesting that expanding along that dimension may yield better results. However, since we have a 14 dimensional design space, we are unable to easily visualize these data. To address this problem, we use the Uniform Manifold Approximation and Projection (UMAP) [28], which is a type of dimensionality reduction method permitting us to visualize the distribution of our inverse solutions performance in 2D, so that we may more easily identify  Fig. S1 -shown in Supplemental -we find that our best NA inverse solutions are limited by height. Shown in Fig. 4 (d) are NA solutions as a function of height and periodicity color mapped by corresponding MSE values. It is evident that not only are our best solutions grouped at the maximum height allowed in our geometrical space, ℎ = 0.6 m, but also that the solutions improve as a function of height. Encouraged by these results, we expanded our original geometry space to include increased height values from 0.6 to 0.75 m, by simulating an additional 24k { , } pairs. The NA model now trained on the expanded geometrical space indeed finds an improved solution, shown as the blue curve in Fig. 4 (a), where we realize an MSE that is reduced by a factor of 2.7. The simulated red curve in Fig.  5 (b) further validates the result that the MSE of numerical simulations is also reduced -here by a factor of 4.8. A plot of the 1000 best NA solutions in the expanded geometrical space shown in Fig. 4 (c), however, indicate that we may be able to make continued improvements, since we still have a gradient pushing for greater heights -although the periodicity seems to be honing in on a value of 1.2 m.

Conclusion
We have adapted the neural-adjoint inverse design method [27] to accurately predict the highdimensional all-dielectric metasurface geometry needed to produce a targeted infrared absorptivity spectra. When the geometry needed to produce a desired spectrum lies outside of the bounds, the NA method appears to find the best possible solution within the permissible search space. Unlike other adjoint inverse approaches [29], the NA method does not require any domain knowledge specific to the problem. In the event that the inverse solution does not exist in the parameter space explored, NA may be used to guide one to a better solution, through expansion of the design parameter search space. This may help to reduce the initial required number of numerical simulations, and to instead use NA guided simulation exploration. With its exceptional computational speed, high accuracy, and potential use in active learning that is explored here, the neural-adjoint method has an impressive future in not only ADMs thermal emitter but also any ADMs inverse problems. The NA method is not restricted to the case presented, but may be applied to many other systems including photonic band gap and plasmonics.

Neural-adjoint method for the inverse design of all-dielectric metasurfaces: supplemental document 1. NUMERICAL SIMULATION
The cylindrical resonators' geometry was previously demonstrated in the THz regime by[? ? ]. To prove deep neural work capability with high-dimensional inputs, we increased the geometrical dimensionality by introducing the elliptical structures to previous cylindrical resonators, and each elliptical resonator undergoes a rotation angle ranging from -45 to 45 degrees. Furthermore, governing the fabrication practicality, we fixed all elliptical resonators to have the same height. To migrate from the THz regime to the infrared, materials and geometry sizes are scaled accordingly. We sized down our unit cell volume refers to the ratio of THz frequency and infrared frequency used in the legacy design and current design. Then an optimization on an adequate scale to finalize our geometry boundary listed in Table 1. We chose SiC for our simulations, considering its high melting point, high oxidation resistance, and reasonable absorption coefficient in nearinfrared. We used experimentally measured relative permittivity data of SiC from 0 to 300 THz to fit the dispersive materials property of SiC in our simulations. To implement the unit cell boundary conditions in CST Microwave Studio we used for our simulation, we used finite frequency solvers to perform the numerical simulations, which also take considerations of the coupling effect between the four resonators within the unit cell. To minimize the time cost of the simulations, we lightly comprise the simulation accuracy to simulation speed. We tuned the Floquent mode to have one mode at both ports. The simulation mesh is tetrahedral, and we used a second-order solver with accuracy at 1e − 6. The resulting spectrum of the simulation has 2001 data points within the range of 100-500 THz. Table S1. Grid definition for the 14-dimensional input geometry parameters. h, p, and r are in units of microns. θ is in unit of degrees.
Step h p r x n /r y n θ n The total possible number of geometrical combinations of our 2×2 metasurface is 8 9 * 6 5 = 1.04 × 10 12 . It is impossible to use a conventional numerical simulation approach to exploit the entire geometry space to achieve the targeted spectrum. We find that the average simulation time per geometrical configuration per CPU is approximately three minutes. Thus it would take about 600 million years to finish exploring the entire geometry space with one CPU. The fast forward dictionary search (FFDS) inverse method was shown feasible for THz ADMs absorbers, where all 812 million possible geometries can be computed in a day [? ]. To compute our entire geometry space with a size of over a trillion parameters would take FFDS over three and a half years. Thus the NA method is a good choice when the parameter space become too large for a FFDS approach.

DEEP NEURAL NETWORK ARCHITECTURE
We built the entire network and neural-adjoint method using the PyTorch platform. The DNN used for the neural adjoint method consists of twelve fully connected linear layers, four 1D transpose convolutional layers for upsampling, and one final 1D convolutional layer for spectrum smoothing. The linear fully connected layers have the following structure[14, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 500 ], and all hidden layers except the last linear layer are batch normalized, activated by Leaky_Relu, followed by dropout layers with p = 0.05. The transpose convolutional layers have kernel size [16,16,33,33] and filter size [4,4,4,4]. The final convolutional layer has kernel size 1 and stride at 1. The data loader takes geometry inputs and the first 2000 data points of the absorptivity spectrum to generate the training and test datasets. The post-processing truncates the predicted spectra's first and last fifty points to drop the convolutional layers bump at the edges. We use L2 regularization, batch normalization, and the ADAM optimizer.

NA INVERSE METHOD
Because NA a gradient-based procedure, it will only converge to a locally optimal solution. Because the NA method is amenable to parallelization on graphics processing units (standard hardware for deep learning), this entire process can be computed in very little time. In our experiments, we can run the NA method with T = 1000 on our desktop computer with an Nvidia 2080ti GPU and complete processing in under 1 minute.Since the NA method finds the globally optimal solution even in its worst-performing cases, our results suggest that the NA always (or nearly always) finds the globally optimal solutions, even for highly complex problems like ours. Interestingly, this suggests that the main obstacle of custom design is no longer the inverse model, but rather the space over which we choose to search for designs i.e., the shapes we consider (cylinder, crosses, etc) and their parameter settings (e.g., radii, height). Although powerful, to use deep learning methods we must necessarily define a range of these settings so that we can collect simulations to train our models, and this space limits where we can search for designs. However, the design needed to realize our targeted scattering parameters may not exist in this initial search space. As we show subsequently, the NA method can also be used to identify where this initial search space can be expanded so that it is most likely to contain the desired solution, providing a solution to this emergent obstacle to complex material design.

DATA AUGMENTATION
Our simulations' unit cell boundary conditions allow us to do four times data augmentation on our dataset because the infinite plane of unit cells consists of four different resonators' combinations that give almost identical spectra with fluctuations from CST software. However, we learned that the DNN could quickly learn the correlation between four different resonators' combinations. The forward model will know which input geometries share the same spectrum in high fidelity if the entire dataset is augmented before splitting into the training and validation dataset. Therefore, the forward model will give a false mean square error much lower than actual loss performance on an independent validation set. After applying data augmentation to the training and validation sets after the splits, we observe that the four times augmentation did not significantly improve accuracy. We believe that, with our 60000 simulations (24000 simulations after augmentation), the augmented data points are still too sparse to cover the entire geometry space defined by our geometry boundaries.

GEOMETRY SPACE EXPLORATION THROUGH UMAP
We use Uniform Manifold Approximation and Projection (UMAP) to explore our solution geometry space and realize that angles have more random impacts on distributing the best NA solutions. Thus, we plotted the UMAP with ten parameters, excluding rotational angles. The plotted UMAP demonstrates a clear trend that the MSE is decreasing in one direction. To confirm that the decreasing trend matches the increasing of resonators' height, we further marked the points from maximum and minimum height boundaries, respectively. The clustering of points towards the best MSE performance suggests that the NA method is finding the best local minima.