Review of numerical optimization techniques for meta-device design [Invited]

Optimization techniques have been indispensable for designing high-performance meta-devices targeted at a wide range of applications. In fact, optimization is no longer an afterthought but a fundamental tool for many optical and radio-frequency (RF) designers. Still, many devices presented in the recent literature do not take advantage of optimization techniques. This paper seeks to address this by presenting both an introduction to and a review of several of the most popular techniques currently used for meta-device design. Additionally, emerging techniques such as topology optimization and multi-objective optimization, and their relevance to device design, are thoroughly discussed. Moreover, attention is given to future directions in meta-device optimization, such as surrogate modeling and deep learning, which have the potential to disrupt the fields of optical and RF inverse design. Finally, many design examples from the literature are presented, along with a flowchart that guides the reader in applying these optimization algorithms to a given problem.


Introduction
The desire to maximize the performance of electromagnetic (EM) devices [1,2] has long made optimization a necessary part of the design process. Furthermore, rapid advances in fabrication technologies, physical modeling, and available computing power over the past few decades have enabled EM designers to accurately model and manufacture devices faster than ever. These advances have also enabled the modeling of more diverse and complicated problems than previously imaginable. Notably, while simulation times range from a few milliseconds for simple geometrical-optics ray traces to seconds, minutes, hours, or even days for the most complicated and electrically large full-wave simulations, all EM devices can benefit from optimization of some kind.
To this end, parameter sweeps are typically the first step designers take toward optimization. This is usually considered a "hand-tuning" procedure, but it is unfortunately sometimes the only step taken in the optimization process. While parametric sweeps are often useful for establishing parameter bounds and revealing performance trends, they tend to be extremely inefficient when used to optimize a design, especially if the response surface (i.e., a surface that maps the relationships between input variables and output objectives) is high-dimensional. This is only exacerbated when multiple goals are considered or nonlinear constraints are applied to the input parameters. Essentially, humans have limited spatial reasoning capabilities that restrict our ability to think in many dimensions.
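To put the inefficiency of exhaustive sweeps in numbers: a full-factorial sweep with k samples per parameter over d parameters needs k^d evaluations. The short Python sketch below tabulates this growth; the grid density of 10 samples per parameter and the cost of one minute per full-wave simulation are illustrative assumptions, not values taken from any particular design.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def sweep_cost(num_params, samples_per_param=10, minutes_per_eval=1.0):
    """Evaluations and wall-clock years for a full-factorial parameter sweep.

    samples_per_param and minutes_per_eval are illustrative assumptions.
    """
    evals = samples_per_param ** num_params
    years = evals * minutes_per_eval / MINUTES_PER_YEAR
    return evals, years


for d in (2, 4, 6, 8):
    n, yrs = sweep_cost(d)
    print(f"{d} parameters: {n:>11,d} evaluations, {yrs:9.4f} years")
```

Under these assumptions, six parameters already require 10^6 simulations (about 1.9 years of serial computation), and each additional parameter multiplies the cost by ten.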
Fortunately, computers are very well suited to high-dimensional mathematics and are the natural choice for solving these complex optimization problems. However, a computer's ability to find optimal solutions efficiently is ultimately limited by the dimensionality and complexity of the governing problem and by the power of the chosen optimization algorithm. Many algorithms have been developed over the years for various applications. Historically, the first optimizations were based on local techniques, often exploiting function gradients to inform the next design chosen for evaluation. Algorithms such as Newton's method [3,4], gradient descent [5], and conjugate gradients [6] all use gradient information in different ways to find local minima. While these algorithms are very general purpose, some optimization techniques are closely associated with a specific application. Take, for example, the damped least squares (DLS) algorithm, which has existed since the 1960s [7] and has long been popular in lens design [2,8]. This algorithm, while still considered a local technique, introduces a damping term that aids convergence near local minima and can assist in escaping from the many local minima that exist in typical lens design problems. However, today's optical engineers are afforded many more degrees of design freedom than in decades past (e.g., high-order aspheric terms, free-form optics, gradient-index (GRIN) materials [9,10], and metasurfaces [11,12]), which has necessitated the investigation of more advanced optimization algorithms [13][14][15]. Optimization has likewise long been of interest in the RF and antenna communities [1]. Generally, RF and antenna problems contain fewer variables and have response surfaces with fewer local minima, but they tend to be much more computationally demanding than lens design problems (e.g., using full-wave techniques as opposed to ray tracing). Therefore, like many optical design problems, RF and antenna problems often start with a known good solution and use it as the starting point for optimization.
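The effect of the damping term can be illustrated with a Levenberg-Marquardt-style iteration, a close relative of DLS, on a sum-of-squares merit function. The sketch below is a minimal Python/NumPy example that assumes the Rosenbrock function written as two residuals as a stand-in for a lens merit function; the damping schedule (multiply by 0.3 after an accepted step, by 10 after a rejected one) is an illustrative choice, not the scheme of any particular lens design program.

```python
import numpy as np


def residuals(x):
    # Rosenbrock function as a sum of squares: f = r1**2 + r2**2, minimum at (1, 1)
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])


def jacobian(x):
    return np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])


def damped_least_squares(x0, lam=1e-2, iters=200, tol=1e-12):
    x = np.asarray(x0, dtype=float)
    cost = float(np.sum(residuals(x) ** 2))
    for _ in range(iters):
        r, J = residuals(x), jacobian(x)
        # Damped normal equations: (J^T J + lam I) dx = -J^T r
        dx = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
        trial = x + dx
        trial_cost = float(np.sum(residuals(trial) ** 2))
        if trial_cost < cost:
            x, cost = trial, trial_cost
            lam *= 0.3   # step accepted: move toward the Gauss-Newton step
        else:
            lam *= 10.0  # step rejected: move toward a short gradient-descent step
        if cost < tol:
            break
    return x, cost
```

For example, `damped_least_squares([-1.2, 1.0])` starts from the classic Rosenbrock starting point. Large damping values shorten the step toward the gradient-descent direction, while small values recover the Gauss-Newton step, which is what makes the method robust far from a minimum and fast near one.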
However, good starting solutions are not always known, especially in true inverse-design problems (i.e., problems in which an optimizer is tasked with finding a design that achieves a given set of performance criteria while obeying all design constraints [16]). Moreover, due to the computational cost of full-wave evaluations, finite-difference gradient calculations are often infeasible, which can preclude the use of gradient-based optimization techniques. Therefore, for many problems, the ideal optimizer would routinely find the global minimum of a multimodal cost function with neither a good starting point nor gradient information, and would do so in as few full function evaluations as possible. To this end, global optimization techniques such as the Genetic Algorithm (GA) [17], Particle Swarm Optimization (PSO) [18], Differential Evolution (DE) [19,20], and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [21] have all seen considerable success in the optimization of RF and optical design problems. In fact, global optimization techniques have become so popular, owing to their ability to discover new and often unintuitive or even unexplainable solutions, that competitions are held each year to find the most powerful, efficient, and robust algorithms [22]. However, local optimization techniques should not be discounted. Recently, gradient-descent-based local techniques have been exploited by topology optimization in the inverse design of disruptive nanophotonic devices [23]. Furthermore, both local and global optimization techniques have been extended to support true multi-objective optimization [24], which is a powerful emerging technique in meta-device design. Still, local or global techniques alone are sometimes not enough to efficiently optimize a given problem. To this end, surrogate modeling and deep learning show tremendous potential to revolutionize meta-device design by making optimization tractable for time-intensive function evaluations, replacing them with cheaper alternatives.
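As a concrete example of a population-based global optimizer that needs neither a starting point nor gradients, the sketch below implements the classic DE/rand/1/bin variant of Differential Evolution in Python with NumPy. The control parameters (population size 40, F = 0.7, CR = 0.9) are common illustrative defaults rather than recommendations from this review, and the Rastrigin function is a standard multimodal test function standing in for an expensive EM cost function.

```python
import numpy as np


def rastrigin(x):
    """Multimodal test function with its global minimum f(0) = 0."""
    return 10.0 * x.size + float(np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x)))


def differential_evolution(f, bounds, pop_size=40, F=0.7, CR=0.9, generations=300, seed=0):
    """Classic DE/rand/1/bin (minimization) within box bounds [(lo, hi), ...]."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fit = np.array([f(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: a + F * (b - c) from three distinct members other than i
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover, forcing at least one component from the mutant
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            # Greedy one-to-one selection
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

For example, `differential_evolution(rastrigin, [(-5.12, 5.12)] * 5)` searches a five-dimensional Rastrigin landscape. With the defaults, each call spends about 12,000 function evaluations, which is exactly the kind of budget that becomes prohibitive when each evaluation is a full-wave simulation.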
The organizational structure of this paper is as follows. The second section assists readers who may be inexperienced with the concepts and algorithms discussed in this manuscript by providing a framework for pairing problem types with appropriate optimization techniques. The third section presents an overview of several global optimization algorithms, as well as a discussion of designs resulting from their application. The following sections discuss the emerging techniques of topology optimization and multi-objective optimization in meta-device design. The final section introduces the reader to surrogate modeling and deep learning techniques, which can be exploited to accelerate the optimization of computationally expensive cost functions. Conclusions and closing remarks finish the paper and reinforce the important role that optimization can play in meta-device design.

Getting started with meta-device optimization
While the no free lunch theorem states that any two optimization algorithms perform equivalently when averaged across all possible problems [25], this should not be interpreted as implying that the choice of algorithm is unimportant for a given optimization problem. In fact, we typically encounter only a finite number of problem types in meta-device design and can select preferred algorithms for each problem type. For those inexperienced with advanced meta-device optimization, the hardest question to answer may be which optimization algorithm is best suited to a particular problem. The answer depends on factors such as the topology of the input space, the number of input parameters, the number of objectives, and the computational cost per function evaluation (e.g., per full-wave simulation). Moreover, except in a few specific cases, no single optimization algorithm is perfectly suited to a particular problem or numerical method (e.g., the finite element method (FEM) or the finite-difference time-domain (FDTD) method). Still, it is possible to greatly narrow down the recommended optimization strategy (i.e., local or global) and specific algorithm (e.g., GA or CMA-ES) for a particular problem by asking a series of additional questions.
Figure 1 presents a meta-device optimization flowchart that seeks to help the reader determine the best optimization strategy for their problem by answering these questions. Starting at the top of the chart, one flows down to the first question: Is the input space discrete? A discrete input space implies that there is a finite number of potential solutions from which to find the optimal one. This is known as a combinatorial optimization problem, and while it is theoretically possible to evaluate all possible combinations to find the optimal design, this is often intractable due to the number of parameter combinations and the cost of each function evaluation. Fortunately, global optimizers like the GA (see Section 3) can efficiently find high-performance designs in large solution spaces. However, in metamaterial design, the GA often produces designs with non-contiguous inclusions. Therefore, if a contiguous structure is needed (e.g., a meander-line antenna), the Ant Colony Optimization (ACO) algorithm or the Multi-Objective Lazy Ant Colony Optimization (MOLACO) algorithm (see Section 3) is a better choice.
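The size of a discrete design space grows exponentially with the number of binary choices. As a minimal Python illustration, assuming a pixelated unit cell in which every pixel is either filled or empty, an N × N grid admits 2^(N^2) candidate designs:

```python
def num_binary_designs(pixels_per_side):
    """Number of distinct fill patterns for a square grid of binary (filled/empty) pixels."""
    return 2 ** (pixels_per_side ** 2)


for n in (4, 8, 16):
    print(f"{n:2d} x {n:<2d} grid: {num_binary_designs(n):.3e} candidate designs")
```

A 4 × 4 grid (65,536 patterns) can still be enumerated exhaustively, whereas a 16 × 16 grid already has about 1.16 × 10^77 patterns, which is why exhaustive search gives way to optimizers such as the GA or ACO.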
For problems with continuous input spaces (even if the parameters themselves are bounded by constraints), the choice of optimizer is more complicated. If a good initial solution exists, the best strategy is usually to apply a local optimization technique. From there, if the problem can be cast as a finite element problem (i.e., one in which gradient information is used to directly modify the finite-element representation of the problem), then it is a prime candidate for topology optimization (see Section 4.1). Otherwise, a number of gradient-based algorithms can be used, such as DLS, Newton's method [4], and the multiobjective gradient descent algorithm (MDGA) [26]. If a good starting solution does not exist, then global optimization techniques are usually the best choice (see Section 3). Flowing down from the global optimization bubble, one must determine how much time is acceptable for optimization. If a single function evaluation is fast enough that evaluating hundreds or thousands of designs fits within the time budget set by the designer, then a range of global optimizers may be applied directly to the problem. However, if the function evaluation time is long enough to prohibit direct optimization, then other techniques are needed.
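The decision flow described above can be summarized as a small function. This is a hypothetical encoding of the flowchart questions in Fig. 1 as we read them from the text; the argument names and returned labels are illustrative, and the branch for expensive evaluations points to the surrogate-modeling and deep-learning techniques discussed later in this paper.

```python
def recommend_strategy(discrete: bool, needs_contiguous: bool, good_start: bool,
                       fits_topology_opt: bool, evals_affordable: bool) -> str:
    """Hypothetical encoding of the meta-device optimization flowchart (Fig. 1)."""
    if discrete:
        # Combinatorial problem: GA, unless a contiguous structure (e.g., meander line) is needed
        return "ACO / MOLACO" if needs_contiguous else "GA"
    if good_start:
        # Local search from a known good design
        if fits_topology_opt:
            return "topology optimization"
        return "gradient-based local (DLS, Newton, MDGA)"
    if evals_affordable:
        # Hundreds to thousands of full evaluations fit within the time budget
        return "global optimizer (e.g., GA, PSO, DE, CMA-ES)"
    return "surrogate modeling / deep learning"
```

For instance, a continuous problem with no known starting design and a prohibitively slow solver, `recommend_strategy(False, False, False, False, False)`, falls through to the surrogate-assisted branch.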
In its classical implementation, the GA encodes each candidate design as a binary string. Members of the population are selected to serve as parents, new designs are generated from the parents, and this process is repeated until some stopping criterion is met. The GA has been used extensively in the design of metamaterials and nanoantennas (see Fig. 2). However, while the GA is well suited to discrete problems, it can produce pixelated configurations due to the presence of disconnected pixels in the solution. ACO, on the other hand, is a swarm-intelligence algorithm that operates on a graph-based representation of the problem: it maps the optimal "trail" found by the artificial "ants" in the graph topology to a contiguous structure.
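The basic GA loop of selection, crossover, and mutation over binary strings can be sketched in a few lines of Python with NumPy. The operators chosen here (binary tournament selection, uniform crossover, bit-flip mutation, and single-member elitism) are common textbook choices and an assumption on our part; the toy fitness used below, which counts how many pixels of a candidate pattern match a hypothetical target pattern, stands in for a simulated figure of merit of a pixelated metasurface.

```python
import numpy as np


def genetic_algorithm(fitness, n_bits, pop_size=40, generations=200, p_mut=None, seed=0):
    """Minimal binary GA (maximization) with tournament selection, uniform crossover,
    bit-flip mutation, and elitism."""
    rng = np.random.default_rng(seed)
    p_mut = 1.0 / n_bits if p_mut is None else p_mut
    pop = rng.integers(0, 2, size=(pop_size, n_bits))

    def tournament(scores):
        # Binary tournament: the fitter of two random members becomes a parent
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if scores[i] >= scores[j] else pop[j]

    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        children = [pop[np.argmax(scores)].copy()]              # elitism: keep the best design
        while len(children) < pop_size:
            p1, p2 = tournament(scores), tournament(scores)
            child = np.where(rng.random(n_bits) < 0.5, p1, p2)  # uniform crossover
            flips = rng.random(n_bits) < p_mut                   # bit-flip mutation
            children.append(np.where(flips, 1 - child, child))
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    best = int(np.argmax(scores))
    return pop[best], scores[best]
```

A possible usage, with a hypothetical 8 × 8 target pattern flattened to 64 bits: `target = np.random.default_rng(1).integers(0, 2, 64)` and `genetic_algorithm(lambda bits: int(np.sum(bits == target)), 64)`. Nothing in the operators keeps filled pixels connected, which is the origin of the disconnected pixels noted above.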
For this reason, among others, ACO has seen tremendous success in electromagnetic device design, especially in the generation of meander-line antennas [44] (see Fig. 3(a) bottom). Recently, Zhu extended the ACO algorithm to include "lazy" ants [43,57], which greatly improved design diversity and has been used successfully to generate high-performance frequency-selective-surface (FSS) structures [43,58]. Furthermore, by leveraging the third (i.e., axial) dimension, these structures achieve wide field-of-view (FOV) performance [58], which makes them attractive candidates for metasurface design in the optical regime [59] (see Fig. 3(a) top).
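The way ACO builds contiguous structures can be illustrated with a minimal Python sketch: ants walk from a start node to a goal node on a graph, choosing edges with probability proportional to pheromone^α × (1/length)^β, and pheromone evaporates and is then reinforced along short trails. The toy graph and the shortest-trail objective are hypothetical stand-ins for a meander-line grid and an antenna figure of merit, and this is the basic ant-system update rather than the lazy-ant variant described above.

```python
import numpy as np


def ant_colony_path(edges, start, goal, n_ants=20, iters=40, alpha=1.0, beta=2.0,
                    rho=0.3, q=1.0, seed=0):
    """Basic ant-system search for a short start->goal trail on an undirected graph.

    edges: dict mapping (u, v) node pairs to positive edge lengths.
    """
    rng = np.random.default_rng(seed)
    nbrs = {}
    for (u, v), length in edges.items():
        nbrs.setdefault(u, []).append((v, length))
        nbrs.setdefault(v, []).append((u, length))
    tau = {frozenset(e): 1.0 for e in edges}            # pheromone on each edge
    best_path, best_len = None, np.inf
    for _ in range(iters):
        trails = []
        for _ in range(n_ants):
            path, node, total = [start], start, 0.0
            while node != goal:
                options = [(v, w) for v, w in nbrs[node] if v not in path]
                if not options:                          # dead end: discard this ant
                    path = None
                    break
                weights = np.array([tau[frozenset((node, v))] ** alpha * (1.0 / w) ** beta
                                    for v, w in options])
                v, w = options[rng.choice(len(options), p=weights / weights.sum())]
                path.append(v)
                total += w
                node = v
            if path is not None:
                trails.append((path, total))
                if total < best_len:
                    best_path, best_len = path, total
        for e in tau:                                    # evaporation
            tau[e] *= 1.0 - rho
        for path, total in trails:                       # deposit q / length along each trail
            for u, v in zip(path, path[1:]):
                tau[frozenset((u, v))] += q / total
    return best_path, best_len
```

Because every candidate is a walk along graph edges, the returned solution is a connected node sequence by construction, which is the property that makes ACO attractive for meander-line layouts.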
While ACO and the GA are well suited to combinatorial (i.e., discrete) optimization problems, most optical and EM design problems are continuous and thus require other optimization algorithms. Particle swarm optimization (PSO) was introduced in the mid-1990s [18] and has seen extensive application in electromagnetic device optimization [60][61][62][63]. PSO is another swarm-intelligence optimization algorithm, originally constructed to model social behavior and inspired by the movements observed in flocks of birds and schools of fish. When PSO was introduced to the electromagnetics community, it possessed a number of advantages over the GA. First, PSO is a real-valued algorithm that operates on vectors of real numbers instead of the binary values used by the GA (later versions of the GA introduced real-valued parameters [64]). Second, population members tend to operate more independently and cooperatively. This can be thought of as individual members exploring different parts of the solution space simultaneously and communicating with other members of the "swarm" when they have found a good solution. In fact, PSO was found to outperform the GA when designing negative-index metamaterials [65]. PSO has also been applied to optical meta-device optimization. In [45], PSO was employed to optimize the geometrical parameters and spacing of nanoparticle-based Yagi-Uda antennas (see Fig. 3(b)). PSO has also been used successfully to optimize nanohole-array-based metasurfaces for beam-steering applications [46] (see Fig. 3(c)). Interestingly, PSO has been shown to be a special case of the more general wind-driven optimization (WDO) algorithm, which was later introduced in [66].
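The velocity update that lets each particle balance its own best position against the swarm's best can be written compactly. The following Python/NumPy sketch is a global-best PSO with an inertia weight; the coefficient values (w = 0.7, c1 = c2 = 1.5) are common illustrative settings and an assumption on our part, not the settings used in [45] or [46].

```python
import numpy as np


def pso(f, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization with inertia weight (minimization)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    x = lo + rng.random((n_particles, lo.size)) * (hi - lo)
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Inertia + pull toward each particle's own best + pull toward the swarm's best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, float(pbest_f.min())
```

Each particle carries a real-valued position vector, which is the practical difference from the bit strings of the classical GA; the shared `gbest` term is the "communication" between swarm members described in the text.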
While global optimizers like the GA and PSO have been applied successfully to a wide range of design problems in the RF and optical regimes, they are typically sensitive to internal parameters that often require tuning on a per-problem basis. Moreover, it is not always clear how best to tune these parameters for a given problem, and adaptive tuning during optimization may be required to maximize the performance of the algorithm. In fact, many studies have investigated optimal control-parameter tuning in evolutionary algorithms [67][68][69][70]. On the other hand, CMA-ES [21] requires very few user-defined control parameters; typically, only the population size needs to be chosen before beginning an optimization. This self-adaptive nature, together with the power of the underlying algorithm, has made CMA-ES a very attractive choice for meta-device optimization [71]. Furthermore, several comparisons have shown that CMA-ES is a more capable optimization algorithm for the problems typically encountered in electromagnetics [72], enabling the design of more complex and higher-performance devices because it can optimize high-dimensional problems in less time than other algorithms. In fact, in [47], CMA-ES was found to significantly outperform the GA in both convergence speed and solution quality (see Fig. 3(d) top) when applied to the optimization of broadband polarization-converting metasurfaces. Moreover, the optimal structure found by CMA-ES is simpler and potentially less sensitive to fabrication errors than the pixelated design found by the GA. CMA-ES, in conjunction with an efficient port-reduction method, has been applied to electromagnetic band-gap (EBG) structure synthesis [48] (see Fig. 3(e)). CMA-ES has also been applied to the optimization of more traditional optical devices, including triangular fiber Bragg gratings [73] and programmable optical filters for waveform sculpting [74]. CMA-ES has proven to be a very powerful technique for homogeneous [75] and GRIN lens optimization [76,77]. In [49], CMA-ES was used to optimize a GRIN lens that converted a Gaussian laser beam to a top-hat profile while maintaining collimation (see Fig. 3(f)). This design achieved a massive size, weight, and power (SWaP) reduction relative to traditional homogeneous-lens-based beam shapers, which require multiple elements to achieve the same behavior. Additionally, CMA-ES has been used in non-electromagnetic meta-device optimization, with examples including acoustic metamaterials [78] and thermal cloaks [79].
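To show what "self-adaptive" means in practice, the sketch below is a compact CMA-ES in Python/NumPy that follows the default strategy parameters given in Hansen's widely used CMA-ES tutorial: the user supplies only the initial mean, the initial step size, and an iteration budget, and the population size defaults to 4 + ⌊3 ln n⌋. It omits refinements found in production libraries (restarts, boundary handling, and most numerical safeguards), so it should be read as an illustration rather than a reference implementation.

```python
import numpy as np


def cma_es(f, x0, sigma0, generations=300, seed=0):
    """Compact (mu/mu_w, lambda)-CMA-ES with tutorial default parameters (minimization)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    n, sigma = mean.size, float(sigma0)
    lam = 4 + int(3 * np.log(n))                   # default population size
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                   # positive recombination weights
    mueff = 1.0 / np.sum(w ** 2)
    # Default learning rates and step-size damping
    cc = (4 + mueff / n) / (n + 4 + 2 * mueff / n)
    cs = (mueff + 2) / (n + mueff + 5)
    c1 = 2 / ((n + 1.3) ** 2 + mueff)
    cmu = min(1 - c1, 2 * (mueff - 2 + 1 / mueff) / ((n + 2) ** 2 + mueff))
    damps = 1 + 2 * max(0.0, np.sqrt((mueff - 1) / (n + 1)) - 1) + cs
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))   # approx. E||N(0, I)||
    pc, ps = np.zeros(n), np.zeros(n)
    B, D, C = np.eye(n), np.ones(n), np.eye(n)
    for gen in range(generations):
        z = rng.standard_normal((lam, n))
        y = (z * D) @ B.T                          # rows are samples from N(0, C)
        order = np.argsort([f(mean + sigma * yk) for yk in y])
        y_sel = y[order[:mu]]                      # mu best steps
        y_w = w @ y_sel
        mean = mean + sigma * y_w                  # weighted recombination
        inv_sqrt_c = B @ np.diag(1 / D) @ B.T
        ps = (1 - cs) * ps + np.sqrt(cs * (2 - cs) * mueff) * (inv_sqrt_c @ y_w)
        ps_norm = np.linalg.norm(ps)
        hsig = float(ps_norm / np.sqrt(1 - (1 - cs) ** (2 * (gen + 1))) / chi_n
                     < 1.4 + 2 / (n + 1))
        pc = (1 - cc) * pc + hsig * np.sqrt(cc * (2 - cc) * mueff) * y_w
        # Rank-one (evolution path) plus rank-mu (selected steps) covariance update
        C = ((1 - c1 - cmu) * C
             + c1 * (np.outer(pc, pc) + (1 - hsig) * cc * (2 - cc) * C)
             + cmu * (y_sel.T * w) @ y_sel)
        sigma *= np.exp((cs / damps) * (ps_norm / chi_n - 1))   # cumulative step-size adaptation
        d2, B = np.linalg.eigh((C + C.T) / 2)
        D = np.sqrt(np.maximum(d2, 1e-20))
    return mean, float(f(mean))
```

Every constant above is derived from the problem dimension, which is why, in contrast to the GA and PSO sketches, no per-problem tuning of mutation rates or acceleration coefficients is needed.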
Global optimization (GO) techniques are the dominant approach to meta-device optimization today, and we expect their use to increase, especially in optical metasurface and nanoantenna applications. Moreover, there is tremendous research activity aimed at developing new and more powerful GO algorithms, which will continue to expand their applicability to meta-device design. Nevertheless, some of the most exciting nanophotonic devices today are optimized using local optimization techniques and a design methodology known as topology optimization. While global optimization techniques are very general, they suffer from the curse of dimensionality (i.e., the number of function evaluations required for convergence increases with the dimensionality of the problem, usually superlinearly). Topology optimization overcomes this limitation through its unique cost function construction, and although it is suited to a very specific class of problems, it has seen tremendous success in meta-device optimization.

Topology optimization
Topology optimization refers to the optimization of a two- or three-dimensional system comprising an array of pixels or voxels (hereafter called elements), each of which contains a discrete or continuous parameter requiring adjustment. Unlike many of the other approaches discussed in this paper, the number of simulations required per iteration of topology optimization does not increase as the number of elements in the system grows. As such, designs created using this method can be high-resolution, curvilinear structures containing thousands to millions of elements.
Topology optimization can be mathematically described as the maximization of a target merit function using gradient-based updates. In particular, the gradients of the merit function with respect to the design variables provide a guide for iteratively modifying these design variables in a manner that improves the merit function. For a binary dielectric meta-device, the design variable is the spatially dependent dielectric constant within the system, defined as:
$$\epsilon(\mathbf{r}) = \epsilon_{\text{low}} + a(\mathbf{r})\left(\epsilon_{\text{high}} - \epsilon_{\text{low}}\right),$$
where $\mathbf{r}$ represents any location within the design domain, $a \in [0,1]$, and the dielectric constants $\epsilon_{\text{low}}$ and $\epsilon_{\text{high}}$ represent the two materials making up the final device. The ability of the design variable to take grayscale values between these dielectric constants is important in most implementations of topology optimization, as the method requires the iterative modifications to the design variable to be perturbative. Perturbative modifications can also be achieved by restricting the dielectric constant to binary values and optimizing along the boundary of the geometry, changing only a small volume of material in each iteration. This is often useful as a refining step after grayscale optimization is complete. The gradient of the merit function with respect to every element can be computed efficiently using the adjoint method [23]. A variety of adjoint-based topology optimization implementations exist, many of which use accurate Maxwell-equation simulations to perform a set of forward and adjoint simulations per iteration [23], [80], [81].
Extending topology optimization to fully 3D structures expands the capabilities of devices, even enabling new functionality not possible with single-layer patterns. In the case of multi-layer dielectric composites based on device planarization, multi-functional deflectors (Fig. 5(a)) and field-flatness-corrected metalenses (Fig. 5(b)) have been proposed theoretically. Broadband light splitters operating in the scalar diffractive optics regime, based on varying and optimizing the height topology of a low-contrast polymer, have been designed and experimentally fabricated (Fig. 5(c)). Designs based on height-topology variation have the potential for low-cost, large-area implementation using imprint lithography. Diffractive optical components have also been proposed using 3D printing at length scales ranging from the nanoscale (Fig. 5(d)) to the microscale (Fig. 5(e)). We expect the range and capabilities of topology-optimized devices to continue to expand as state-of-the-art fabrication methods evolve and improve at producing 3D composite materials with high spatial resolution and greater dielectric contrast.
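The key property of the adjoint method, namely that one forward and one adjoint solve yield the gradient with respect to every element at once, can be checked on a toy problem. The Python/NumPy sketch below assumes a 1D scalar Helmholtz equation discretized by finite differences, a point source rather than plane-wave illumination, the linear density parameterization eps = eps_low + a (eps_high - eps_low), a small uniform loss to keep the system well-posed, and the field intensity |E|^2 at one observation point as the merit function; all numerical values are illustrative. Differentiating A(a) E = b and solving the adjoint system A^T lam = c (with c selecting the observation point) gives dM/da_j = -2 k0^2 (eps_high - eps_low) Re(conj(E_obs) lam_j E_j).

```python
import numpy as np

# Illustrative toy parameters (assumptions): free-space wavenumber, material pair,
# grid spacing, and a small uniform loss that keeps the linear system well-posed.
K0, EPS_LOW, EPS_HIGH, DX, LOSS = 2.0 * np.pi, 1.0, 4.0, 0.02, 0.1


def system_matrix(a):
    """A(a) for the discretized 1D Helmholtz equation E'' + k0^2 eps(x) E = source."""
    n = a.size
    eps = EPS_LOW + a * (EPS_HIGH - EPS_LOW) + 1j * LOSS
    lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1)) / DX ** 2      # Dirichlet ends
    return lap + K0 ** 2 * np.diag(eps)


def merit_and_gradient(a, src, obs):
    """Merit |E[obs]|^2 and its gradient with respect to every design element a_j."""
    A = system_matrix(a)
    b = np.zeros(a.size, dtype=complex)
    b[src] = 1.0
    E = np.linalg.solve(A, b)                 # forward simulation
    c = np.zeros(a.size)
    c[obs] = 1.0
    lam = np.linalg.solve(A.T, c)             # adjoint simulation, sourced at the observation point
    merit = float(np.abs(E[obs]) ** 2)
    # dA/da_j = k0^2 (eps_high - eps_low) e_j e_j^T, so two solves give every partial derivative
    grad = -2.0 * K0 ** 2 * (EPS_HIGH - EPS_LOW) * np.real(np.conj(E[obs]) * lam * E)
    return merit, grad
```

A central finite-difference check needs two additional simulations per element and should agree with the adjoint gradient to within finite-difference error, whereas the adjoint result covers all elements with only two solves. This is the scaling advantage that makes high-resolution topology optimization practical.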

Multi-objective optimization
Although single-objective optimization (SOO) and topology optimization are sufficient for many optimization problems, some classes of problems cannot be satisfactorily solved by a single-objective optimizer. In a design problem with multiple competing goals, captured as multiple objective functions, it is unclear how these goals ought to be related to one another to produce an optimal solution. Furthermore, optimality in this context is no longer a straightforward concept, as many different designs may fall on the tradeoff between the competing goals. One commonly employed approach is to combine multiple goals into a composite objective via a weighted sum, which is then optimized with a traditional SOO algorithm. Unfortunately, this approach suffers from a few problems. First, the optimal choice of coefficients for the composite function is usually not known a priori. Additionally, and more critically, this approach yields only a single solution, even though a host of potential solutions achieve the best possible tradeoff between the goals. In general, the tradeoff between the objectives measuring a set of designs is best understood through the concept of a Pareto front (Fig. 6(a)). This concept expands the notion of optimality from singling out the best of all designs to delivering a set of designs that achieve the best possible tradeoff between objectives [91]. More specifically, the Pareto front of a design problem is the set of all designs for which an improvement in one objective necessitates a deterioration in some other objective. Because the "Pareto optimality" of a point with respect to other points ought not to favor one objective over others, the concept of dominance was introduced [92]. For a pair of points $x_1$ and $x_2$, $x_1$ is said to dominate $x_2$ if $x_1$ is no worse than $x_2$ in every considered objective and strictly better in at least one. Using this concept, the Pareto front can be expressed formally as the set of feasible non-dominated designs in a given design problem. Interestingly, it has been shown that, depending on the shape of the Pareto front (in particular, non-convex portions), some parts of it cannot be found using the weighted-sum technique [92]. Thus, true multi-objective optimization (MOO) must use other approaches to build a set of solutions that approximates the true Pareto front for a given problem. This both frees the engineer from having to prioritize the objective functions ahead of time and can aid in understanding the physics that underlie the tradeoffs between the problem goals. Once the optimization is complete, a common approach is to select a knee point on the Pareto front (e.g., the point closest to the origin when all objectives are normalized and minimized), which represents a compromise between the various objectives. More generally, once the tradeoff between the objectives is understood, the engineer can apply any preexisting prioritization of the objectives specific to their problem, without fear of inadvertently missing solutions that are better for their situation but would not be found using SOO.
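The dominance test, the resulting non-dominated set, and the blind spot of the weighted-sum approach can all be seen on four hypothetical two-objective designs, with both objectives minimized and normalized to [0, 1]. In the Python sketch below, the design (0.6, 0.6) is non-dominated, yet because it lies on a non-convex part of the front, no weighting w·f1 + (1 − w)·f2 ever selects it, even though it is the knee point closest to the origin.

```python
import numpy as np


def dominates(p, q):
    """True if design p dominates q (minimization): no worse everywhere, strictly better somewhere."""
    p, q = np.asarray(p), np.asarray(q)
    return bool(np.all(p <= q) and np.any(p < q))


def pareto_front(points):
    """Non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


designs = [(0.0, 1.0), (1.0, 0.0), (0.6, 0.6), (0.7, 0.8)]   # hypothetical (f1, f2) pairs
front = pareto_front(designs)

# Designs selected by a weighted-sum scalarization for 101 weightings
weighted_winners = {min(designs, key=lambda p: w * p[0] + (1 - w) * p[1])
                    for w in np.linspace(0.0, 1.0, 101)}

knee = min(front, key=lambda p: np.hypot(*p))   # closest point to the origin

print("Pareto front:", front)
print("Weighted-sum picks:", sorted(weighted_winners))
print("Knee point:", knee)
```

The front keeps (0, 1), (1, 0), and (0.6, 0.6), while (0.7, 0.8) is dominated by (0.6, 0.6). The weighted sum only ever returns the two extreme designs, because min(w, 1 − w) ≤ 0.5 < 0.6 for every weight.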
Because MOO inherently produces a set of solutions rather than a single solution, algorithm designers have had great success in adapting population-based evolutionary algorithms to operate using the dominance relationship. Thus, a wide array of multi-objective evolutionary algorithms (MOEAs) now exist. For example, a pixelized parameterization of dielectric nanoantenna scatterers has been combined with a MOEA to characterize the tradeoff between the reflection at two different frequencies [97] (see Fig. 6(b)). Similarly, Nagar et al. used a MOEA called BORG [101] to simultaneously optimize the directivity, front-to-back ratio (FTBR), and scattering efficiency of both multilayer core-shell particles and Yagi-Uda nanoloop antennas [98] (see Fig. 6(d)). MOO has also been applied to photonic scatterer and waveguide design [99,102] (see Fig. 6(e)). Finally, Hassan et al. designed a nanoantenna with radiation modes dependent on the excitation port using a specialized multi-objective PSO (MOPSO) [103]. Their optimization minimizes losses, maximizes radiation efficiency, and maximizes the discrimination between radiation patterns.
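Many MOEAs turn the dominance relation into selection pressure by ranking the population into successive non-dominated fronts, in the spirit of the sorting step popularized by NSGA-II. A minimal Python sketch of that ranking (quadratic work per front, without the optimized bookkeeping of NSGA-II), with all objectives minimized, might look like:

```python
def dominates(p, q):
    """True if p dominates q (minimization): no worse in every objective, better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def nondominated_sort(points):
    """Return lists of indices: front 0 is the non-dominated set, front 1 the next layer, etc."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # Members not dominated by any other remaining member form the next front
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

For the population [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)], the fronts are [[0, 1, 2], [3], [4]]; an MOEA would then prefer members of lower-ranked fronts when choosing parents, so the population is pushed toward the Pareto front as a whole rather than toward a single point.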
Today's problems demand high performance in a multitude of competing areas, especially when electromagnetic performance must be balanced against SWaP-C (size, weight, power, and cost) and manufacturability considerations. Multi-objective optimization is well suited to capturing these and other arbitrary competing design objectives. We expect MOO to become critical to meta-device design, and its use to increase as more researchers become aware of its advantages.

Surrogate modeling
The practicality of any optimization technique depends primarily on the computational cost of the problem's function evaluations. Many of the optimization techniques covered thus far rely on solving Maxwell's equations repeatedly. While effective, they require considerable computational resources and time. For instance, to accommodate the complex physics required to evaluate Hassan's nanoantenna referred to above, the authors elected to perform full-wave simulations that accurately measure the objective functions of each design. However, full-wave simulations can become prohibitively expensive for complex structures. To mitigate this issue, the authors chose to integrate a powerful concept called surrogate modeling into the optimization procedure.
Any high-quality meta-device solution must at some point be validated using trusted and robust simulation techniques. Although many designs are intentionally parameterized to avoid costly full-wave simulations, in some cases these expensive evaluations cannot be avoided. Since high-fidelity systems grow increasingly expensive to evaluate in a full-wave solver, optimizing such a system may become impractical, in some cases to the point of intractability. For these kinds of problems, optimizers that make every effort to lower the expected number of high-fidelity (a.k.a. full) evaluations required to find an optimal solution are necessary. Unfortunately, many GO and MOO optimizers have no such constraints and so may require too many full evaluations to be tractable on their own. Surrogate modeling (SM) techniques, on the other hand, strive to alleviate this problem by replacing full evaluations with trained models that are significantly faster to evaluate. These surrogate models can take many forms and fulfill different functions to lower the number of necessary full evaluations. Analytical models are one approach to accelerating optimization through surrogate modeling. For example, in lens design, the lensmaker's equation may be used as a surrogate model for constraining and seeding the optimization of optical systems [104]. Equivalent circuit models used to describe antenna and metamaterial devices are another example of analytical surrogate models and can be used to accelerate the optimization of practical structures. For example, in [105], an RF circular split-ring resonator metasurface was captured with a full-wave model, but the optimization work was offloaded primarily to a circuit equivalent, allowing for significant time savings. Duan et al. [106] and Kim et al. [107] similarly applied circuit models to the optimization of nanoantenna devices (see Fig. 7(a) and Fig. 7(b), respectively). Because predefined analytical models are typically based directly on the physics of the ground-truth high-fidelity model, they have the potential to offer the greatest speedup while remaining intuitive and maintaining relatively high accuracy. Surrogate modeling can be integrated even more tightly into the optimization procedure, allowing for remarkable speedups. In [116], the training of a Gaussian process model was interleaved with full-wave simulations of nanoparticles of different morphologies to directly coordinate model training and prediction (see Fig. 7(c)). Surrogate models were integrated directly into the optimization algorithm to efficiently design photonic circuitry in [117] and [118]. Co-Kriging is another surrogate modeling approach, which uses multiple models of varying fidelity to reduce the number of full evaluations [119]. Koziel et al. applied this technique to an antenna design problem, correlating full-wave simulations of different mesh fidelities to lower the optimization cost [120]. A number of emerging surrogate-assisted techniques, such as inverse surrogate modeling [121], response-feature-based optimization [122,123], and adaptive response scaling [124], have also been applied successfully at RF frequencies and have the potential to make an impact in optical meta-device optimization.
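The interplay between a trained surrogate and a small number of full evaluations can be sketched generically. The Python/NumPy example below assumes a Gaussian-process (kriging) surrogate with a squared-exponential kernel and a lower-confidence-bound rule for choosing the next full evaluation; `expensive_model` is a hypothetical, cheap stand-in for a full-wave simulation, and the kernel length scale, noise level, and exploration weight `kappa` are illustrative. It is not the specific scheme used in [116]-[120].

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential covariance between two 1D point sets (unit signal variance)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(x_train.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(x_train, x_query)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    std = np.sqrt(np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None))
    return mean, std


def expensive_model(x):
    """Hypothetical stand-in for a full-wave simulation (cheap here for illustration)."""
    return np.sin(3.0 * x) + 0.5 * x


def surrogate_optimize(n_init=4, n_iter=15, kappa=2.0, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 2.0, 401)
    x = rng.uniform(0.0, 2.0, n_init)
    y = expensive_model(x)
    for _ in range(n_iter):
        offset = y.mean()                                   # center data for the zero-mean GP
        mean, std = gp_posterior(x, y - offset, candidates)
        lcb = mean + offset - kappa * std                   # optimistic lower confidence bound
        x_next = candidates[np.argmin(lcb)]
        x = np.append(x, x_next)
        y = np.append(y, expensive_model(x_next))           # one full evaluation per iteration
    best = int(np.argmin(y))
    return x[best], y[best], x.size
```

All of the search over the 401 candidate points happens on the surrogate; only `n_init + n_iter` calls (19 with the defaults) reach the expensive model. Larger `kappa` values favor exploring uncertain regions, and smaller values favor refining the current best region.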
Surrogate modeling can also serve purposes beyond speeding up optimization. Easum et al. introduced MOTOL for multi-objective optimization with tolerance studies integrated through the use of surrogate models (see Fig. 7(f)) [27]. In addition to enabling per-design measurement of tolerance during optimization, MOTOL simultaneously trains several competing surrogate models and dynamically selects the best one for the problem. MOTOL then explores the response surface using a Monte Carlo approach to estimate each design's tolerance hypervolume [27]. Analytical techniques such as interval analysis (IA) have also been used for tolerance estimation in RF device optimization [125][126][127]. Because tolerance is treated as an explicit objective, one can observe the tradeoffs between design robustness and traditional performance objectives such as gain, bandwidth, and field of view.
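The Monte Carlo idea behind surrogate-based tolerance analysis can be sketched generically: perturb a design within its manufacturing tolerance, evaluate a cheap surrogate instead of the full-wave solver, and count how often the specification is still met. The Python sketch below estimates such a yield under the assumption of independent, uniform perturbations; `model` and `spec` are hypothetical placeholders, and this is not the tolerance-hypervolume metric used by MOTOL.

```python
import numpy as np


def monte_carlo_yield(model, design, tolerance, spec, n_samples=20000, seed=0):
    """Fraction of uniformly perturbed designs (+/- tolerance per parameter) that meet spec.

    model: cheap surrogate mapping a parameter vector to a performance value (hypothetical).
    spec:  predicate returning True when that performance value is acceptable (hypothetical).
    """
    rng = np.random.default_rng(seed)
    design = np.asarray(design, dtype=float)
    perturbed = design + rng.uniform(-tolerance, tolerance, size=(n_samples, design.size))
    return float(np.mean([spec(model(x)) for x in perturbed]))
```

With the toy surrogate model(x) = |x|^2, the specification model(x) < 1, and a design at the origin, a tolerance of ±0.1 gives a yield of 1, while ±1 gives approximately π/4, the fraction of the square covered by the unit disk. Treating such a yield as an extra objective exposes the robustness-versus-performance tradeoff described above.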
Looking ahead, the design problems of today will almost certainly become more difficult as engineers seek to incorporate multiphysics and multiscale aspects into the inverse-design process. This will strain the available computational resources, so surrogate modeling is well positioned to play a critical role in realizing disruptive meta-devices in the future.

Deep learning
An emerging class of surrogate modeling techniques involves the use of deep neural networks (DNNs). As with other learn-by-example techniques, the general idea with DNNs is to expend computational time and resources upfront to generate training data sets consisting of device geometries and their associated optical responses. These data can then be used to train a DNN with classical supervised learning methods to 'learn' the nonlinear relationship between geometry and optical response. The power of DNNs comes from their multi-layered composition, which allows them to learn relationships in the data at multiple levels of abstraction. Once trained, a DNN can efficiently produce the geometry of a device when presented with a desired optical response. Deep learning techniques based on DNNs have led to tremendous advances in image processing, object detection, and speech recognition [128], and they have the potential to disrupt the inverse design of RF and optical meta-devices.
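The learn-by-example workflow described above can be sketched in a few dozen lines of NumPy. This is an illustrative toy, not any published model: `toy_spectrum` is a hypothetical analytic stand-in for full-wave simulations, the network size and Adam settings are arbitrary choices, and the inverse step assumes one common strategy (freezing the trained forward network and running gradient descent on its geometry inputs) rather than a dedicated inverse network.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.linspace(0.0, 1.0, 16)   # normalized points at which the spectrum is sampled


def toy_spectrum(geom):
    """Hypothetical stand-in for a solver: a Lorentzian resonance whose centre and
    width are set by two normalized geometry parameters (e.g. two layer thicknesses)."""
    centre = 0.3 + 0.4 * geom[:, :1]
    width = 0.05 + 0.10 * geom[:, 1:2]
    return 1.0 / (1.0 + ((wavelengths - centre) / width) ** 2)


# Upfront cost: build a training set of (geometry, sampled response) pairs.
X = rng.uniform(0.0, 1.0, size=(512, 2))
Y = toy_spectrum(X)

# Fully connected network 2 -> 32 -> 32 -> 16: tanh hidden layers, linear output layer.
sizes = [2, 32, 32, 16]
params = [[rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)]
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(x):
    """Return the activations of every layer, input first and network output last."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(np.tanh(z) if i < len(params) - 1 else z)
    return acts


def backprop(acts, d_out):
    """Given dLoss/d(output), return per-layer [dW, db] gradients and dLoss/d(input)."""
    grads, delta = [], d_out
    for i in reversed(range(len(params))):
        grads.append([acts[i].T @ delta, delta.sum(axis=0)])
        delta = delta @ params[i][0].T              # gradient w.r.t. this layer's input
        if i > 0:
            delta = delta * (1.0 - acts[i] ** 2)    # back through the tanh nonlinearity
    return grads[::-1], delta


# Supervised training: minimize the mean-squared error with the Adam optimizer.
m = [[np.zeros_like(p) for p in layer] for layer in params]
v = [[np.zeros_like(p) for p in layer] for layer in params]
lr, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8
history = []
for t in range(1, 2001):
    acts = forward(X)
    err = acts[-1] - Y
    history.append(np.mean(err ** 2))
    grads, _ = backprop(acts, 2.0 * err / err.size)
    for layer, g_layer, m_layer, v_layer in zip(params, grads, m, v):
        for k in range(2):  # k = 0: weights, k = 1: biases
            m_layer[k] = beta1 * m_layer[k] + (1 - beta1) * g_layer[k]
            v_layer[k] = beta2 * v_layer[k] + (1 - beta2) * g_layer[k] ** 2
            m_hat = m_layer[k] / (1 - beta1 ** t)
            v_hat = v_layer[k] / (1 - beta2 ** t)
            layer[k] = layer[k] - lr * m_hat / (np.sqrt(v_hat) + eps)

# Inverse design: freeze the trained weights and run gradient descent on the geometry.
target = toy_spectrum(np.array([[0.7, 0.4]]))   # desired response (from a known geometry)
g = np.array([[0.5, 0.5]])                       # initial geometry guess
start_mismatch = np.mean((toy_spectrum(g) - target) ** 2)
for _ in range(2000):
    acts = forward(g)
    _, d_geom = backprop(acts, 2.0 * (acts[-1] - target) / target.size)
    g = np.clip(g - 0.05 * d_geom, 0.0, 1.0)    # stay inside the sampled design space
final_mismatch = np.mean((toy_spectrum(g) - target) ** 2)
print(f"train MSE {history[0]:.4f} -> {history[-1]:.5f}")
print(f"geometry {g.ravel().round(3)}; mismatch {start_mismatch:.4f} -> {final_mismatch:.5f}")
```

Because the trained network is differentiable, each inverse-design iteration costs only a forward and backward pass rather than a full-wave solve. The recovered geometry is only as trustworthy as the network's accuracy, however, so candidate designs would normally be re-verified with the original solver, as the final `toy_spectrum` check does here.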
Deep neural networks that use device geometries and optical responses as their input and output parameters can perform inverse design of nanophotonic systems. In a recent demonstration, deep networks were used to relate the geometry of subwavelength-scale concentric dielectric shells to their scattering cross sections (see Fig. 8(a)) [129]. The thicknesses of the different shells served as discrete parameters describing the scatterer geometries, while sampled points in the scattering spectra served as discrete parameters describing the optical response. These input and output parameters map onto a discrete set of input and output neurons. Upo... shell thicknes... to capture the... A similar typ... profile of a di... this case, the radial coordin... More com... systems, in w... such network... and the outpu... response for a... loss converg... demonstration... desired optica... predict the ch... (see Fig. 8(d)). The appli... incipient stag... shapes. One o... architectures... as convolutio... layers can be... quantities of t... efficient hard... neural networ... meta-device i...

... of two split rings (reprinted (adapted) with permission from [132]. Copyright (2018) American Chemical Society).

Conclusions
Many algorithms and techniques exist for the inverse design of meta-devices. Given ever-increasing computational power, advances in fabrication techniques, and the novel materials available to designers, the need for optimization to realize the highest-performance designs will only grow. Whether it is a local or global, single- or multi-objective algorithm, optimization can benefit all optical, RF, and nanophotonic design problems. However, readers should not conclude that optimization can wholly replace the need for experienced designers. Rather, optimization should be thought of as a tool that designers can use to maximize the performance of their devices. Moreover, experienced designers can use their prior knowledge and intuition to significantly reduce the computational cost of optimization by applying intelligent constraints and supplying good starting points. Finally, the authors strongly encourage all readers to consider applying some level of optimization to their problems and hope that the discussions in this manuscript are helpful to experienced and novice designers alike.

Fig. 2 (caption only partially recoverable): (a) example meander-line designs (reprinted with permissions from [43] and [44], IEEE); (b) ... for super-directive applications; (c) a nano-hole ... array-based beam ... ([46], OSA); (d) polarization ... broadband operation ... compared to ... ([47], OSA); (e) combined port-reduction and global ... ([48], IEEE); (f) Gaussian to top-hat ... lens systems.

Fig. 4 (caption not recoverable; surviving fragments refer to adjoint-based simulation and design, gratings, and experiments).

Fig. 6 (caption not recoverable).

Fig. 7 (caption partially recoverable): (a) plasmonic bow-tie nanoantenna models (reprinted (adapted) with permission, American Chemical Society); (b) a Huygens ... equivalent-circuit model (right) [115]; (c) an example nanoparticle ... field enhancement at various ... [116] (Copyright 2016 American Chemical Society); (d) ... transformation and resulting index map, and a comparison of qTO ... and a Kriging ... [111]; (e) (top) sampled lenses ... surrogate model and (bottom) a ... surrogate model (green) and ray tracer ... (OSA); (f) (top-left) a two-... surrogate model with the gain and the estimated ... tolerance hypervolume, validated with a ... set of optimal designs ... design tolerance (reprinted (adapted) with permission from [27], IEEE).

Figure caption (number and opening not recoverable; genetic-algorithm examples): a multilayer genetically-optimized unit cell, with (left) the unit cell geometry and (right) the measured absorption (Copyright 2014 American Chemical Society); a structure optimized with the GA (right) compared with periodic- and dimer-based arrays (Copyright 2012 American Chemical Society); field enhancement in the vicinity of ... [31] (Copyright 2012); an encoding scheme (top) and various ... (bottom) [32]; (e) a genetically-optimized phase-gradient metasurface (middle) (reprinted (adapted) from ...).