Leveraging Continuous Material Averaging for Inverse Electromagnetic Design

Inverse electromagnetic design has emerged as a way of efficiently designing active and passive electromagnetic devices. This maturing strategy involves optimizing the shape or topology of a device in order to improve a figure of merit--a process which is typically performed using some form of steepest descent algorithm. Naturally, this requires that we compute the gradient of a figure of merit which describes device performance, potentially with respect to many design variables. In this paper, we introduce a new strategy based on smoothing abrupt material interfaces which enables us to efficiently compute these gradients with high accuracy irrespective of the resolution of the underlying simulation. This has advantages over previous approaches to shape and topology optimization in nanophotonics which are either prone to gradient errors or place important constraints on the shape of the device. As a demonstration of this new strategy, we optimize a non-adiabatic waveguide taper between a narrow and wide waveguide. This optimization leads to a non-intuitive design with a very low insertion loss of only 0.041 dB at 1550 nm.


Introduction
As integrated photonics continues to mature, we have witnessed a growing desire for more compact and efficient passive optical components. At the same time, our ability to satisfy these demands using either analytic methods or by tuning only a small set of device parameters is becoming increasing difficult. To combat this trend, more sophisticated optimization methods are proving very useful. In particular, topology and shape optimization enable us to efficiently design devices with hundreds, thousands, or even millions of independent design parameters. In the case of shape optimization, the boundaries between different materials composing an initial structure are modified in order to minimize a desired figure of merit (i.e. a function which quantifies the performance of the device being optimized) [1]. Similarly, in the case of topology optimization, a starting structure is modified in order to minimize a figure of merit, however fewer constraints are placed on how the structure can evolve and in particular its topology may be modified (e.g. the creation or elimination of holes) [2].
Both shape optimization and topology optimization have found extensive application in structural mechanics, and work on both methods dates back more than 30 years [1][2][3][4]. Despite the maturity of this field, comparitively limited application of these methods has found its way into the optics and photonics community. Early work on gradient-based microwave device optimization [5] demonstrated successful application of shape optimization to electromagnetics. This work was followed by the successful application of topology optimization to photonic crystals [6] and later to a greater variety of passive photonic components [7]. Within the last five years, however, the photonics community has witnessed a steady rise in interest in these optimization techniques as demonstrated by the numerous optimizations of efficient splitters, couplers, etc [8][9][10][11][12][13][14].
A large portion of this prior work has focused on implementing topology optimization techniques and has emphasized problems in which the design space consists of thousands or even millions of parameters. In particular, the idea of exploring the "full design space," that is choosing the permittivity at each point in a discretized domain as independent design variables, has gained popularity. While this approach has proven useful for generating solutions without requiring significant intuition about the appearance of the final structure, it suffers two primary disadvantages. First, in order to ensure that the device is fabricable, a final post-processing step is often required to convert a grayscale material distribution to a binary material distribution [15]. As a result, the final solution may not be truly optimal or may require additional costly optimization steps. Second, there are a variety of problems in which either the geometry of the structure is constrained in some way or for which topological changes are unnecessary (for example, a one dimensional grating coupler consisting of strictly rectangular segments). In these cases, topology optimization techniques are either unnecessary or even inappropriate.
Shape optimization serves as a remedy to both of these problems: it can be used in order to fine-tune the result of a topology optimization and is also well suited to handling problems in which the general shape of the structure is known beforehand. Shape optimization has an additional benefit in that it gives us considerable freedom over the choice of design space-we are free to choose arbitrary parameters, such as the length of a rectangle or the position of a circle, as independent design parameters. In previous work, shape optimization has frequently been used in conjunction with simulation methods that use an unstructured mesh (notably either a finite element method [1] or a variant of the finite difference time domain technique [5]). In the photonics community, however, the finite difference time domain, and by extension, the finite difference frequency domain (FDTD and FDFD respectively) are preferred due to their relative simplicity and computational efficiency. Unlike the finite element method, FDTD represents materials on a rectangular grid which is not conducive to the representation of non-rectangular material boundaries. This poses an even greater difficulty to shape optimization as the rectangular grid makes it difficult to represent continuous perturbations to material boundaries (which are inherently smaller than the grid spacing), a process which is essential to computing accurate gradients of the figure of merit. These gradients are helpful to the efficient minimization of the figure of merit, and it is thus desirable that we devise a way to achieve continuous perturbations to boundaries expressed on a rectangular grid.
To this end, we have implemented continuous boundary smoothing. The exact area overlap between grid cells which intersect with a material boundary and the material domain are used to compute an average "smoothed" permittivity within that grid cell. Infinitesimal modifications to the boundary of a material are reflected in infinitesimal changes to the effective permittivity of the intersecting grid cells. This enables us to calculate accurate gradients of a figure of merit with respect to many variables with only two simulations using the adjoint method. These gradients can then be fed into a variety of minimization algorithms in order to optimize our structure. This intuitive boundary smoothing method, which in the past has been considered primarily for the purpose of improving simulation accuracy, can enable us to easily optimize electromagnetic sturctures which would otherwise prove very challenging.
In this paper, we will briefly review the adjoint variable method which is used to compute the gradient of the figure of merit and highlight the importance of infinitesimal boundary perturbations. We will then present and validate our boundary smoothing method and finally use it to optimize an efficient short waveguide taper with a constrained feature size.

The Adjoint Method
Optimization (or "inverse design") of an electromagnetic device involves modifying an initial design such as to maximize (or minimize) a figure of merit which describes the device's performance. In order to modify the design, we specify design parameters which define the shapes of material boundaries and hence the structure of the device. There are many strategies for choosing the values of our design parameters in order to maximize our figure of merit; of particular interest are gradient-based minimization techniques which are very efficient assuming we can inexpensively compute the gradients of our figure of merit (i.e. the derivative of the figure of merit with respect to each design parameter). The most obvious way to do this is by a brute force approach in which each design parameter is independently varied and a separate simulation is run in order to determine how the figure of merit changes. If M design variables specify the shape of the device, then this method requires M + 1 simulations, which quickly becomes impractical for even modest values of M .
It turns out that through clever application of the chain rule, we can find the gradient of our figure of merit using only two simulations. This process is called the adjoint variable method and has found widespread application in many fields, including electromagnetics [16]. For the sake of clarity and notation, we will rederive the general expressions of the adjoint variable method and point out some key results.
Our goal is to minimize a figure of merit F (E, H) which depends on the electric (E) and magnetic (H) fields. The fields are found by solving Maxwell's equations, given by which we have chosen to write in their time harmonic form for simplicity. As is the case with any linear system of equations, we can discretize these equations on a rectangular grid and rewrite them in matrix form as where A is a matrix containing the curls expressed using finite differences and the permittivity and permeability at each point in the discretized space, x is a vector containing the electric and magnetic fields at all points in space, and b contains the electric and magnetic current sources at all points in space. In our discretized world, our figure of merit becomes a function of x which we write as F (x). If we had direct control over the electric and magnetic fields in x, finding the gradient of F (x) would be trivial. However, this is not the case. Instead, we have control over the permittivity and permeability defined everywhere in space which would usually be specified using a more structured set of design parameters (like widths of shapes, positions of shapes, coefficients of piecewise functions, etc). Thus if we write this set of design variables as p = {p 0 , p 1 , · · · , p M }, then the gradient we are interested in is the set of derivatives of F with respect to each p i , i.e.
To find these derivatives, we begin by applying chain rule when differentiating F (p). Consider first the i'th derivative for the simple case in which F contains no explicit dependence on the design variables: Because the fields in x are complex valued, we must be careful when taking their derivatives, hence the appearance of the 2 Re{· · · }. Notice that the derivative ∂F/∂x is already known since we expressed F as an explicit function of the electric and magnetic fields. The second term in (5), however, remains to be found. This is accomplished by directly differentiating our system of equations given in (3) with respect to p i and multiplying by A −1 , which yields Notice that in order to simulate a device in the first place, a relationship between A and p i is needed, so we assume ∂A/∂p i is known or assumed to at least be easily computable (an assumption that we will return to soon). Similarly, we assume that ∂b/∂p i is known, although in most cases this term will turn out to be zero. Substituting this result for ∂x/∂p i in Equation (5), we find an expression for the i'th derivative of F in terms of known quantities: This expression can be written in a more enlightening way by introducing a new vector given by where we have rearranged terms in the equation on the right hand side such that λ is the solution to an equation of the form Ax = b. Substituting this result into (7), we obtain a final expressive of the derivative of our function F with respect to the design variables of the system: In order to solve for all M derivatives of F , we need to compute the physical electric magnetic fields which are contained within x as well as a second set of non-physical "adjoint" fields contained in λ. Although λ is a consequence of mathematical operations, we can intuitively think about it as the fields that are produced by injecting the desired output fields (which arise from ∂F/∂x) into the system and running the whole system backwards. Solving for x and λ each correspond to a single "forward" and "adjoint" simulation, respectively, and thus solving for the gradient of F requires two simulations, independent of the number of design variables. Application of this equation is called the adjoint variable method.
In addition to the forward and adjoint simulations, an essential component of the adjoint variable method is the accurate calculation of ∂A/∂p i which contains information about how the distribution of materials in the system change with respect to modification of the design variables. This is contingent on our ability to form a continuous relationship between the design parameters of the system and to the explicit representation of the system of equations contained in A. In the case of representing material boundaries on a rectangular grid, this is non-trivial. To do this, we need a way to continuously map those boundaries onto the static rectangular grid.
We accomplish this using grid smoothing techniques, which allow us to project continuously defined boundaries onto a rectangular grid. If a boundary intersects a grid cell, an intermediate material value between the two values on either side of the boundary is assigned to the intersected cell. This allows us to make very small perturbations to the continuously-defined boundaries (i.e. perturbations smaller than the edge length of a grid cell) and achieve a correspondingly small perturbation to the materials defined in the rectangular grid. Making small perturbations to the structure allows us to compute ∂A/∂p i accurately and efficiently using simple finite differences. In the next section, we explain this grid smoothing process in detail.

Grid Smoothing
When trying to represent an abrupt interface between two materials on a rectangular grid, we inevitably run into the problem of staircasing: representing curved or diagonal boundaries between two different materials on a rectangular grid results in a jagged interface which conforms to the underlying grid. In addition to compromising simulation accuracy, staircasing poses significant challenges to calculating the gradients since perturbations smaller than a grid cell are not possible.
A considerable amount of work has been done to improve the treatment of non-rectangular boundaries with the FDTD method for the purpose of improving simulation accuracy. In particular, modification of the FDTD equations have been successfully employed in order to improve the simulation accuracy [17,18] and the introduction of effective permittivity at material interfaces [19][20][21][22] has been demonstrated to improve simulation accuracy in many situations. This process of computing effective intermediate material values, which we refer to as "grid smoothing," is particularly relevant to shape optimization as it provides us with a way to achieve small perturbations to the material boundaries represented on a rectangular grid.
In order to demonstrate this, we have implemented a simple form of grid smoothing using weighted averages. In a given grid cell, the effective permittivity in a 2D domain is given by where C k is the overlap area between the k'th material domain and the grid cell at location i, j and ∆x and ∆y are the grid cell width and height, respectively. For the purpose of computing derivatives, it is essential that C k be computable with high precision. We accomplish this by representing all boundaries in the system as piecewise linear functions (i.e. polygons) and then use a series of boolean intersections between the material domains and the grid-cells that intersect the boundaries of those domains. Because these piecewise linear functions are stored with very high (or even arbitrary) numerical precision, infinitesimal modifications to the boundaries are reflected by infinitesimal modifications to the local effective permittivity on the grid. The results of this process are depicted in Figure 1. In (a), the smoothed permittivity for a 0.5 cm diameter circle is shown on an intentionally coarse grid. As a result of the smoothing process, the grid cells at the boundary of the circle are filled with an effective permittivity whose value is between the permittivity inside of the circle and the permittivity surrounding the circle. We then test the continuous nature of this smoothing by displacing the circle in the y direction by 10 −12 cm. The change in the permittivity as a result of this displacement is shown in (b). The small size of the displacement is reflected by the correspondingly small change in permittivity in the grid cells at the circle's outer boundary. With continuous grid smoothing at our disposal, the derivatives ∂A/∂p i are easily computed using finite differences. One at a time, each design variable of the system p i is perturbed by a small amount (e.g. 10 −8 × ∆x) and a new matrix A(p i + ∆p) is computed. The derivative is then given approximately by which is accurate so long as ∆p is sufficiently small. Furthermore, the calculation of A incurs significantly less computational overhead compared to running a new simulation. Calculation of this derivative combined with solutions to the forward and adjoint problems provide us with the tools we need to optimize passive integrated photonic devices. To summarize, the process for optimizing a structure using grid smoothing and the adjoint method is as follows: 1. Choose initial structure, figure of merit F (x), and design parameters p

Optimization of a Non-adiabatic Waveguide Taper
In order to demonstrate the shape optimization process, we have optimized a non-adiabatic waveguide taper which is useful as a compact spot size converter for butt coupling to a fiber or short transition from a waveguide to a grating coupler. Unlike previous work on the design of short tapers [23], we allow the geometry of the taper to evolve with greater freedom. This allows us to achieve much more compact transitions than an analytic taper allows [23] without sacrificing performance over a large bandwidth. The taper we optimize is 18 µm long and connects a 500 nm wide input waveguide to a 9 µm wide output waveguide. The structure is made of 220 nm thick silicon clad in silicon dioxide and is reduced to two dimensions using the effective index method. The structure is excited with the fundamental TE mode of the 500 nm wide input waveguide using a wavelength of 1550 nm. We choose the linear taper depicted in Figure 2 (a) as the initial design. The taper itself is represented using a polygon with 200 vertices on its top edge and the structure is mirrored about the x axis using symmetry boundary conditions. We select the design variables of the problem to be the relative x and y displacement of these 200 vertices from their starting positions as depicted in Figure 2 (b). We thus use 400 design variables in total. Finally, we use a grid resolution of 25 nm in order to ensure that the simulation is sufficiently accurate. Our goal in optimizing this taper is to maximize the fraction of input power that is coupled into the fundamental TE mode of the wider output waveguide. The correct figure of merit which quantifies this is the mode matching efficiency integral. Assuming there is no reflected wave at the output, the continuous form of this mode matching integral, which we derive in Appendix A, is given by where E is the incident electric field, H m is the desired magnetic field, P src is the source power, and P m is the the power in the desired fields. The integral is taken over a plane which encompasses the entire desired field profile. In the case of the waveguide taper, H m corresponds to the magnetic fields of the fundamental TE mode of the the larger output Figure 2: A short 18 µm taper from a 500 nm silicon waveguide to a 9 µm wide silicon waveguide. The taper is defined as a single polygon containing 200 vertices along its top and bottom diagonal edges. We choose the displacement of the x and y coordinates of these points as the design parameters of the system. This taper is used to validate the accuracy of gradients computed using grid smoothing in conjunction to the adjoint method and serves as the starting point of an optimization to demonstrate application of these methods.
waveguide while E is the actual simulated electric field taken along the red line on the right hand side of Figure 2 (a) (labeled "FOM plane").
In this optimization, we not only seek a design that maximizes efficiency, but also a design that is fabricable. We accomplish this by applying radius of curvature constraints which prevent the formation of exceedingly small features. We do this by choosing the full figure of merit to be the mode match efficiency minus a penalty function which reduces the figure of merit when the approximate radius of curvature at each point in the polygon falls below a specified minimum radius of curvature. The full figure of merit is given by where, η is the mode match given in Equation (12) and f ROC (p) is an explicit function of the design variables that is positive and increases as the effective radius of curvature calculated at each vertex decreases. The details of this function are discussed in Appendix B. In this case, we choose a minimum radius of curvature of 150 nm since it is easily fabricable and significantly larger than the grid cells being used. Because our shape optimization gives us full freedom over the parameterization, the incorporation of such constraints is a straightforward matter of modifying the figure of merit and, unlike other topology optimization methods [14], requires no additional modification of the underlying minimization process.
Having defined a figure of merit and initial design, we are able to evaluate the accuracy of the gradients computed using the adjoint method with grid smoothing. We do this by computing the gradient of F (p) using Equation (9) and then comparing it to the brute-force calculation of the gradient. The error between the gradient calculated using the adjoint method and the gradient calculated using brute-force finite differences is determined by evaluating where the subscripts "FD" and "AM" refer to "finite difference" and "adjoint method," respectively. The result of this comparison is shown in Figure 3 in which gradient accuracy is plotted against the size of the perturbation in each design variable (∆p) that is used to compute ∂A/∂p i . Figure 3: Accuracy of the gradient of the figure of merit computed using the adjoint method and grid smoothing as a function of the size of the perturbation ∆p used in the calculation (normalized to the width of a grid cell ∆x). The error in this gradient is defined with respect to the "brute force" gradient which is computed by sequentially perturbing each design variable of the system by a very small amount and computing the new figure of merit. Greater error is incurred as grid cells move further from the original boundary of the shape contribute to the change in permittivity. In order to achieve better than ∼ 1% error, the step size used in the computation of the shape derivative ∂A/∂pi has to be less than two percent of the grid spacing.
When the perturbation to each design variable is sufficiently small (i.e. 10 −3 ∆x), the error in the gradient is much less than 1% and is effectively independent of the size of the perturbation. We attribute the non-zero minimum error to the presence of numerical error that arises during the forward and adjoint solution processes. As the perturbation is made larger, the number of grid cells which contribute to the estimation of ∂A/∂p i grows as the perturbed boundary intersects a greater and greater number of grid cells as compared to the unperturbed boundary. This in turn introduces additional error into the gradient calculation and is responsible for the nearly piecewise linear nature of the plotted gradient accuracy. This result highlights the importance of having a truly continuous grid smoothing method. Grid smoothing based on supersampling techniques would not likely be feasible since the permittivity in each grid cell would have to be sampled tens or hundreds of thousands of times in order to achieve a high level of accuracy, a process which would be exceedingly computationally expensive.
Based on these results we choose the perturbation size to be ∆p = 10 −5 ∆x in order to ensure that the computed gradients are as accurate as possible. These gradients are used with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, which we found worked best for this particular problem, in order to minimize the figure of merit and thus optimize the geometry of the waveguide taper. The real part of H z is plotted for the initial design in Figure 4. As is clearly visible, the wavefronts become curved as they propagate along the linear taper. This curving of the wavefronts results in significant coupling to higher order modes in the output waveguide, which is reflected by an initial efficiency of only 51%. Beginning with this structure, the optimization is run until the figure of merit changes by less than 10 −4 which takes 97 iterations of the minimization algorithm (which required 142 figure of merit and gradient calculations and thus 284 simulations in total). The structure resulting from this optimization process is shown in Figure 5. As desired, the wavefronts leaving the device are flat and there is no visible wave interference in the output waveguide that results from the excitation of higher order modes. This final structure has an efficiency of over 99% (-0.044 dB) at the design wavelength of 1550 nm.
The full extent of the improvement in performance due to optimization is made clearly evident by comparing the slices of the magnetic field (H z ) taken perpendicular to the output waveguide for the first and last iteration as shown in Figure 6. In the first iteration, the simulated field amplitude and phase deviate significantly from the desired profiles. In particular, the phase varies by approximately π radians across the width of the output waveguide, which reflects the significant curvature visible in the full two dimensional plot of the fields and is indicative of the excitation of higher order modes.
After the 100th iteration, on the other hand, the simulated magnetic field amplitude is almost indistinguishable from the desired amplitude profile. Similarly, within the output waveguide, the simulated phase matches the desired phase, deviating by less than one tenth of a radian from the desired flat phase. It is worth noting that while the phase deviates significantly outside of the waveguide, this is insignificant since the amplitude of the field is essentially zero in this region. The close match in amplitude and phase to the desire field reflects the final calculated efficiency of over 99% at 1550 nm.
In addition to the optimized design exhibiting extremely high efficiencies at the design wavelength, it also performs favorably over a wide range of wavelengths as demonstrated by Figure 7a. The optimized design exhibits a 3 dB bandwidth of approximately 420 nm. This wide bandwidth makes these tapers well suited for use with grating couplers whose bandwidth is generally less than ∼100 nm. Because the efficiency of the taper does not drop by more than roughly one third of a dB over a 100 nm range about the design wavelength, the combined bandwidth of the optimized taper plus a typical silicon grating coupler would not be significantly reduced compared to the bandwidth of the grating coupler alone.
It is interesting to note that our optimized waveguide taper exhibits a stronger sensitivity compared to the original linear taper. This is likely a result of the introduction of smaller wavelength-scale and sub-wavelength-scale features by the optimization. This result is not surprising, however, because we considered only a single wavelength when computing our figure of merit. We partially mitigate this issue by enforcing a radius of curvature constraint; by limiting the minimum size of features, we avoid the development of very fine structures which tend to exhibit a stronger wavelength dependence.
Despite the increase in sensitivity to wavelength, our optimized design still performs better than longer linear tapers over a limited bandwidth. For example, the optimized taper has a higher efficiency than a 50 µm linear taper over a 144 nm range as shown in Figure  7a. Similarly, compared to a 100 µm long linear taper, our optimized design has a higher efficiency over an 80 nm range of wavelengths. Thus in cases where a more modest bandwidth is needed, the optimized design can achieve higher efficiencies than can be achieved by linear tapers that are more than five times as long. In order to achieve equal performance at the design wavelength of 1550 nm, a linear taper with a length of over 180 µm (an order of magntiude longer than the optimized design) is required. The strength of the optimization methods we have employed not only lie in the performance of the final result, but also in the method's ability to rapidly converge to an optimal solution. Consider how the efficiency of the taper evolves with each iteration of the optimization plotted in Figure 7b. Within the first two iterations of the optimization alone, the figure of merit is increased by approximately 20% (from −3 dB to −1.5 dB). After the first 20 iterations, the figure of merit has further increased to over 90% (−0.46 dB), which is then followed by slower improvement as the gradients of the figure of merit become smaller and smaller. Convergence to the final efficiency is smooth as a result of the minimization algorithm used.
It is worth noting that the minimization algorithm we used is a local optimization method which makes no guarantees that a global optimum is found. In general, however, finding global optima would be ideal. Heuristic techniques like the genetic algorithm are often employed in electromagnetics in order to increase the likelihood of finding a global optimum. However, the genetic algorithm is greedy, often requiring thousands of simulations before an optimum is found [24,25], with no guarantees that a global optimum. The potential advantages are thus not always perfectly evident. In many problems, however, we can largely eliminate these concerns by taking advantage of our own physical intuition of the problem and "guessing" a good starting design. In the case of our waveguide taper, for example, we know that smooth linear transitions (which work well at longer lengths) are well behaved, exhibit low reflections, and suffer primarily from diffraction effects that cause the propagating wavefront to become curved. It is thus not surprising that beginning with the linear taper leads to a local optimum with such high efficiency and wide bandwidth.

Conclusion
This example demonstrates the great utility of shape optimization in electromagnetics: given a goal (i.e. figure of merit) and an initial guess for a design, we can quickly design passive components which outperform anything we could have otherwise designed by hand. This is made possible by grid smoothing which gives us substantial freedom in how we represent material boundaries and enables us to calculate accurate gradients of a figure of merit with respect to large numbers of design parameters when used with the adjoint method. With these accurate gradients at our disposal, we can employ a wide range of very powerful minimization algorithms. While in our example problem we used BFGS, other problems in electromagnetics may benefit from alternatives such as the limited-memory BFGS method or the truncated Newton algorithm. Many of these minimization methods can be used in place of BFGS for electromagnetic shape optimization without any modification to this work. Combining our physical intuition of electromagnetics with gradient-based shape optimization provides us with a path forward to improve many of the components in integrated optics. Based on the success of topology and shape optimization in other fields and the ease with which it can be applied to electromagnetics, we expect inverse electromagnetic design to become an essential component of the future optical engineer's toolbox.

A Mode Matching Integral
Mode overlap, or mode-matched efficiency, refers to the degree to which an incident field can couple into a desired mode of a system. Computing the mode overlap is essential to determining the efficiency of many optical devices. There appears to be a dearth of discussion about the details of coupling efficiency in the literature. In response to this, we present a relatively detailed derivation here.
Our first step determining the mode-matched efficiency is to express our input field as a sum of the allowed propagating modes of the system. Specifically, the basis will we will use consists of the electric and magnetic fields of both forward and backward traveling waves. We begin by writing the electric field as a sum of these basis functions, where the two terms in parentheses correspond to the forward and backward components and are given separately by In the expressions above, E m is the electric field profile of the mth mode of the system. The magnetic field, meanwhile, can be written in a similar form. Applying Faraday's Law and assuming harmonic time dependence of the electric field, the magnetic field can be written as an expansion of the forward and backward wave Notice in Equation (20), that the backward propagating term is preceded by a negative sign. This arises out of the requirement that power flow in the negative direction (and hence the Poynting vector, E m × H m point in the negative direction). In both Equation (20) and (16), we have chosen to write the fields as a sum of the forward and backward traveling wave. In general, given an arbitrary field, we will not know the forward and backward traveling components but only their sum. Our goal now is to develop the machinery needed to separate the different forward and backward traveling modes which composes an arbitrary field. This is equivalent to finding the coefficients a m and b m . To do so, we must take advantage of the orthogonality condition of our electromagnetic basis which arises as a result of Lorentz reciprocity and is given by [26] where δ mn is the Kronecker delta. We apply this orthogonality condition by computing the surface integral of E × H * m and rearranging terms yields an expression relating a m and b m to the electric field and mth mode where S m is related to the power propagating in the mth mode A second expression for a m and b m can be found by computing the surface integral of E m × H * which yields With two equations and two unknowns, we are now able to solve for the coefficients. Adding the two equations produces an expression for a m while subtracting them yields and expression for b m . Using these equations, decomposing a given field into the modes of the system is a straightforward calculation. Mode matching requires that we take this one step further and determine how much power is propagating in a desired mode. To find this, we compute the power propagating through a plane in the forward wave, i.e. P fwd = 1 2 Re{ dA · E fwd × H * fwd }. This calculation results in a sum whose terms correspond to the power propagating in each mode. This power is more conveniently expressed as a fraction of the total power propagating in the field: Equation (26) describes amount of power that can couple from an incident field into a desired mode of the system. This expression can be further simplified by noticing that in most problems, a backward traveling wave is not present at the output of the system. In this case, b m equals zero and as a result Taking this into account allows us to simplify Equation (26) to where E m and H m are the field profiles that we desire the grating to generate. This equation is the most general form of the mode-matched efficiency. However, in many applications, E m and H m will correspond to a guided mode. In this case, we can further simplify the expression by noting that for a guided more, the integral A dA · E m × H * m is real valued. In this case we can cancel the first term in the numerator and we find that the mode overlap is given by where we have chosen to write which describe the power in the incident field and the power in the desired mode (neither of which are guaranteed to be normalized to unity power). In many situations, we wish to know the total efficiency with which a device will output into a desired mode. This efficiency is differs slightly from the mode overlap expression given above as it compares the fraction of power coupled into a desired mode at the output of a device to the total power input to the device. This minor difference is easily accounted for by replacing P inc with P src , the total source power of the system.

B Radius of Curvature Constraint
It is relatively straightforward to impose radius of curvature constraints in optimization problems in which boundaries are defined using polygons. We do this by calculating an approximate curvature at each point in the polygon and then applying a smooth analytic thresholding function which reduces the figure of merit when the radius of curvature is below the chosen minimum value. This soft constraint does not guarantee that the approximate radius of curvature at every point in the polygon will be larger than the minimum value, however in practice the penalty can be made significant enough that small curvature will not appear.
In general, the curvature of a polygon is ill-defined since it is made up of straight segments. However, intuitively we know that with a sufficient number of points, a polygon can resemble a smooth curve. We should thus be able to define an effective radius of curvature that reflects the apparent smoothness. We can accomplish this by fitting higher order polynomials to set of points in the polygon and then calculating the curvature of those curves at each point. In this work, we fit quadratic parametric curves to sets of three points and compute the curvature at the middle point. Provided we have found paramtric curves x(t) and y(t) which are fit to the points (x i−1 , y i−1 ), (x i , y i ), and (x i+1 , y i+1 ), we approximate the curvature at (x i , y i ) using where x denotes the first derivative of x with respect to t and x denotes the second derivative of x with respect to t. With the radius of curvature now known, we can construct a penalty function which reduces the figure of merit when the radius of curvature falls below the minimum value. A good choice for this function is an analytic step function like the logistic function. This has the desired behavior of tending towards zero when the radius of curvature is larger than the minimum value and tends towards one when it is less than the minimum value. Furthermore, being analytic, the logistic function is differentiable which is essential for calculating gradients of the figure of merit. The full penalty function is a sum over logistic functions of the radius of curvature at each point, that is where R 0 is the minimum radius of curvature, κ is related to the sharpness in the transition of the logistic function from zero to one, α is a constant used to tune the strength of the penalty, and p is the set of design parameters. We emphasize that R i (p) can be a function of these design parameters (as is the case with the waveguide taper optimization presented above). If we choose both α and κ to be large (i.e. near one), then it is clear from Equation (31) that f ROC will quickly grow if the curvature at any of the points in the polygon drop below R 0 . Because this function is subtracted from the efficiency (which ranges from zero to one) in the full figure of merit in Equation (13), it is extremely disadvantageous for the optimization to progress in a direction which would reduce the curvature at any point below the minimum curvature.