Differentiable Scattering Matrix for Optimization of Photonic Structures

The scattering matrix, which quantifies the optical reflection and transmission of a photonic structure, is pivotal for understanding the performance of the structure. In many photonic design tasks, it is also desired to know how the structure's optical performance changes with respect to design parameters, that is, the scattering matrix's derivatives (or gradient). Here we address this need. We present a new algorithm for computing scattering matrix derivatives accurately and robustly. In particular, we focus on the computation in semi-analytical methods (such as rigorous coupled-wave analysis). To compute the scattering matrix of a structure, these methods must solve an eigen-decomposition problem. However, when it comes to computing scattering matrix derivatives, differentiating the eigen-decomposition poses significant numerical difficulties. We show that the differentiation of the eigen-decomposition problem can be completely sidestepped, and thereby propose a robust algorithm. To demonstrate its efficacy, we use our algorithm to optimize metasurface structures and reach various optical design goals.


introduction
The scattering matrix is a fundamental concept in many fields.It relates the input state and the output state of a physical system undergoing a scattering process.Particularly revealing in optics, the scattering matrix has been widely used for analyzing photonic structures such as waveguides [9,18,19] and metasurface units [8,16,43].Once the scattering matrix of a photonic structure is known, the structure's optical performance (e.g., mode conversion efficiency and phase shift) can be directly obtained.
Because of its vital importance, many numerical methods have been developed to compute the scattering matrix of a photonic structure.Among them, a popular class is the semi-analytical methods, such as the method of lines [28] and rigorous coupled-wave analysis (RCWA) [24].These methods exploit the fact that many photonic structures in practice (such as waveguides and metasurface units) have a piecewise constant cross-sectional shape along the transmission direction (denoted as z-direction).Thus, to solve Maxwell's equations, they only need to discretize the 2D cross-sectional region, reducing Maxwell's equations into a set of continuous differential equations along z-direction, whose solution can be expressed through an eigenvalue analysis.Thanks to the semidiscretization, these methods often enable faster computation in comparison to full discretization methods (such as finite-element-and finite-volume-based methods).Indeed, methods like RCWA have been widely used in designing various photonic structures, such as metasurfaces [3,4,11,16], metagratings [1,13], holograms [43,44], polarimeters [5], solar cells [40], radiative cooling structures [30], color structures [34], photonic crystals [41], and waveguides [9,19].
In this work, we extend the semi-analytical methods to obtain the higher-order information of scattering matrices, namely the scattering matrix's derivatives (or gradient).Provided a photonic structure specified by certain design parameters, we aim to compute not only its scattering matrix but its derivatives with respect to the design parameters.
The scattering matrix derivatives depict the changes of the structure's optical behaviors as its design parameters vary.This higher-order information, if robustly and efficiently computed, finds many applications in photonic design.Most notable is the optimization of photonic structures.The derivatives provide guidance on how we can adjust the parameters (e.g., through the gradient descent algorithm [31]) to improve the structure's optical performance [20] or to find a design robust to fabrication error [2,39,45].
Unfortunately, the computation of scattering matrix derivatives is nontrivial.The difficulty is rooted in the fact that the permissible optical modes in a photonic structure are eigenfunctions of a linear (Hermitian) operator determined by Maxwell's equations [15].Thus, to compute the scattering matrix, semi-analytical methods must solve an eigen-decomposition problem: its eigenvalues describe the propagation constants (or effective indices) of the modes and its eigenvectors indicate propagating modal patterns.Differentiating the scattering matrix, by chain rule, requires the derivatives of eigenvalues and eigenvectors.It is the need of eigenvector derivatives that renders the scattering matrix differentiation ill-posed: when there exist repeated eigenvalues, the corresponding eigenvectors are not uniquely defined.As the parameter changes, the numerical results of the eigenvectors may change discontinuously, and their derivatives become undefined (see more discussion in Section 3).
Not merely does this issue exist as a corner case; many photonic structures in practice have geometric and material symmetries, from which repeated eigenvalues and thus ill-defined eigenvector derivatives emerge (see Fig. 2).Consequently, one must carefully choose eigenvectors such that they vary smoothly with respect to the design parameters.This choice, albeit attainable, demands complex and expensive computational effort [38].
In this paper, we question the necessity of eigenvector derivatives for differen- In a photonic structure, light may be incident from both sides and get scattered out.The relationship of incident and scattered light is characterized by the scattering matrix.(b) A complex structure can be decomposed into individual layers.Each layer is characterized by its scattering matrix, and these scattering matrices are then combined (using the Redheffer star product [29]) to form the scattering matrix of the entire structure.
tiating scattering matrices.We show that while eigen-decompositions are needed for computing a photonic structure's scattering matrix, eigenvector derivatives can be fully sidestepped for differentiating the scattering matrix.Based on our new derivation, we present a fast and robust algorithm that, without resorting to eigenvector derivatives, computes the scattering matrix derivatives with respect to any design parameters.Our method is designed for scattering matrices in general, independent from any specific basis representation; nor is it bound to any particular geometric parameterization.To demonstrate the use of our method, we apply the scattering matrix derivatives for optimizing the design of metasurface units.We can choose different design parameterizations and use gradient-based optimization to reach various light transmission goals.We also propose a general parameterization of the meta-unit's cross-sectional shape that can be optimized using our method.

Background: Scattering Matrix
We start by briefly reviewing the classic notion of scattering matrix in computational photonics, to pave the way toward its differentiation.
To numerically analyze a photonic structure (such as a waveguide), the structure is often discretized along the wave propagation direction (i.e., zdirection) into a series of layers each with a uniform cross-sectional material distribution (Fig. 1-b).Consider optical waves of a specific frequency.Their propagation in each layer is characterized by a scattering matrix S, which relates waves incident on the layer from left and right sides (Fig. 1-a) to the waves scattered out in either direction.
Concretely, let a L and a R denote vectors describing incident waves on the layer from left and right sides, respectively.Here a L and a R stack coefficients that represent the waves under a chosen basis, whose construction will be outlined shortly.Under the same basis, we use b L and b R to denote the scattered waves in left and right directions.With these notations, the incident and scattered whose cross-sectional shape (a) is parameterized by α, which controls the size of the cross-shaped hollow region.The shape symmetry causes the structure's propagating modes to have repeated effective indices (c), as also indicated by the repeated eigenvalues when one solves (2).As α varies, mode 0 and mode 1 always have the same effective index, meaning that their first-order and higher-order derivatives of the corresponding eigenvalues with respect to α are always the same, and thus their eigenvector derivatives are not mathematically well-defined.The same issue occurs in other modes when α becomes small and more propagating modes appear (e.g., see mode 5 and mode 6 when α is within ∼ [0.3, 0.5]).
waves are related through Here S is decomposed into four submatrices: R L and R R indicate how the incident wave from left or right direction is reflected by the layer, while T RL and T LR describe how the incident wave (from either direction) transmits through the layer.
The computation of scattering matrix starts with a semi-discretization of the frequency-domain Maxwells equations of a photonic layer, namely, where k 0 is the free-space wave number, and the vectors e and h describe the electric and magnetic fields of the photonic structure under a chosen basisfor example, RCWA uses the 2D Fourier basis on the cross-section of the wave propagation direction.The matrices P and Q encode the cross-sectional distributions of material permeability and permittivity.The semi-discretization (2) is a common form in many numerical analysis methods for photonic structures (such as the method of line [28] and RCWA [24]).
The difference across those methods only lies in the specific ways of constructing P and Q (e.g., see Supplement 1 for their construction in RCWA).
Once P and Q are determined, the scattering matrix S can be constructed.A key step of this construction is to solve an eigenvalue problem, (PQ)W = WΓ, to obtain eigenvectors W and the diagonal eigenvalue matrix Γ.As we will discuss in Section 3, it is this eigenproblem that renders the differentiation of the scattering matrix ill-posed.To understand the challenges and how we overcome them, we first present the recipe of computing S from W and Γ, as follows.
Let Ω := (PQ) . Then, its eigenvalue matrix is Λ = Γ 1 2 .As derived in [32], the formulas of computing the scattering matrix S defined in (1) are where the matrices X, A, and B have the following forms: Here we use L to denote the layer thickness (Fig. 1), and the matrix V is related to W through V = QWΛ −1 .W and V together form a basis of electric and magnetic components for the optical waves in the layer.Similarly, W 0 and V 0 form a basis for free space propagation, independent from the photonic structure.
They are constant values for computing the derivatives of S. The vectors, a L , a R b L , and b R , in (1) are coefficients under this free-space basis to describe incident and scattered waves.
Once the scattering matrices of individual layers are computed, they are combined using the Redheffer star product [29] into the total scattering matrix, one that indicates the optical response of the entire photonic structure.
Remark.The formulas in Eqs. ( 3) and ( 4) assume that the current photonic layer is sandwiched by two free-space layers.This assumption is by no means a limitation.In an arbitrary photonic structure, the layers can be treated as if they are interleaved with free-space layers-each of which has a zero thickness.

Differentiable Scattering Matrix
The geometry or material distribution of photonic structure is specified by its structural (design) parameters (e.g., see Fig. 2).These parameters determine the structure's permittivity and permeability distributions described by P and Q in (2).Thus, one can compute their derivatives, P and Q , with respect to an arbitrary parameter.Given P and Q , we now address the question of how to compute the scattering matrix derivative S with respect to the same parameter.

Challenges in Scattering Matrix Differentiation
The construction of scattering matrix S needs to solve an eigenvalue problem (PQ)W = WΓ, as the eigenvalues Γ and eigenvectors W appear in Eqs. ( 3) and ( 4) for computing S. Thus, for the differentiation of S, it seems also needed to compute the derivatives of the eigenvalues Γ and eigenvectors W.
Unfortunately, the derivatives of eigenvectors in many cases are ill-defined.Most notable is when there exist repeated eigenvalues.Repeated eigenvalues are not uncommon: many photonic devices have certain structural symmetries, from which eigenvalue repetition naturally emerges (see Fig. 2).For those repeated eigenvalues, their eigenvectors (up to a scale) are not uniquely determined; any set of linearly independent vectors that span the same subspace are valid eigenvectors.Because of the ambiguity, as the structural parameter changes, those eigenvectors may change discontinuously (see an examples in Supplement 1), and thus their derivatives may not be well-defined.
As a result, one must carefully choose eigenvectors in the subspace of repeated eigenvalues such that the eigenvectors change continuously with respect to the structural parameter.This choice, however, is computationally expensive.As derived in [38], to ensure well-defined eigenvector derivatives, one must compute higher-order derivatives of the eigenvalues and the matrix PQ: if the repeated eigenvalues have repeated derivatives up to the n-th order (see Fig. 2), then derivatives up to the (n + 1)-th order of both eigenvalues and the matrix PQ must be computed to determine first-order eigenvector derivatives.

Differentiation without Resort to Eigenvector Derivatives
We now present a new algorithm for computing the scattering matrix derivative S .Even in the presence of repeated eigenvalues and their derivatives, our method requires only the first-order derivatives of the matrices P and Q, completely sidestepping the differentiation of eigenvalues and eigenvectors.In comparison to the way that takes eigenvalue derivatives (as described above), our method is more robust and efficient.First, we rewrite the commonly used expressions of scattering matrix components, shown in (3), in new forms, where D 1 , D 2 , and D 3 denote the following matrix multiplications, respectively: The derivation of these new expressions ( 5) and ( 6) are provided in Supplement 1.
In Eqs. ( 6), the equalities are reached by applying (4) and using T that denotes The expressions in (5) present a new route for computing scattering matrix derivative.They indicate that the scattering matrix S is determined by the three matrices, D 1 , D 2 , and D 3 .As a result, to compute its derivative S using the chain rule, we need to compute the derivatives of D 1 , D 2 , and D 3 with respect to the structural parameter.
In Eqs. ( 6), both W 0 and V 0 (introduced in ( 4)) are constant matrices.Thus, the derivatives of D 1 , D 2 , and D 3 only depend on the derivatives of two other matrices in Eqs. ( 6), namely T and WXW −1 .We now describe how to compute the derivatives of the two matrices, respectively.

Derivative of T. The matrix
but not the eigenvalues and eigenvectors.Its derivative can be expressed as where Q depends on the material permittivity distributions of the photonic structure, and therefore its derivative Q with respect to a design parameter can be directly computed (see examples in Section 4).The way of computing Ω can be derived by taking the derivatives on both sides of the relation Given P and Q , the right-hand side of this equation can be directly computed.
To compute Ω , we rewrite the left-hand side by denoting Ω as Ω = WYW −1 for some unknown Y, where W is the eigenvector matrix of PQ.Using the fact that Ω = (PQ) 1/2 = WΛW −1 , we obtain a simplified form of (8): From ( 9), Y can be easily solved by noticing that Λ is a diagonal matrix, and therefore (9) can be written element-wise as (λ i + λ j )Y ij = C ij , where λ i is the i-th eigenvalue in Λ, and C denote the matrix on the right-hand side of (9).In other words, the elements of Y can be obtained by solving We note that while this process of computing Ω requires the eigenvectors W and eigenvalues Λ, they are also needed for computing the scattering matrix in the first place.Our solving process does not require the derivatives of eigenvectors.Therefore, it introduces no additional effort in terms of eigen-decomposition.
Derivative of WXW −1 .In the first glance, the derivative of WXW −1 depends on the eigenvectors W.However, from the definition of X in (4a), we notice that WXW −1 = e jΩL/k0 , which suggests an alternative approach: take the derivative of the matrix exponential e jΩL/k0 with respect to Ω.
A common approach of computing the matrix exponential e jΩL/k0 is through the eigen-decomposition of Ω followed by the exponential of the resulting eigenvalues.If we take this approach, the derivative computation must involve the derivatives of eigenvectors, which might not be well-defined.Another approach, used by Feynman [12] and others [7,17,35], expresses the derivative of a matrix exponential using an integral that in itself involves matrix exponentials.Yet, numerically evaluating the matrix exponentials and the integral are expensive.
Instead, our proposed method for computing the derivative is based on the following proposition originally proved in [25].
Proposition 1.Consider an n × n matrix Ω and its derivative Ω with respect to an arbitrary parameter.If , then e jGL/k0 = e jΩL/k0 e jΩL/k0 0 e jΩL/k0 , where the top-right n × n block matrix in e jGL/k0 is the derivative of the matrix exponential e jΩL/k0 .
In our problem, Ω is computed as described above (by solving ( 9)), and the common way of computing e jGL/k0 is by taking the eigen-decomposition of G, which is again what we wish to avoid.We therefore take a different approach, the scaling and squaring method [14], to compute e jGL/k0 -without the need of eigen-decomposition.
The scaling and squaring method exploits the relation e A = e A/σ σ for any n × n matrix A. In practice, σ is chosen to be σ = 2 s for some non-negative integer s.The idea is to have the norm of A/σ sufficiently small such that e A/σ can be well approximated by a Padé approximant near the origin.The Padé approximant is a rational polynomial of A. Its evaluation requires only matrix multiplications and inverse, but no eigen-decomposition.The scaling and squaring method is robust and accurate, and has been used in many numerical tools (such as MATLAB's expm function).
When applying this method, we further exploit the specific structure of G (i.e., its bottom-left block matrix vanishes, and its two diagonal block matrices are identical) to tailor the method for improving computational performance.Supplement 1 presents our detailed derivations and computational steps.

Results
This section presents our numerical results.First, we validate our algorithm for computing scattering matrix derivatives.Next, to demonstrate the use of scattering matrix derivatives in photonics, we optimize the geometry of photonic metasurface units (also called meta-atoms).
Meta-atoms are the building blocks of a metasurface, often designed based on physical intuitions and manually crafted libraries [26,27,42].More recently, inverse design methods of meta-atom structures have also been explored-  We use our method and FDTD (Lumerical [36]) to analyze a 3D meta-atom.The width of the pillar and its square hole are 0.6µm and 0.2µm, respectively, and it has a height of 1.4 µm.We use the periodic boundary condition, with the period of 0.66µm.(b) We scan the (x-polarized) wavelength and plot the effective indices of the fundamental mode evaluated by FDFD and our method.(c) For each wavelength (color mapped here), we compute the far-field amplitudes and phases changes, and compare our results to FDTD.e.g., through finite-difference-based gradient descent [6], adjoint-based level-set method [23], and topological optimization [22,33].
Due to fabrication constraints, meta-atoms often have constant cross-sectional shapes along one direction (i.e., z-direction, as shown in Fig. 3-a).Thus, the semi-analytical methods (such as RCWA) are particularly efficient for simulating meta-atoms, thanks to their ability of not discretizing along z-direction [24].Our method, for the first time, enables the semi-analytical methods to also compute scattering matrix derivatives with respect to design parameters.Here, in the framework of RCWA, we demonstrate automatic discovery of meta-atom structures that reach various amplitude and phase goals.

Validation
To validate our algorithm, we consider a dielectric meta-atom used in metasurface holography [26,27].Its structure is shown in Fig. 3-a.We use Eqs.(5) to compute the scattering matrix, for which the matrices P and Q (introduced in (2)) are constructed using RCWA.The scattering matrix allows us to compute light propagation properties of the meta-atom, which are in turn compared to the results from finite difference time domain (FDTD) method implemented in Lumerical [36].
We first scan the light wavelength from 1.2µm to 1.6µm.For each wavelength, we compute, using our scattering matrix and FDFD respectively, the effective index of the fundamental mode propagating in the meta-atom.The results from our method agree with FDFD results (see Fig. 3-b).Furthermore, we consider  We choose two matrix elements in the scattering matrix, and plot their FD derivatives estimated with different FD sizes (∆α in x-axis) in (a) and (b), respectively.In each plot, the red and blue solid curves correspond to the real and imaginary parts of the estimated derivative.Meanwhile, the derivatives computed by our method are indicated by the red (real) and blue (imaginary) horizontal dash lines.These plots show that FD estimation is highly sensitive to ∆α.Light green regions indicate valid ∆α ranges for both matrix elements.The valid ∆α varies element by element.Thus, in practice, it is hard to choose a proper ∆α for the entire scattering matrix, whereas our method is always robust.
the far-field light transmission through the meta-atom, and compute the phase shift and amplitude change for each wavelength.Again, the results from our method and FDTD match closely, as shown in Fig. 3-c.These experiments confirm that our scattering matrix computation is as accurate as FDTD in Lumerical.In terms of computational cost, our method takes about 0.15 seconds for each monochromatic simulation, and a few seconds for the entire 1.2µm-1.6µmwavelength range, whereas the FDTD simulation takes several minutes.
Next, we validate our derivative computation.We consider again the metaatom structure shown in Fig. 3-a, and choose the parameter α to be the size of the hollow square.Using our method, we compute the derivative of the structure's scattering matrix with respect to α.Meanwhile, since there is no analytic expression of the scattering matrix derivative, we approximate it using finite difference (FD) estimation, that is, We estimate ∂S ∂α using a sweeping range of ∆α values, and compare them to the derivative resulted from our method.
The results are illustrated in Fig. 4. The accuracy of FD approximation largely depends on the choice of ∆α.Only when ∆α is chosen within a certain range, FD approximation is accurate enough to agree with our derivative results.This agreement confirms the correctness of our method.But for different elements  As a reference, we also show the amplitudes and phases (in circular dots) of the designs that globally minimize (12), that is, ones obtained through a slow, exhaustive search of all parameter combinations.While no gradient-based optimization algorithm can guarantee the global minimum of ( 12), our results approach the targets closely, comparable to what the global minimums can achieve.The resulting cross-sectional shape for each sampled target are shown in (c).
in the scattering matrix, the valid ∆α range varies (indicated in light green in Fig. 4), suggesting that FD approximation is impractical: it is hard, if not impossible, to choose a proper ∆α to produce accurate derivative estimations for all elements in the scattering matrix.In contrast, our method is robust for computing the derivatives.
Computational cost.In addition to the robustness, our method is also faster than the FD method.In the FD method, computing a matrix derivative requires the computation of two scattering matrices S(α + ∆α) and S(α − ∆α).In contrast, our method, in addition to computing S(α), only requires a few matrix multiplications and inverses (recall Section 3.3.2).On our workstation computer, the overhead of computing a scattering matrix derivative is about 30% ∼ 40% of the cost for computing the scattering matrix itself.

Use Case: Optimization of Meta-atom Structure
Controlling phase and amplitude of monochromatic light.First, we optimize meta-atom structures to reach specific transmitted amplitudes and phases for a monochromatic light (at 1.55µm wavelength, x-polarized).The cross-sectional shape is shown in Fig. 5-a, determined by two parameters.The objective function for the inverse design is defined as where T LR is the transmission submatrix in the scattering matrix (recall (1)), m is the mode index for the incident and outgoing light in free space, thus T LR (m, m) denotes the m-th diagonal element of the matrix.Also, t m is a complex constant specifying the target amplitude and phase of the transmission.Here we consider the fundamental mode (the way to choose corresponding m is given in Supplemental 1), which describes the far-field light transmission along the z-direction.
To verify the robustness of our method and the enabled optimization, we evenly sample different targets t m on a circle on the complex plane (see Fig. 5-b).For each target, we find meta-atom's shape parameters by minimizing (12) through a gradient-descent algorithm [31], for which the gradients of ( 12) with respect to the design parameters are computed using our method.As shown in Fig. 5-b, we are able to automatically discover structures that reach these targets closely.
Controlling phases for both x− and y−polarized light.Next, we optimize meta-atom structures to obtain target responses for x-and y-polarized light, simultaneously.This type of meta-atoms has been used to construct metasurface holograms [10].In our example, the light wavelength is 1.3µm; the meta-atoms have a fixed height of 2.0µm and a period of 2.5µm along x-and y-direction.The cross-sectional shape of the meta-atoms are specified by two parameters shown in Fig. 7-a.We determine the parameters by minimizing the following objective function: where the subscript x (and y) indicates light polarization; t x (and t y ) are the target phase changes from x-polarized (and y-polarized) incident light to the outgoing light with the same polarization (i.e., t x = exp(iφ x ) for some φ x ).The first term in (13) measures, for the x-polarized light, the cosine difference (through dot product on complex plane) between the m-th mode's phase change and the target phase change, and similarly for the second term.The optimized structures for different x-and y-polarized phase targets are shown in Fig. 7.In all cases, the residual between the target and the resulting phase change is within 7% of one period (2π), and in most cases within 1%.
Controlling amplitudes for multiple wavelengths.We also demonstrate inverse design of meta-atoms for another type of optical response: obtain two target amplitude responses at two separate wavelengths, simultaneously.This type of responses have proven useful for making colored metasurface holograms [26,27].
Here we consider two archetypes used in [27], each described by two parameters (see Fig. 6-a).The two wavelengths under consideration are 1.2µm (labeled as blue) and 1.6µm (red), and the objective function is defined as Here the subscript "1" and "2" indicate the blue (1.2µm) and red (1.6µm) wavelength, respectively.The first term accounts for the blue wavelength: T LR,1 is the transmission submatrix of the scattering matrix and A 1 is the desired amplitude.Similar is the second term.More terms can be added in (14) to incorporate more than two wavelengths.
For each archetype, we find its parameter values via a gradient-descent algorithm that minimizes (14), and choose between the two archetypes one that produces a smaller objective value.The optimized structures and their performances are shown in Fig. 6.For almost all the experiments (each with a different amplitude target), the resulting amplitudes by the inversely designed meta-atoms match closely to their targets.
General cross-sectional shape design.Lastly, we introduce a new way to inverse design the meta-atom's cross-sectional shape under a general representation.We use the star-convex polygon [37] to represent the cross-sectional shape.Such a shape can be discretized by sampling N points on its boundary so that the polar angles of these points are evenly distributed over [0, 2π].In other words, the (k + 1)-th point has the coordinate p k [cos (2kπ/N ), − sin (2kπ/N )], where p k is a non-negative value (see Fig. 8-a), and the shape is specified by N parameters p 1 , . . ., p N .A large N offers many degrees of freedom to represent a complex shape, but meanwhile renders exhaustive search through the entire parameter space too expensive-one must rely on numerical optimization methods to determine the parameter values.
This shape representation is particularly suitable for RCWA-based analysis, as it allows for a closed-form 2D Fourier transform of the shape (and thus the permittivity distribution) [21].In RCWA framework, 2D Fourier transform of the cross-sectional permittivity distribution is needed for computing the matrices, P and Q, as well as their derivatives with respect to the p k parameters.Supplemental 1 provides the details of this process.
As examples, we optimize octagons (N = 8) to obtain desired optical responses in different scattering directions.First, we specify the target scattering directions.Notice that to predict optical behavior of a single meta-atom in simulation, periodic boundary condition is often used.Under this condition, the meta-atom is effectively a 2D grating structure, for which we can use diffraction orders to specify different scattering directions: the output light with diffraction order (p, q) is along the direction where L x and L y are periods along x-and y-axis, respectively (L x = L y = 1µm in our examples).We consider x-polarized light with the wavelength of 1.55µm.The goal here is to obtain specified far-field phases and amplitudes at two scattering directions-ones that correspond to the diffraction orders, (−1, 0) and (1, 0), as shown in Fig. 8-b.We further restrict p k to be in the range [0.15µm, 0.45µm], and determine p k values by minimizing  where n 1 and n 2 are mode indices for the diffraction orders (−1, 0) and (1, 0), respectively; and t 1 and t 2 specify the target phases and amplitudes (as complex values) in the two outgoing directions.We perform two experiments for two sets of t 1 and t 2 goals.The optimization convergence curves and resulting shapes are shown in Fig. 8.

Conclusion
We have presented an algorithm for computing the derivatives of the scattering matrices of a photonic structure with respect to its structural parameters.Our method is built on the framework of semi-analytical methods for analyzing photonic structures.A key step in semi-analytical methods for computing scattering matrices is the eigen-decomposition.However, to compute scattering matrix derivatives, directly differentiating the eigenvalue analysis poses significant difficulties.We show a new route to compute scattering matrix derivatives without the need of differentiating the eigen-decomposition process.
The scattering matrix derivatives present how a photonic structure's performance changes as its structural parameters vary.While we demonstrated their use in the optimization of meta-atom units, they are useful in many other applications.Therefore, our method may serve as a useful analysis tool in a wide range of photonic design tasks.
Differentiable Scattering Matrix for Optimization of Photonic Structures: supplemental document 2 An Example of Eigenvector Discontinuity Figure 2 in the main text provides an exemplar meta-atom structure in which repeated eigenvalues (and thus repeated propagation constants) exist.Here we use a simple mathematical example to show that when repeated eigenvalues emerge, the eigenvector derivatives become undefined.
Consider the following 2 × 2 matrix with a parameter p: When p = 0, its eigenvectors form a constant matrix.
In fact, when p = 0, this eigenvector matrix is unique up to a scale.When p becomes zero, repeated eigenvalues emerge (both are 1).However, while X is still a valid eigenvector matrix of A(0), it is not unique anymore.In fact, any 2 × 2 orthonormal matrix is a valid eigenvector matrix of A(0).For example, the standard eigen-decomposition solver (e.g., in Matlab) gives Therefore, in the presence of repeated eigenvalues, the non-uniqueness of eigenvectors causes the eigenvector derivatives undefined-in this case, an infinitesimal deviation from p = 0, can cause the eigenvectors to change from (17) to (16), discontinuously.
Remark.In this simple example, the discontinuity can be fixed by examining the limit of the X derivative as p approaches 0, because in this case the derivative of X at p = 0 is well-defined.However, the situations encountered in many photonic design tasks can be much more challenging.For example, in Fig. 2 of the main text, not only are the eigenvalues (i.e., effective indices) repeated, their derivatives with respect to the parameter are also repeated (e.g, see mode 0 and mode 1 in Fig. 2-c).In those cases, the derivative of X is not well-defined any more.If up to the n-th order derivatives of the eigenvalues are repeated, one has to rely on the n + 1-th order derivative of X to resolve the discontinuity.This is a rather expensive, if not impossible, computational process.

Figure 2 :
Figure 2: Repetition from symmetry.Consider a meta-atom structure (b)whose cross-sectional shape (a) is parameterized by α, which controls the size of the cross-shaped hollow region.The shape symmetry causes the structure's propagating modes to have repeated effective indices (c), as also indicated by the repeated eigenvalues when one solves (2).As α varies, mode 0 and mode 1 always have the same effective index, meaning that their first-order and higher-order derivatives of the corresponding eigenvalues with respect to α are always the same, and thus their eigenvector derivatives are not mathematically well-defined.The same issue occurs in other modes when α becomes small and more propagating modes appear (e.g., see mode 5 and mode 6 when α is within ∼ [0.3, 0.5]).

Figure 3 :
Figure 3: Accuracy comparison.(a)We use our method and FDTD (Lumerical[36]) to analyze a 3D meta-atom.The width of the pillar and its square hole are 0.6µm and 0.2µm, respectively, and it has a height of 1.4 µm.We use the periodic boundary condition, with the period of 0.66µm.(b) We scan the (x-polarized) wavelength and plot the effective indices of the fundamental mode evaluated by FDFD and our method.(c) For each wavelength (color mapped here), we compute the far-field amplitudes and phases changes, and compare our results to FDTD.

Figure 4 :
Figure4: Scattering matrix derivatives.We choose two matrix elements in the scattering matrix, and plot their FD derivatives estimated with different FD sizes (∆α in x-axis) in (a) and (b), respectively.In each plot, the red and blue solid curves correspond to the real and imaginary parts of the estimated derivative.Meanwhile, the derivatives computed by our method are indicated by the red (real) and blue (imaginary) horizontal dash lines.These plots show that FD estimation is highly sensitive to ∆α.Light green regions indicate valid ∆α ranges for both matrix elements.The valid ∆α varies element by element.Thus, in practice, it is hard to choose a proper ∆α for the entire scattering matrix, whereas our method is always robust.

Figure 5 :
Figure 5: Optimization of(12).(a) We optimize the cross-sectional shape specified by two design parameters α and β in order to reach a target transmission amplitude and phase.We examine different targets evenly sampled on a circle on the complex plane (indicated by the square dots in (b)).The optimized amplitudes and phases (indicated by triangular dots) reach closely to the targets.As a reference, we also show the amplitudes and phases (in circular dots) of the designs that globally minimize(12), that is, ones obtained through a slow, exhaustive search of all parameter combinations.While no gradient-based optimization algorithm can guarantee the global minimum of(12), our results approach the targets closely, comparable to what the global minimums can achieve.The resulting cross-sectional shape for each sampled target are shown in (c).

Figure 6 :
Figure 6: Optimization of (14).(a) We optimize meta-atom structures described by two archetypes, each with two parameters.The goal is to obtain desired amplitude responses at two separate wavelengths (i.e., 1.2µm (blue) and 1.6µm (red)), simultaneously.We sample five amplitudes from 0.2 to 1 for each wavelength, forming 25 different optimization targets.Each target leads to a different cross-sectional design shown in (b).For red light, the discrepancies between achieved amplitudes and the targets are shown in (c), and the same visualization for blue light is shown in (d).

Figure 7 :
Figure 7: Optimization of (13).(a) We optimize the meta-atom structure described by two parameters.The goal is to achieve certain phase changes for xand y-polarized light simultaneously.(b) We sample six target phase changes for x-and y-polarized light, respectively.Their combination forms 36 different optimization targets.For each target, our optimization produces a cross-sectional shape design.(c) For x-polarized incident light, we show the residual (in terms of phase angle difference) between each pair of inversely designed phase change and the target.And similar visualization for y-polarized light is shown in (d).

Figure 8 :
Figure 8: Inverse design of star-convex meta-atoms.(a) The star-convex polygon is used to represent the cross-section of a meta-atom, defined by many control variables (p 1 . . .p 8 in this case).As an example, we inverse design the shape for reaching target amplitudes and phases in two scattering directions (i.e., corresponding to diffraction orders (-1,0) and (1,0) in (b)) simultaneously.(c-d)We perform two experiments to reach two sets of (t 1 , t 2 ) goals shown in the plots.In each experiment, our optimization finds the design parameters within hundreds of iterations, resulting in nontrivial shapes that are hard to be manually designed.