Scattering concentration bounds: Brightness theorems for waves

The brightness theorem---brightness is nonincreasing in passive systems---is a foundational conservation law, with applications ranging from photovoltaics to displays, yet it is restricted to the field of ray optics. For general linear wave scattering, we show that power per scattering channel generalizes brightness, and we derive power-concentration bounds for system of arbitrary coherence. The bounds motivate a concept of"wave \'{e}tendue"as a measure of incoherence among the scattering-channel amplitudes, and which is given by the rank of an appropriate density matrix. The bounds apply to nonreciprocal systems that are of increasing interest, and we demonstrate their applicability to maximal control in nanophotonics, for metasurfaces and waveguide junctions. Through inverse design, we discover metasurface elements operating near the theoretical limits.

The "brightness theorem" states that optical radiance cannot increase in passive ray-optical systems [1]. It is a consequence of a phase-space conservation law for optical étendue, which is a measure of the spatial and angular spread of a bundle of rays and has had a wide-ranging impact: it dictates the upper bounds to solar-energy concentration [2,3] and fluorescent-photovoltaic efficiency [3], it is a critical design criterion for projectors and displays [4], and it undergirds the theory of nonimaging optics [5]. Yet a generalization to electromagnetic radiance is not possible, as coherent wave interference can yield dramatic radiance enhancements. A natural question is whether Maxwell's equations, and more general wave-scattering physics, exhibit related conservation laws.
In this paper, we develop analogous conservation laws for power flow through the scattering channels that comprise the bases of linear scattering matrices. By a density-matrix framework more familiar to quantum settings, we derive bounds on power concentration in scattering channels, determined by the coherence of the incident field. The ranks of the density matrices for the incoming and outgoing fields play the role of étendue, and maximal eigenvalues dictate maximum possible power concentration. For the specific case of a purely incoherent excitation of N incoming channels, power cannot be concentrated onto fewer than N outgoing channels, which in the ray-optical limit simplifies to the classical brightness theorem. In resonant systems described by temporal coupled-mode theory, the number of coupled resonant modes additionally restricts the flow of wave étendue through the system. The bounds require only passivity and apply to nonreciprocal systems. We discuss their ramifications in nanophotonics-for the design of metasurfaces, waveguide multiplexers, randommedia transmission, and more--while noting that the bounds apply more generally to scattering in acoustic, quantum, and other wave systems.
Background: Optical rays exist in a four-dimensional phase space determined by their position and momentum values in a plane transverse to their propagation direction. Optical étendue [5] denotes the phase-space volume occupied by a ray bundle. In ideal optical systems, phase-space evolution is governed by Liouville's theorem, and thus radiance and étendue are invariants of the propagation. A differential ray bundle propagating through area dA and solid angle dΩ, in a medium of refractive index n and tilted at an angle θ, has an étendue of n 2 cosθdAdΩ. Figure 1(a) depicts étendue conservation in ray-optical systems and the consequent trade-off between spatial (dA) and angular (dΩ) concentration. Electromagnetic radiance is intensity per unit area per unit solid angle, which in ray optics is proportional to the flux per unit étendue. By étendue invariance, in tandem with energy conservation, ray-optical brightness cannot increase. In nonideal systems, étendue can decrease when rays are reflected or absorbed, but any such reduction is accompanied by power loss, and the theorem still applies.
Extending radiometric concepts such as radiance into wave systems with coherence, beyond ray optics, has been the subject of considerable study [6][7][8][9][10][11][12][13][14][15]. Wigner functions can represent generalized phase-space distributions in such settings, and they are particularly useful for "first-order optics," i.e., paraxial approximations, spherical waves, etc. Yet the Wigner function and similar approaches cannot simultaneously satisfy all necessary properties of a generalized radiance [8,12,14]. This does not preclude the possibility for a Wigner-function-based brightness theorem-indeed, this represents an interesting open question [16]-but we circumvent the associated challenges by recognizing power transported on scattering channels as the "brightness" constraints in general wave-scattering systems.
Concentration bounds: Consider generic linear wave scattering in which some set of input waves ψ in are coupled to a set of output waves ψ out (in domains that may be overlapping or disjoint) through a scattering operator S as follows: ψ out Sψ in . We assume that the scattering process is not amplifying but does not have to be reciprocal or unitary. To describe the scattering process in a finite-dimensional basis, we adapt the formalism developed in Refs. [17][18][19][20][21]. As is well established in classical and quantum scattering theory [22][23][24][25], the operator S comprises two contributions: a "direct" (background) contribution from waves that travel from input to output without the scatterer present, which we denote with the operator D, and a "scattered-field" contribution from waves that are scattered from input to output only in the presence of the scatterer, which we denote with the operator T (as in "T matrix" approaches [26][27][28][29]). A key insight of Refs. [17][18][19][20][21] is that the T operator is compact (in fact, it is a Hilbert-Schmidt operator, by the integrability of the squared Frobenius norm of its kernel), which means that one can accurately represent it by a finite-dimensional singular-value decomposition where U and V define orthonormal bases under an appropriate inner product h,i, and the restriction to finite dimensions is possible by retaining only those singular vectors corresponding to nonzero singular values, i.e., "well-coupled channels" [21]. The direct-process operator D is not necessarily compact-for example, D for scattering within a spherical domain is the identity operator [23,24,26]-and thus does not have the same natural decomposition. Nevertheless, we can still project the input and output states onto V and U, respectively. Such a representation will necessarily miss an infinite number of input states with a nontrivial direct-process contribution, but by definition those states will have no interaction with the scatterer, and thus they have no consequence on power-concentration bounds or on the definition of a wave étendue. We include the direct process at all in order to naturally incorporate interference effects between the direct and scattering processes. Thus, for any scattering problem, the columns of V and U define our scattering channels, within which our input and output waves can be decomposed, as follows: where c in and c out are the vector coefficients of the excitations on these channels as shown in Fig. 1(b). The scattering matrix connects c in to c out and can be found by starting with the definition of the S operator, ψ out Uc out Sψ in SVc in , and then taking the inner product with U to find We take our inner product to be a power normalization, such that c † in c in and c † out c out represent the total incoming and outgoing powers, respectively.
Perfectly coherent excitations allow for arbitrarily large modal concentration (e.g., through phase-conjugate optics [30,31]), but the introduction of incoherence incurs restrictions. To describe the coherence of incoming waves, we use a density matrix ρ in [32] that is the ensemble average (hereafter denoted by h·i, over the source of incoherence) of the outer product of the incoming wave amplitudes, written as The incoherence of the outgoing channels is represented in the corresponding outgoing-wave density matrix The matrices ρ in and ρ out represent density operators projected onto the U and V bases. Both matrices are Hermitian and positive semidefinite. For inputs defined by some ρ in , how much power can flow into a single output channel, or more generally into a linear combination given by a unit vectorû? If we denoteû † c out as c out,û , then the power throughû is hjc out,û j 2 i û † ρ outû û † Sρ in S †û . The quantity hjc out,û j 2 i is a quadratic form in ρ in , such that its maximum value is dictated by its largest eigenvalue [33], λ max , leading to the inequality To bound the term in parentheses,û † SS †û , we consider coherent scattering for a new input: c in S †û . For this input field, the incoming power isû † SS †û , while the outgoing power in the unit vectorû is jû † c out j 2 û † SS †û 2 . Enforcing the inequality that the outgoing power inû must be no larger than the (coherent) total incoming power, we immediately have the identityû † SS †û ≤ 1. (We provide an alternative proof in Supplement 1.) Inserting into Eq. (6), we arrive at the bound  Equation (7) is a key theoretical result of this paper. It states that for a system whose incoming power flow and coherence are described by a density matrix ρ in , the maximum concentration of power is the largest eigenvalue of that density matrix. For a coherent input (akin to quantum-mechanical "pure states" [34]), there is a single nonzero eigenvalue, equal to 1, such that all of the power can be concentrated into a single channel. For equal incoherent excitation of N independent incoming states, the density matrix is diagonal with all nonzero eigenvalues equal to 1∕N , in which case hjc out,û j 2 i ≤ 1 N : Equation (8) is less general than Eq. (7) but provides intuition and is a closer generalization of the ray-optical brightness theorem. Since the average output power per independent state must be less than or equal to 1∕N , at least N independent outgoing states must be excited, or a commensurate amount of power must be lost to dissipation. In reciprocal systems, this bound follows from reversibility. In Supplement 1, we prove that Eq. (8) simplifies to the ray-optical brightness theorem for continuous planewave channels in homogeneous media. Just like the ray-optical brightness theorem [1,35], our scattering-channel bounds can alternatively be understood as a consequence of the second law of thermodynamics. If it were possible to concentrate incoherent excitations of multiple channels, then one could filter out all other channels and create a scenario with a cold body on net sending energy to a warm body [20]. The partially coherent case is not as physically intuitive, but the application of such thermodynamic reasoning could be applied to the modes that diagonalize ρ in , and then a basis transformation would yield Eq. (7).
Wave étendue: Equations (7) and (8) imply that the incoherent excitation of N inputs cannot be fully concentrated to fewer than N outputs. This motivates the identification of "wave étendue" as the number of incoherent excitations on any subset of channels (incoming, outgoing, etc.). For a density matrix ρ, one can count independence by the matrix rank and define étendue = rankρ.
To understand the evolution of wave étendue through the scattering process, we reconsider the singular-value decomposition (SVD) of Eq. (1). The matrix Σ is a square matrix with dimensions N × N , where N is the number of well-coupled pairs of input and output scattering channels. Since the singular values are nonzero, we know that Σ is full rank. The scattering matrix S is the sum of Σ and the direct-process matrix, and its rank will be N minus the number of coherent perfect absorber (CPA) states, N − N CPA , where the CPA states arise if the direct process exactly cancels a scattered wave, yielding perfect absorption [36,37]. (Technically, these may be partial-CPA states, exhibiting perfect cancellation of the direct fields only on the range of the T operator.) The density matrix ρ in is a representation of the incoming excitations on the basis V of Eq. (1), and thus it cannot have rank greater than N itself. By the relation ρ out Sρ in S † and the matrix-product inequality rankAB ≤ minrankA, rankB (Ref. [33]), the rank of ρ out must lie within bounds given by the rank of ρ in minus the number of CPA states and the rank of ρ in itself as follows: Equation (9) defines the maximum diversity possible in the evolution of wave étendue in linear scattering systems. For lossless systems-or more generally any system without CPA states-we must have N CPA 0, in which case Eq. (9) is a conservation law stating that the density-matrix rank is always conserved. (In Supplement 1 we show that this simplifies to the classical wave-étendue conservation law in the ray-optics limit.) Figure 1(b) depicts this rankdefined (channel-counting) definition of wave étendue. In wavescattering systems, phase space is defined by distinct scattering channels, without recourse to the position and momentum unique to free-space states.
Metasurface design: To probe the channel-concentration bounds, we consider control of diffraction orders through complex metasurfaces for potential applications such as augmentedreality optics [38,39] and photovoltaic concentrators [40][41][42]. Figure 2(a) depicts a designable gradient refractive-index profile with a period of 2λ and a thickness of 0.5λ. (Such an element could be one unit cell within a larger, non-periodic Transmission Reflection metasurface [43][44][45][46].) For incoherent excitation of N diffraction orders, Eq. (8) dictates that the maximum average efficiency of concentrating light into a single output order (1) cannot be greater than 1∕N [dashed lines in Fig. 2(c)]. For s-polarized light incoherently incident from orders 0 (red); −1, 0 (green); −1, 0, 1 (blue); and −2, −1, 0, 1 (purple) (20 deg angle of incidence for the zeroth order), we use adjoint-based "inverse design" [47][48][49][50][51][52][53][54] (Supplement 1) to discover optimal refractive-index profiles of the four metasurfaces shown in Fig. 2(b). (Broader angular control and binary refractive-index profiles could be generated through standard optimization augmentations [50,51], but here we emphasize the brightness-theorem consequences.) The transmission spectrum was computed by the Fourier modal method [55] with a freely available software package [56]. In Fig. 2(b), as the number of incoherent channels excited increases from 1 to 4, the average efficiency of the optimal structures decreases from 95.5% to 24.9%. (In Supplement 1 we show that optimal blazed gratings fall far short of the bounds.) We also probe the effects of partial coherence by varying the coherence between two input orders, per the density matrix in Fig. 2(a). By Eq. (7), maximum concentration is determined by the largest eigenvalue of ρ in , which is 1 − c∕2, where c is the coherence parameter. Figure 2(c) shows inverse-designed structures for c 0.2, 0.4, 0.6, 0.8, 1, with unique structures optimizing the response depending on the coherence of the excitation. All of the structures maximize efficiency in the incoherent c 0 case because the eigenvalues of the density matrix are degenerate, and thus transmission of any state is optimal. Étendue transmission: An important related scenario to consider is one in which the direct (background) process is ignored, with the focus solely on interactions with scatterers. Instead of the input and output fields considered above, the relevant decomposition is instead into incident and scattered fields. Using the same terminology as in the input-output scattering operator, we can connect the incident and scattered fields by a T operator [26][27][28][29] ψ scat T ψ inc . Again, as shown in Refs. [17][18][19][20][21], the T operator is compact and defines incoming-and scatteredwave bases by its SVD, Eq. (1). Furthermore, to align with various applications described below, we will specify a set of N trans ≤ rank U desirable "transmission" channels that are a subset of the scattered-field channels defined by U. To understand transmission flow into these channels, we will define our finite-dimensional T matrix as the restriction of T onto this subset of scattered-field channels. The matrix T connects the incomingfield channels to the transmission channels. The "transmission" terminology, partially meant to avoid further overload of the word "scattering," is intended simply to represent the flow of energy through a system, enabled by interactions with a scatterer. For a planar or periodic scatterer, both reflected and transmitted waves would be part of this generalized "transmission" process, as long as they differ from the direct free-space process.
We define "étendue transmission" as the number of incoherent excitations that can successfully be transmitted through scatterer interactions onto the transmission channels. Equation (8) dictates that at least N output channels are excited for N orthogonal inputs, and indeed this result is proven in the incoherent case in Ref. [20] through an SVD of the T matrix (denoted therein by "S"). If the number of transmission channels, N trans , is less than rankρ inc , where ρ inc is the incident-wave density matrix, then the incoherent excitations cannot all be concentrated onto the transmission channels, and some power must necessarily be scattered into other scattering channels.
Resonance-assisted transmission, in which resonances couple the incident and transmission channels, introduces an additional constraint: the number of resonant modes (resonances) M coupled to the relevant channels. Resonant modes are not scattering channels; instead, they are the quasi-normal modes (QNMs) of the scatterer, subject to outgoing boundary conditions. (Quasi-normal modes have been extensively studied and applied to various scattering systems for the last decade [57][58][59], and in the limit of closed systems and self-adjoint Maxwell operators they reduce to conventional guided and standing-wave modes [60].) We consider systems where the interaction with resonant modes can be described by temporal coupled mode theory (TCMT) [61][62][63], wherein the scattering process is encoded in an M × M matrix Ω, comprising the real and imaginary parts of the resonant-mode resonant frequencies, and a matrix K, denoting channel-mode coupling. In TCMT, the T matrix for the resonance-assisted transmission component is (Supplement 1) where K trans and K inc are the N trans × M and N inc × M submatrices of K denoting modal couplings to the transmission and incident channels, respectively.
The maximum (average) power flow into a single transmission output channel is subject to the bounds of Eqs. (7) and (8), now in terms of the density matrix ρ inc . The matrix ρ trans equals Tρ inc T † . By recursive application of the matrix-rank inequality used above, we can see that The number of orthogonal outputs is less than or equal to the minimum of the numbers of incident inputs, resonant modes, and transmission channels. As depicted in Fig. 3, transmission channels and resonant modes act like apertures [35] in restricting the flow of étendue through a system. We may also consider total transmission onto all N trans transmission channels, i.e., P i hjc trans,i j 2 i. Since the transmission onto a single output is bounded above by λ max ρ inc , the total power is bounded above by the sum of the first rankρ trans eigenvalues (Supplement 1) as follows: where the eigenvalues are indexed in descending order. For incoherent excitation of the N inc channels, λ i ρ inc 1∕N inc for all i, and the term on the right of Eq. (11) simplifies to minN inc , M , N trans ∕N inc . In resonance-assisted transmission Research Article scenarios, Eq. (11) represents an additional constraint on power flow: in addition to the number of output channels, total power flow is further constrained by the number of distinct resonant modes that interact with them. Étendue restrictions anywhere in the transmission process necessarily generate scattering to channels other than the desired transmission channels. We apply the resonance-assisted-transmission bounds to TCMT models of waveguide multiplexers as depicted in Fig. 4. There has been significant interest [52,[64][65][66][67][68] in the design of compact junctions for routing light. In Figs. 4(a)-4(c), we consider "input" waveguides and "output" waveguides coupled by a resonant scattering system. For two input waveguides, we consider three scenarios: (a) one output and two resonances, (b) one resonance and two outputs, and (c) two resonances and two outputs. In each case, a highly controlled coherent excitation can, through appropriate design of the resonator, yield perfect transmission on resonance at the output port. But a robust design, impervious to noise or other incoherence, may be required, and such noise would introduce incoherence that is subject to the bound of Eq. (11). In each case, we optimize TCMT model parameters (cf. Supplement 1) to maximize transmission for all phase differences between two inputs. Device (c) maintains perfect transmission, whereas devices (a) and (b) are highly sensitive to noise as predicted by the restrictions to étendue flow.
The channel-concentration bounds and wave-étendue concept generalize classical ray-optical ideas to general wave scattering. In addition to the nanophotonic design problems considered here, there are numerous potential applications. First, they resolve how to incorporate polarization into ray-optical étendue, showing unequivocally that polarizing unpolarized light requires doubling classical étendue, an uncertain conjecture in display design [69,70]. Moreover, the natural incorporation of nonreciprocity into the bounds is of particular relevance given the emerging interest in nonreciprocal photonics [71][72][73] and acoustics [74], and it places constraints on many of these systems (extensions to time-modulated TCMT systems should be possible). Another additional application space is in random-scattering theory [25,[75][76][77]. For opaque optical media comprising random scatterers, there is significant interest both in controlling the scattering channels (e.g., wavefront shaping) as well as studying the effects of partial coherence on scattering and absorption [78].
Our work shows that fully coherent excitations are optimal for maximal concentration on the fewest possible states. A related question that remains open is whether partial coherence might be optimal for total transmission. The framework developed herein may lead to fundamental limits to control in such systems.

DERIVING THE CLASSICAL BRIGHTNESS THEOREM FROM ITS WAVE-SCATTERING GENERALIZATION
Here we show that the classical ray-optical brightness theorem follows from our wave-scattering generalization. Consider a ray-optical system with an entrance plane and an exit plane. Let us consider the power flow from within a differential area ∆A 1 on the entrance plane through a differential area ∆A 2 on the exit plane. In a wave-scattering framework, what are the power-normalized channels? And how many channels are there in a differential area ∆A?
In ray optics, the wavelength is taken to zero, such that even an infinitesimal area is arbitrarily large relative to the wavelength. Thus we can consider the differential area as the "box" (actually square) into which the states must fit with periodic boundary condition. If we take ∆A = ∆x∆y (with z as the propagation direction), the states that satisfy periodic bound-ary conditions on ∆A are plane waves with k x and k y taking integral multiples of 2π/∆x and 2π/∆y, and k z fixed by the frequency and specific values of k x and k y . Thus we can write non-normalized electric-field states as where k i is the corresponding wavevector for state i. In our manuscript, we choose for simplicity a channel definition such that the total power flowing through a given channel is 1. Since the intensity of a plane wave of amplitude E 0 is |E 0 | 2 /2Z 0 , where Z 0 is the impedance of the medium, we can choose our properly normalized channel states as Taking H i to be the magnetic field of channel i, one can then verify that the real part of (1/2) ∆A E i × H j is indeed δ ij , and we have a basis of power-orthonormal channels. Now we can answer the question about the number of channels (states) within ∆A in the range from k ⊥ to k ⊥ + ∆k ⊥ and k z to k z + ∆k z (which determine the propagating direction), so that k 2 ⊥ + k 2 z = k 2 . A state occupies a region (2π/∆x)(2π/∆y) = 4π 2 /∆A in k x k y -space, then the total number of states is (including an extra factor of two for polarization) Since k 2 ⊥ = k 2 sin θ 2 , where θ is the angle between k and z axis, we have k ⊥ ∆k ⊥ = k 2 cos θ sin θ∆θ . Substitute this relation into (S4) Since rank (ρ out ) = rank (ρ in ) ,the number of incoherent excitations will remain unchanged for an ideal system, we thus conclude that N is an invariant quantity in the course of wave propagation. Hence, we recover the law of étendue conservation. Now reconsider power flowing from area dA 1 and solid angle dΩ 1 at the entrance plane. The number of channels that are (incoherently) excited is given by How much power can flow into area ∆A 2 and solid angle ∆Ω 2 at the exit plane? From our modal-concentration inequality, we know that the power out on any given channel i on the exit plane is less than or equal to 1 divided by the number of excited incoming channels which is a ramification of the ray optics brightness theorem that it is impossible to concentrate rays to increase the brightness. We could have equivalently derived the usual concentrationratio bounds on optical concentrators, and by standard rayoptical arguments one can see that the étendue derived in Eq. (S4) generalizes to non-ideal ray-optical systems as usual.

Justification of the choice of periodic boundary condition
Here we provide a mathematical proof of the fact that our periodic basis set can represent any function, in the appropriate normed space. Before going into the mathematical details, we can provide the intuition: on any finite domain of size L, a periodic-boundary Fourier-series expansion can accurately represent a function everywhere except at its boundary. In the limit that L → ∞ (or equivalently λ → 0), the function values exactly at the boundary no longer matter, and the periodic-boundary representation approximates any function arbitrarily well. This idea is well-understood in Fourier analysis; now we apply it to the physical optics problem at hand.
Suppose we start with an arbitrary wave field f (x) that varies along a one-dimensional plane along dimension x. (The 2Dplane generalization is trivial, simply multiplying the 1D results.) This function can be represented by its Fourier transform as In our ray-optics derivation, we use the approximate representation where Our claim is that the error between f (x) and f N (x), measured in the L 2 norm, goes to zero in the ray optics limit (L/λ → ∞, or N → ∞), i.e.
Proof: Given Eq. (S7), it is straightforward to find the coefficients c n that are the optimal Fourier-series coefficients of f N (x) that most closely approximate f (x): where k n = 2π L n. Then we can insert the Fourier-transform definition of f (x) into Eq. (S11) and integrate to find that the c n are given by where sinc(x) = sin(x) x . Then we substitute c n into Eq. (S8) to obtain f N (x) as We can directly insert this expression for f N (x) into the error expression, Eq. (S10), and carry out the integration over x to obtain with the help of the orthogonality of Fourier series, What remains to be shown is that the whole expression in the curly bracket goes to zero, as N goes to ∞. This can be proven by invoking the Mittag-Leffler expansion of cot(z), , for all complex number z, (S16) which can be found in complex-analysis textbooks [6].
In order to use the expansion, we rewrite the double sinc sum by partial fraction. Hence, as N goes to ∞, which is the ray optics limit that L λ goes to ∞. It is worth mentioning that the convergence is uniform, as convergence of Eq. (S16) is uniformly for all complex z. Consequently, for any bounded f (k), we can bring the N limit into the k x and k x integral in Eq. (S14), proving that lim N→∞ e N = 0. (S19) To conclude, we have proven that the discrete plane wave basis in the ray-optics limit is capable of representing arbitrary wave forms. More abstractly, what we have proven is a wellknown fact -that a Fourier series is able to represent any integrable function on a closed interval in the L 2 norm sense [5]. The L 2 norm is the physically meaningful one here, since observable power/momentum flows are defined by inner products of the appropriate wave/field functions.

ALTERNATIVE PROOF FOR THE BRIGHTNESS-CONCENTRATION INEQUALITY
Here we provide a more compact-but with less physical intuition and non-constructive-proof of the fact that u † i SS † u i ≤ 1. For any matrix A , the matrices A † A and AA † have the same eigenvalues, as can be proven by inserting a singular valued decomposition of A into the expressions. By energy conservation, S is subunitary, and the eigenvalues of S † S, and therefore of SS † , must be smaller than 1. By the variational principle (i.e. Rayleigh quotient [1]), we therefore must have u † i SS † u i ≤ 1 for any unit vector u i .

COUPLED-MODE THEORY FOR ÉTENDUE TRANS-MISSION
In this section we specify the step-by-step procedure to identify the rank of the transmission rank in coupled-mode theory, which plays the critical role in restricting étendue transmission through the system. We start with standard coupled-mode theory: in a scattering system with N channels, we have N × 1 incoming and outgoing coefficients c in and c out , respectively, and an M × 1 vector (for M resonances) of resonant-mode amplitudes a. The three vectors are related by the coupled-mode equations: where is a matrix whose Hermitian part is diagonal, comprising the real parts of the resonant frequencies, and whose anti-Hermitian part encodes dissipation via external coupling. The matrix K denotes coupling between the modes and the incoming/outgoing channels, while C denotes direct-scattering processes, independent of the resonances. Typically one can take C = −I (as the waveguide combiner cases in the main text), where I is the N × N identify matrix, essentially as a normalization stating that in the absence of the scatterer, all energy flows back into the channel it came in on, with a negative amplitude. Next one can solve for a to get c out in terms of c in : where the term in square brackets is the scattering matrix. We are interested in the resonance assisted transmission from N inc to an orthogonal subset of N trans scattering channels. So we are invited to write Eq. (S22) as Because there are only N inc non-zero excitations, it is convenient to work with a N inc × 1 vector c inc , a subset of c in . Similarly, as only transmissions from N trans channels are collected, we can now work with a N trans × 1 vector c trans , a subset of c reson out . Accordingly, only a submatrix of K of size N inc × M, denoted by K inc , contains the coupling information between c inc to resonance modes; another submatrix of size N trans × M, denoted by K trans , connects the resonance modes with c trans . Following the above decompositions, for the T-matrix (or transmission matrix) from c inc to c trans , namely can be written as With the T-matrix definition of Eq. (S24), we can analyze resonance assisted power flow into the transmission channels due to incoherent excitations of the incident channels with the same matrix-trace approach used for c inc , c out , and the scattering matrix in the main text. As a result, the average power in transmission channel i is given by Let us consider the total average power (equal to the average total power) transmitted onto all transmission channels, which is simply the sum of Eq. (S26) over i: where v i is the eigenvector of ρ inc with eigenvalue λ i (we label those eigenvalues in decreasing order, i.e. λ 1 ≥ λ 2 ≥ λ 3 · · · ) . The summation over i includes all necessary channels so that However, the rank of T restricts the meaningful inclusion of v i -the number of v i such that Tv i = 0 is at most rank (T) as a consequence of the ranknullity theorem [2]. Besides, since there are at most rank(ρ inc ) number of v i s are relevant, we can thus bound Eq. (S31) by where U = min rank(ρ inc ), rank (T) .
In the main text, we used a coherent-scattering example to show that for each individual i, u † i SS † u i ≤ 1. Exactly the same proof, but with the inc/trans channels replacing the in/out channels, leads to the same inequality for T, i.e. u † i TT † u i ≤ 1 and v † i T † Tv i ≤ 1. Thus all of the eigenvalues of TT † and T † T thus both less than or equal to one (and they are greater than or equal to zero because TT † and T † T are positive semidefinite). This reduce Eq. (S31) into We can still do better by bounding the rank of T using the fact that rank(AB) ≤ min rank(A), rank(B) . We write again the coupled-mode expression for T, Eq. (S25), now enumerating the number of rows and columns of each matrix: By recursive application of the rank inequality for matrix products, we have Finally, combining Eq. (S33), Eq. (S35) and the fact rank(ρ in ) ≤ N inc gives where eigenvalues are placed in decreasing order with λ 1 ≥ λ 2 ≥ λ 3 · · · . In the case of equal, incoherent excitations (for which all of the eigenvalues of ρ inc are 1/N inc ):

COUPLING CONSTANTS OF COUPLE-MODE MOD-ELS OF WAVEGUIDE COMBINERS
In this section, we list the coupling constants engineered to maximize transmission of waveguide combiners in Fig. 4. A random sweep on these parameters are done and those achieving a full transmission are selected. As aforementioned, C is chosen to be −I in Eq. (S21). The Ω in Eq. (S20) can be further split into a diagonal Hermitian part Ω H and an anti-Hermitian part −iΓ, i.e.
Due to the constraint, it is sufficient to specify only Ω H and K to fully determine the coupled mode equations in Eqs. (S20,S21). For simplicity, in all three cases, K is chosen so that Γ is diagonal, meaning the resonance modes are orthogonal. The numerical value of K and Ω H are as follows: Waveguide combiner (a): two channels to one channel with two resonances:

PERMITTIVITY DATA OF OPTIMIZED METASURFACE UNIT CELLS AND INVERSE DESIGN METHODOL-OGY
In this section we provide the permittivity data of the unit cells of metasurfaces in Fig. 2 of the main text. All structures have unit cell of size 2λ and thickness 0.5λ , where λ is the wavelength of incident excitations. The permittivity is constant in the vertical dimension and are discretized into 100 equal-sized grids in the horizontal dimension. The leftmost grid is labelled as 1 and the rightmost one is labelled as 100. In the following table, we tabulate the numerical value of from grid 1 to grad 100 for all nine structures, represented by the color they appear in Fig. 2(b) and (c). A blazed grating of the same unit cell dimension, shown in Fig. ??  2(c) in the main text. The performance of the blazed grating falls far short of the theoretical limit.
In order to design such metasurfaces with 100 design parameters (permitivity in each grid), the adjoint method is adopted for computing the gradient of an objective function with respect to each design parameter efficiently. The objective function in our case is the sum of transmission power of various orders. Once the gradient is known, standard optimization technique, such as steepest descent, can be used to optimize the objective function to a desirable value. The adjoint method allows the gradient computation to be done in two simulations for each optimization iteration -a direct simulation to obtain the electric field E dir everywhere in the designable region and a adjoint to obtain the E adj everywhere. Then the gradient at each position is proportional to the imaginary part of E dir · E adj . The source of the adjoint simulation is often the time-reversed field to be optimized, which is obtained in the direct simulation.
With all that said, in the following section we provide a detailed derivation of the above procedures described. Suppose we have a generic figure of merit f (E, H) and given some initial design with material distribution (r) and µ(r) to be optimized. We need to first solve the "direct" problem and obtain the current value of figure of merit f , E dir and H dir everywhere . To see what is the next step to go, we perturb the design by δ (r) and δµ(r) in the designable region, then the response of f due to this material variation is given by [3] where the integral is over the designable region. The key is to calculate E adj and H adj generated by the adjoint sources P adj and M adj respectively. This can be done by solving the "adjoint" problem ( to the "direct" one) described in the following. By the reciprocity of Green's functions, the adjoint sources are of the form [3] hence, where J E adj and J M adj are the adjoint electric current and magnetic current respectively. Next, we expand any field by powerorthogonal modes. This form of expansion is particularly conve-nient for the case of metasurfaces. We define so that the modal orthogonality relation can be written as where P i carries the meaning of time averaged power of the ith mode. The integral is over the surface through which the energy flow and n is the unit normal vector to that surface. In this basis, with If the geometry in the region where our figure of merit is evaluated is not altered, the modes in the expansion Eq. (S48) remains the same. The only thing changes is the expansion amplitude a k . Hence, it is more convenient to write sources P and M as By the equivalence principle [4], we conclude that the incident fields of the "adjoint" problem is Recover its correct dimensions, we obtain the final expression: δ f = ω 2 Im dVδ (r)E dir · E adj + δµ(r)H dir · H adj , (S54) with incident fields for the "adjoint" problem as When the structure is discretized into small grids, where the ith grid has volume V i and material constant denoted by i and µ i . If the variation of δ and δµ is constant for each grid, Eq. (S54) reduces to a grid-wise expression These two quantities serve as the gradient of the figure of merit f with respect to material constants, which can be used in gradientbased optimization algorithms.