K-series approximation of vectorial optical fields for designing diffractive optical elements with subwavelength feature sizes

Diffractive optical elements (DOEs) are widely applied as compact solutions for desired light manipulations via wavefront shaping. Recent advanced chip applications further require their feature sizes to move down to the subwavelength, which inevitably brings forth vectorial effects of optical fields and makes the typical scalar-based theory invalid. However, simulating and optimizing their vectorial fields, which are associated with billions of adjustable parameters in the optical element, are difficult to do, because of the issues of numerical stability and the highly-demanding computational cost. To address this problem, this research proposes an applicable algorithm by means of a wave-vector (k) series approximation of vectorial optical fields. On the basis of the semi-analytical rigorous coupled wave analysis (RCWA), an adequate selection scheme on k-series enables computationally efficient yet still predictive calculations for DOEs. The performance estimations for exemplary designs by the finite difference time domain (FDTD) method show that the predicted intensity profiles by the proposed algorithm agree with the target by just a fractional error. Together with optimizing the geometrical degrees of freedom (e.g., DOE depth h) as compensation for errors from the truncation of k-series, the algorithm demonstrates its outperformance by one or two orders of magnitude in accuracy versus the scalar-based model, and demands only a reasonable computational resource.


Introduction
Diffractive optical elements (DOEs) are devices composed of transparent materials with complex topography to realize specified functions (e.g. optical computing and cell trapping [1,2]) that are not feasible with standard refractive optics. With the rapid development of additive manufacturing techniques [3][4][5] in recent years, especially two-photon polymerization lithography (TPL) [6][7][8], it has become more feasible to fabricate DOEs with feature sizes down to ten nanometers. So far this technology is widely used for optical applications in the visible band, such as holograms for virtual reality [9], colorful 3D prints [10], and optical anticounterfeiting devices [11]. Moreover, TPL has been proven to bear potential at producing DOEs for advanced chip-scale applications like all-optical diffractive deep neural networks [12][13][14]. However, for the mentioned feature size, the assumptions of typical scalar-based design methods are violated [15,16] and the vectorial effects of optical fields become more pronounced [17]. A general, fully rigorous design tool is therefore needed to enable an accurate synthesis of such elements. So far this is quite a challenging work for researchers.
A small strand of the literature has reported on designing DOEs with subwavelength feature sizes. For DOEs in the 1D domain, D.W. Prather et al. proposed an optimization-based algorithm using the boundary element method (BEM) [18]. Its computational costs, however, prevent the design of realistic DOEs in reasonable time frames. J. Jiang et al. developed the micro-genetic algorithm FDTD method [19], but it remains a challenging design tool due to its time-consuming searching routines. M.E. Testorf et al. demonstrated the Gerchberg-square pixels with feature size Λ ph and depth h, as indicated in Fig. 1(c). Here two kinds of media, the substrate medium with refractive index 2 = 1.5 and air with 1 = 1.0, occupy the pink and transparent pixels, respectively. Figure 1(a) illustrates the transmission fields on the exiting surface of DOE (at z=h+ h = + h ), providing a plane-wave incidence with wavelength at z = 0 and structure parameter Λ ph / = 6. An appropriate separation h from the surface z = h is adopted to diminish the influence of evanescent waves. In this classical regime of refractive optics-i.e., Λ ph / ≫ 1 -numerical results show that the intensity profile of optical fields is only dependent on the morphology (the pink or transparent pixel) of an individual pixel and has almost constant amplitude value |E| = 1. In Fig. 1(a') the corresponding phase profile arg(E) of the fields also shows significant dependence on the local pixel morphology, where the value of the phase difference between two kinds of pixels is , confirming the prediction by the scalarbased theory (see the texts below) [29]. Compared to Figs. 1(a) and 1(a'), Figs. 1(b) and 1(b') clearly show that both the intensity and the phase of the transmission fields exhibit across-pixel continuity and display the sub-pixel profile for the DOE with Λ ph / = 0.6. The assumption of uniformity and local-pixel-dependence of optical fields made by the scalar-based theory, i.e., |E| = 1 and arg(E) = 2 n 1 h/ for the transparent pixel, and |E| = 1 and arg(E) = 2 n 2 h/ for the pink pixel, is hence considerably invalid. Such results may also imply that these single-value representation of optical fields per pixel by the scalar-based theory can be problematic in the regime Λ ph / ≤ 1. To explore the generality of vectorial effects for systems studied, we statistically analyze fields of a specified (central) pixel inside DOE as a function of pixel size, by means of an ensemble of 20000 specimens per Λ ph value. Similar to the schematic DOE in Fig. 1(c), each specimen has its plane dimension 3.5 × 3.5 and is assigned with stochastic pixel morphology except for the specified (central) one. Here, the FDTD method is adopted for giving convincing conclusions. Numerical results for amplitudes of different field components on the exiting surface (at z= + h ) are plotted in Fig. 2(a), provided that the incident field has x polarization. The curve of phase of the field's x component is shown in Fig. 2(b), where the curves for the other two components are not displayed, owing to their simple uniformly random distribution over the 0-2 range. In principle, Fig. 2 portrays the properties of optical fields from the regime Λ ph / ≫ 1 governed by the scalar-based theory (ST), across the regime 0 ≪ Λ ph / ≤ 1 belonging to vectorial optics, and to the regime Λ ph / ≪ 1 conforming to the effective medium theory (EMT) [16]. As expected, numerical results show that the amplitude and phase of fields exhibit asymptotic convergence toward constants E x,EMT and E x,EMT by EMT at Λ ph / ≪ 1, as well as toward constants E x,ST and E x,ST by ST at Λ ph / ≫ 1, respectively. Considerable depolarization and dephasing of fields, however, occur in the in-between regime and hence demand vector-based theories for different requirements. The FDTD method is adopted here for valid conclusions.

K-series approximation method to compute pixel fields P(x, y) by the vectorial model
To include vectorial effects into designs of finite-size (mm-to cm-scale) devices (with structure parameter Λ ph / ≤ 1) for practical applications, the two most popular rigorous electromagnetic theories, FDTD and RCWA, have been adopted for various attempts as discussed in Sec. 1. Unfortunately, the FDTD method signifies highly-challenging issues, since its computation time scales are the fourth order of the simulation (spatial) domain size N (e.g., (10N) 4 ≈ 10 (1+5)×4 at cm-scale), and the memory requirements are the third order [30]. The RCWA method has the same issues due to its computation time, as well as memory requirements, being fourth order of the simulation (momentum space) domain size K [31] (e.g., K 4 ≈ 10 5×4 at cm-scale). To efficiently design optics of DOEs, on the basis of the semi-analytical RCWA framework, this subsection presents a k-series approximation method associated with optional parameters, the rank of neighboring interferences r, and the sampling rate s, for selecting desired degrees of accuracy at the cost of computational resources. In principle, the method scales the computation time and memory requirement down to the level of N 2 (2r + 1) 2 s 4 ; e.g., about 10 11 at condition r=s=1 at cm-scale, in comparison with the level of N 2 ≃ 10 10 by the scalar-based method [29].
Here, the translation symmetry has been considered to estimate the computational cost for KSAM. We note that, implemented with lookup table techniques, the proposed method can further scale the computation time down to the level of N 2 ; whereas too large values of r and s can burden this method with high computational costs beyond FDTD and RCWA. The underlying criteria are described in detail in the following paragraphs.
For a binary DOE (one stacking layer), its permittivity and the incident field can be represented in a Fourier series of spatial harmonics [23] according to in which the wave vector components k xu and k yv arise from phase matching and the Floquet conditions and are given by Here, Λ and Λ are the periodicity along the x and y directions, respectively; is the vacuum wavelength of the optical wave; k 0 = 2 / is the free space wave number; n is the refractive index of the incident/outgoing medium (n = 1 for the air in this work); is the polar angle; and is the azimuth angle of the incident k-vector. We notice that the vector variables are in bold for convenience. On the basis of the spatial harmonics, Maxwell's equations can be expressed as an infinite set of first-order differential equations: Together with specific incidence conditions, Eqs. (8)- (11) can directly solve the vectorial fields e and h by means of eigenproblem algebra and can be easily extended to ℓ-level (ℓ = 2 for the binary one) DOEs [23,29]. For computationally efficient designs of DOEs, this work computes vectorial fields of an individual pixel under the pre-condition that the (singlepixel) field includes accurate characteristics of across-pixel interferences and sub-pixel fringes. In this way, KSAM can appeal to very finite k-series for still predictive computations, since the interference fields occur only among finite nearby pixels and the sub-pixel fringes are moderate under the condition 0 ≪ Λ ph / ≤ 1. By the reciprocity relations in Fourier algebra, we correlate the order of k-series with spatial parameters-i.e., the rank of neighboring interferences r, the sampling rate s, and the DOE depth h-for easily selecting the desired degrees of accuracy.
Assuming negligible multiple reflections of electromagnetic waves, this work introduces an effective dimension Λ eff ≡ (2r + 1)Λ ph to compute the fields for an assigned pixel, with the condition that the set-up can reproduce essential interferences' effects and simultaneously eliminate disturbances from non-physical periodicity in RCWA algebra. Moreover, since the spatial frequency u−m,v−n of every stacking layer in DOEs can iteratively transfer wave vectors e mn (h mn ) into different diffraction components e uv (h uv ), as indicated in Eqs. (8)-(11), simple criteria for spatial parameters can be given by rΛ ph ≥ hk ,R /k z,R and s ≥ ℓ − 1. Here k z,R is the selected highest-order k vector with a real-number z component, while k ,R = (k 2 x,R + k 2 y,R ) 1/2 is the in-plane component of the k vector defined for k z,R . The total components of k-series are selected in the range −(2r + 1)s/2 ≤ u, v ≤ (2r + 1)s/2 − 1 with Eqs. (5)- (7) at Λ x(y) = Λ eff in this work. The graphical interpretations of the optional parameters r and s are illustrated in Fig.  3.
In Fig. 3(a) the rank of neighboring interferences r defines the maximal pixel range of pixelpixel interferences by (Λ eff /Λ ph −1)/2, where the size of an equivalent-isolated pixel Λ eff should be infinite in theory for accuracy, but is set as a finite value by assuming insignificant interactions of fields among distant pixels. The computation hence requires only a finite morphology of nearby pixels within Λ eff to determine the local field in Λ ph , but still can keep predictive calculations. In Fig. 3(b) the sampling rate s defines the number of sampling points per pixel in one dimension. In this way, the k-series can represent an accurate sub-pixel profile of fields and enable occurrences of higher-order diffractions by reciprocity relations. With an adequate set-up of r and s values at given h, this work fulfills more efficient and accurate vectorial modeling of near fields in DOEs, in place of the typical wave front function in the scalar-based theory. Figure 4 illustrates schematic examples for computing the fields of the (red) pixel P(0,0) at different conditions (r,s) for KSAM, in which the lower part indicates the pixel morphology Υ (with total dimension Λ eff and Λ ph / = 1), and the upper part shows the profiles of fields |E x (x, y)| at z = + h . Numerical results state that one can perform predictive optical designs with proper dimension r (comparing Figs. 4(a) and 4(b)) and sampling data s (comparing Figs. 4(b) and 4(c)), at an affordable computational cost (92% reduction in computational time by a comparison of Figs. 4(a) and 4(c)). Here, the spatial domain size N is 1 for determining the (red) single-pixel fields by KSAM. To investigate the validity for general cases, we re-evaluate the analyses in Fig. 2 by KSAM and compare results with those by FDTD. Figure 5(a) investigates the average amplitude of different field components at the specified (central) pixel using r=s=2. Figure 5 are denoted as horizontal dotted lines near the corresponding regimes, respectively. Numerical results in Fig. 5 display qualitatively similar variations of fields to that by FDTD when varying Λ ph . With higher-order (r=s=4) approximations, Fig. 6 shows quantitatively similar curves to that by FDTD and clearly predict more accurate vectorial effects (e.g., depolarization and dephasing of fields) for systems studied.

Vector-based simulated annealing algorithm for designing diffractive optical elements
To next interpret the process for optical designs, this work exemplifies a Fourier-transformed DOE, where the coordinates of the image plane are denoted (x I ,y I ) and those of the exiting plane on DOE are denoted (x D,s ,y D,s ). DOE is assumed to be illuminated by a plane wave. The reconstructed image is derived in the focal (image) plane of the Fourier lens. During the design process, trial morphologies Υ ′ (x D , y D ) are specifically first generated with a simulated annealing algorithm, and the pixel-field profiles P ′ (x D,s , y D,s ) are then computed by means of KSAM. Here, the subscript s represents the index of sampling points per pixel. The DOE quality hence is scored by a cost function associated with the diffraction image of P ′ and the target image, in which the diffraction fields R x(y) in the image plane are given by R x(y) (x I , y I ) ⊂ F P x(y) (x D,s , y D,s ) Here, F −1 is the inverse Fourier transform, and the R z component is obtained with the relation k(x I , y I ) · R(x I , y I ) = 0. Moreover, DOE is considered as an optical element of N 2 square pixels, and the pixel-field profile P is an (N × s) 2 array function. In this work x I and y I are   simply assigned as the lowest N 2 -order wave vectors of the total (N × s) 2 ones, responding to an N 2 -pixel target image I 0 . The intensity I of the reconstructed image is given by: The cost or energy function to score performance of reconstruction is defined as the mean-square error between the reconstructed image and the target images I 0 , presented by: Here, is a scale factor given by: We note that the cost function is obviously zero when the reconstructed image equals the desired image.
Annealing cycle n = 1 : N A Pixel cycle x Dr = 1 : N, y Dr = 1 : The vector-based simulated annealing algorithm can be summarized by the following steps, and the flow chart is plotted in Fig. 7. 1. The pixel morphology Υ(x D , y D ) of DOE is initialized; e.g., according to the discretization of phase distribution of the Fourier transform of a desired image I 0 . The corresponding pixel-field profile P(x D,s , y D,s ) is then initially built by KSAM. The temperature parameter T 0 of the simulated annealing process is initially set at a relatively high value, enabling a large range of perturbation probability.
2. Each pixel Υ(x Dr , y Dr ) of DOE is successively selected for fluctuation during the pixel cycle (x Dr = 1 : N and y Dr = 1 : N). The morphology (or depth) of this selected pixel is reset by a random (trial) candidate of depth levels. The fields of pixel P ′ (i, j) in the range (x D − r ≤ i ≤ x D + r,y D − r ≤ j ≤ y D + r) are then updated by KSAM, where these pixels belonging to device boundaries are assumed to follow a periodic arrangement for simplification. We note that updating multiple pixel fields P ′ (i, j) for a single trial pixel Υ(x Dr , y Dr ) here can bring forth a factor of (2r + 1) 2 s 2 for the computation time.
3. Calculate the diffraction fields by Eq. (12). The difference in the cost function Δm=m newm now is then calculated by Eqs. (13)- (15). If Δm is negative, then the new pixel morphology is accepted; otherwise, acceptance or rejection is determined by the Boltzmann probability: where a random number 0 ≤ ≤ 1 is first generated, and the new (trial) pixel is accepted if < Θ(Δm, T) occurs and is rejected otherwise.
4. Repeat steps 2 and 3 for the pixel cycle (x Dr and y Dr ) and the annealing cycle (n = 1 : N A ); except when the criterion of convergence is achieved, where T in Eq. (16) is made lower with the number of annealing cycles by: An alternate Fourier algorithm for free-space propagation of vectorial optical fields, compared to Eq. (12), can be seen in [32]. The Matlab code concerning the first-order algorithm (r=1 and s=1) for designs of square DOEs is available in MATLAB Central File Exchange [33].

Two-stage optimization process for high-quality DOEs
To achieve high-quality DOEs, we introduce complementary optimization processes for deviations by hypothetical approximations. Specifically, for systems with microstructures at the subwavelength scale, field errors are primarily attributed to two issues: vectorial effects and approximation by the truncation of k-series. Thus, a two-stage optimization process is proposed in this work to deal with these two issues. To enable fast estimations, we employ a scaleddown sample of the desired product as a temporary prototype during the two-stage process. A schematic flow diagram is plotted in Fig. 8. Referring to Fig. 2, the vectorial optical fields in the studied regime 0 ≪ Λ ph / ≤ 1 noticeably depart from the prediction by the scalar-based theory. The optimized DOE depth, binary DOE for instance, could hence differ from the ideal value as well. For this reason, it is necessary to identify the optimal depth h by the vector-based theory in the first stage. By applying separate DOE designs Υ ′ at different depth value, the highest-quality one Υ ′ d and the optimized design depth h d are determined, as processes (2) and (3 ′ ) in Fig. 8. Here, the popular specification by the 0-order diffraction is adopted as the quality criterion. In the second stage, since the transmission fields by low-order KSAM deviate from that by the rigorous FDTD method (see Figs. 5 and 6), the design depth h d in theory can be inconsistent with the optimized value h p for products. To determine the optimal h p value, the design Υ ′ d is scored as a function of depth h by FDTD. It is concluded that the final DOE product Υ p is designed with v-SAA at h d and is manufactured with the set-up of depth h p for optimal performances. One can take it for granted that the value h d will asymptotically converge to the value h p in the high-order limit of the k-series approximation.

Depth cycle h v-SAA process
Next cycle

Design results and analysis
To evaluate the feasibility of the proposed algorithm, a beam shaper is adopted as an exemplary target. The simulation conditions are set by the plane-wave incidence,the wavelength = 1.0 m, the incident angle = 0 • , TM polarization, the refractive index of air n 1 = 1.0, the refractive index of DOE material n 2 = 1.5, the physical pixel size Λ ph = 0.6 m, the rank of neighboring interferences r=1, and the sampling rate s=1 position per pixel, for designing a 201 × 201 pixel array of binary DOEs. The default depth is set as h def = 1.0 by s-SAA, and its optimal value is re-evaluated by the two-stage optimization process as in Fig. 9. A 3D schematic of DOEs is shown in Fig. 1(c). The full structure of DOEs is represented by an image matrix, as in Figs. 10(a) and 11(a), where the cyan area indicates the material medium (n 2 ), and the blue area is the air-filling region (n 1 ). The beam shaper target is indicated as the inset mesh in Figs. 10(c) and 11(c). Before generating the full 201 × 201 design, a temporary 101 × 101 DOE is employed for speeding up the two-stage optimization process, so as to quickly determine the optimal design depth h d and product depth h p . At the first stage, with varying depth parameter h, a different DOE morphology is separately generated, and its intensity of zero-order diffraction is recorded for a quality score. Numerical results in Fig. 9(a) show that the optimal design depths are h d,sSAA = and h d,vSAA = 1.05 , corresponding to the binary morphologies Υ ′ d,sSAA and Υ ′ d,vSAA by s-SAA and v-SAA models, respectively. At the second stage, with given designs (Υ ′ d,sSAA and Υ ′ d,vSAA ), the zero-order intensities at different set-up of depths are directly estimated by FDTD, in order to determine the optimal depth h p for products. Figure 9(b) shows that the optimal product depths are h p,sSAA = h p,vSAA = 1.15 for this case. Here, the large zero-order intensity signifies the low-quality DOEs that are designed by the s-SAA model at a subwavelength feature scale.
With the known optimal depth parameter h d , the full 201 × 201 morphology of DOE can then be generated by SAA. Figure 10(a) illustrates the binary morphology of DOE designed by the s-SAA model with h d = 1.0 . The 4-thread computation over 100 annealing cycles runs in 160.9 seconds, which converges the variation of the cost function to below 0.01% per iteration.   Fig. 11(a). Similarly, the 4-thread optimization computation over 100 cycles runs in 1336.9 seconds, plus an extra time of 1.4 seconds for building the lookup table having ℓ (2r+1) 2 s 2 -size. To validate the designs, with given h p = 1.15 , the reconstructed images of DOEs are directly simulated by means of well-known softwares [34,35] that use the rigorous electromagnetic theory FDTD. Figure 10(b) shows the reconstructed image that corresponds to the design in Fig. 10(a), and its height plot in lower resolution is shown in Fig. 10(b'). The target pattern is put as an inset mesh in Fig. 10(b') for comparison. Here, rather than the typical intensity image I = |E| 2 , the reconstructed image is presented by the amplitude value of fields |E| for clarity. The FDTD program by 4-threads estimates the optics of DOE in 23389.1 seconds, in comparison with 1.6 seconds by s-SAA and 13.4 seconds by v-SAA. Numerical results indicate that the DOE designed by the s-SAA model exhibits low diffraction efficiencies, owing to the overwhelming zero-order intensity. Numerical results for the design by v-SAA are reported in Fig. 11. The corresponding reconstructed images are illustrated in Fig. 11(b) and 11(b'). Accordingly, we find that diffraction efficiencies as well as the design quality can be significantly enhanced by the vector-based method. Beyond the mentioned cases, designs with a finite device extent of, multi-level (ℓ ≥ 3), and/or multiple functional diffractive optical elements could also be generated by the presented model.

Conclusion
This research has described in detail a vector-based optimization tool for designing DOEs with subwavelength feature sizes. By introducing an adequate selection scheme of k-series, the algorithm can perform computationally efficient yet with predictive computations. Together with an optimization of the geometrical degrees of freedom (i.e., DOE depth h) as compensation for errors from the truncation of k-series, this work presents DOEs with diffraction efficiencies close to the theoretical limit. Furthermore, after implementing with lookup table techniques to pre-calculate pixel fields under ergodic couplings with nearby ones, the vector-based algorithm can achieve large-extent (mm-to cm-scale) designs in time frames comparable to those of scalar-based models.    Data availability. The Matlab code concerning the first-order algorithm (r=1 and s=1) in this paper is available in Ref. [33].