Aberration-corrected three-dimensional positioning with a single-shot metalens array: supplement

WENWEI LIU,1,† DINA MA,1,† ZHANCHENG LI,1 HUA CHENG,1,5 DUK-YONG CHOI,2 JIANGUO TIAN,1 AND SHUQI CHEN1,3,4,∗
1The Key Laboratory of Weak Light Nonlinear Photonics, Ministry of Education, School of Physics and TEDA Institute of Applied Physics, Nankai University, Tianjin 300071, China
2Laser Physics Centre, Research School of Physics, Australian National University, Canberra, ACT 2601, Australia
3The Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
4Collaborative Innovation Center of Light Manipulations and Applications, Shandong Normal University, Jinan 250358, China
5e-mail: hcheng@nankai.edu.cn
∗Corresponding author: schen@nankai.edu.cn
†These authors contributed equally to this work.


SAMPLE FABRICATION
The TiO2 metasurface was fabricated as follows. A 550-nm-thick electron-beam resist (ZEP 520A from Zeon, Japan) was spin-coated on a cleaned glass substrate and baked on a hot plate at 180 °C for 1 min. To prevent charging during electron-beam writing, a thin layer of E-spacer 300Z (Showa Denko) was coated on the resist. The nanostructures were then defined by electron-beam lithography (Raith150) at 30 kV with a 20-pA current, followed by development in n-amyl acetate. Conformal TiO2 layers with a thickness of approximately 70 nm were deposited by an atomic layer deposition system (Picosun) using titanium tetrachloride and H2O as precursors at a reactor temperature of 130 °C. Subsequently, CHF3 plasma in an inductively coupled plasma-reactive ion etching system (ICP-RIE, Oxford system 100) was used to blanket-etch the TiO2 layer until the ZEP 520A resist was exposed. The etching conditions were 30 sccm of CHF3 at 50 W bias power and 500 W induction power at an operating pressure of 10 mTorr, which resulted in a TiO2 etching rate of ~20 nm per minute. Finally, O2 plasma was used to fully remove the remaining resist.
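As a rough consistency check on the etch-back step, one can estimate how long the plasma etch takes to expose the resist. This estimate assumes (it is not stated in the text) that the planar TiO2 film on top of the resist has the nominal deposited thickness of about 70 nm and that the quoted etching rate is uniform over the whole etch:

```latex
t_{\text{etch}} \approx \frac{70~\text{nm}}{20~\text{nm/min}} \approx 3.5~\text{min}
```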

CROSS-CORRELATION-BASED GRADIENT DESCENT (CCGD) ALGORITHM
We employed a CCGD algorithm to correct the imaging aberrations, achieve positioning, and obtain the reconstructed images. The algorithm uses four parameters that characterize the imaging, k_1, k_2, t_x, and t_y, where k_1 and k_2 correct the distortions of the image, and t_x and t_y are the translation parameters between different image parts. The algorithm can be summarized in the following steps:
(1) Generating the image matrix. To minimize the impact of the gray-level information in the images, we first binarize the image with the function "adaptiveThreshold()" from the Open Source Computer Vision (OpenCV) library [1,2]. This function divides the image into small blocks and performs the thresholding calculation on each block.
(2) Initial distortion correction. The radial distortion of the image is corrected with two parameters, k_1 and k_2 (Eqs. S1 and S2), which are optimized during the iteration in step (4). In these equations, (Δx, Δy) is the correction applied to the distorted image, (x_r, y_r) are the coordinates of a pixel with the origin located at the distortion center, and r is the distance between the pixel and the distortion center. We then divide the image into three parts, Part I, II, and III, corresponding to three matrices P_1, P_2, and P_3.
(3) A cross-correlation method to determine the affine transformation. P_1, P_2, and P_3 describe the same image with different imaging information. Taking P_1 and P_3 as an example, an affine transformation maps the coordinates of P_1 onto a new image part P_1′ that should overlap with P_3. Because P_1′ and P_3 contain different imaging information, the optimization method introduced in step (4) is used to optimize the overlap.
We employ a cross-correlation method to determine the values of t_x and t_y during each iteration. This method uses the overall information of an image instead of relying on feature points. Consider two images P_a and P_b related by a translation (t_x, t_y). The maximum of their real-space cross-correlation function R_ccf(r) is located at (t_x, t_y), which characterizes the translation between P_a and P_b.
(4) Calculate the loss function and optimize. The fitness function J is calculated from the mean distances between different image parts. By updating the parameters along −∇J, that is, the negative gradient of the objective function J, J can be minimized to obtain the aberration-corrected image and the positioning parameters. To improve the convergence efficiency, we also employ the adaptive moment estimation (Adam) optimizer [3] to update the parameters.
(5) Continue the iteration until the fitness functions converge. Generally, convergence can be achieved within 150 iterative cycles in our study.
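Steps (2) and (3) above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The radial model in `radial_correction` is the common two-parameter polynomial form and is an assumption, since the exact Eqs. S1 and S2 may differ. The function names, the brute-force circular cross-correlation, and the 8 × 8 test pattern are likewise hypothetical choices made for illustration; a practical version would use an FFT-based correlation on real images.

```python
"""Sketch of CCGD steps (2) and (3); pure Python, intended for tiny test images."""


def radial_correction(x_r, y_r, k1, k2):
    """Correction (dx, dy) for a pixel at (x_r, y_r) measured from the distortion center.

    ASSUMPTION: common polynomial model dx = x_r * (k1*r^2 + k2*r^4), same for dy;
    the source's Eqs. S1 and S2 may use a different form.
    """
    r2 = x_r * x_r + y_r * y_r
    factor = k1 * r2 + k2 * r2 * r2
    return x_r * factor, y_r * factor


def circular_shift(img, tx, ty):
    """Return a copy of img translated by (tx, ty) with wrap-around (test helper)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[(y + ty) % h][(x + tx) % w] = img[y][x]
    return out


def estimate_translation(p_a, p_b):
    """Signed shift (tx, ty) that maximizes the circular cross-correlation
    R(tx, ty) = sum over (x, y) of p_a[y][x] * p_b[y + ty][x + tx].

    Brute force, O((h*w)^2); uses all pixels rather than feature points.
    """
    h, w = len(p_a), len(p_a[0])
    best_score, best_shift = None, (0, 0)
    for ty in range(h):
        for tx in range(w):
            score = sum(
                p_a[y][x] * p_b[(y + ty) % h][(x + tx) % w]
                for y in range(h)
                for x in range(w)
            )
            if best_score is None or score > best_score:
                best_score, best_shift = score, (tx, ty)
    tx, ty = best_shift
    # Map wrapped indices to signed shifts, e.g. 5 on an 8-pixel axis means -3.
    if tx > w // 2:
        tx -= w
    if ty > h // 2:
        ty -= h
    return tx, ty


# Hypothetical binarized image part: a small L-shaped pattern on an 8 x 8 grid.
part = [[0] * 8 for _ in range(8)]
for y, x in [(1, 1), (1, 2), (2, 1), (3, 1)]:
    part[y][x] = 1

shifted = circular_shift(part, 2, 1)
print(estimate_translation(part, shifted))  # prints (2, 1)
```

In a full CCGD loop, the recovered (t_x, t_y) would be recomputed in each iteration after k_1 and k_2 are updated by the optimizer.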

SENSITIVITY OF POSITIONING WITH THE METALENS ARRAY
The proposed principle realizes three-dimensional positioning of the target object by analyzing the two-dimensional information of the aberration-corrected images. In contrast to positioning via mechanical platforms, this method relies on the optical intensity of the images, so it is fundamentally limited only by the Abbe diffraction limit. As demonstrated in the main text, the object distance and the horizontal displacement can be calculated from the geometry measured in the image plane, where we replace the symbol Δ used in the main text with Δ_1 to distinguish it from the differential operator Δ. The relative sensitivity of positioning can then be calculated from these expressions. It can be seen that the resolution of positioning is determined by the accuracy of the measured geometric length in the image plane. The positioning accuracy can be further improved by employing improved algorithms, by using other metalens designs that directly correct the aberrations, or by adopting a high-accuracy translation stage.
The relative sensitivity of positioning can also be expressed in terms of the metalens design and the geometric parameters in the object plane, where ΔΔ_1 is the measurement accuracy of the images in the image plane, which can be estimated by the accuracy of the periodicity of the images ΔD (also measured in the image plane). Thus, Eq. S10 can be rewritten in terms of ΔD. Because the measurement accuracy in the image plane is fixed for a given metalens array, both the vertical and horizontal positioning accuracy decrease at large object distances, which is a general result for positioning systems [4].
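The dependence on object distance stated above can be made plausible with a simplified thin-lens model of a lens array. This is an illustrative assumption and not a reproduction of the supplement's own sensitivity equations. The symbols p (pitch between neighboring metalenses), f (focal length), S_0 (object distance), and S_1 (image distance) are introduced here for the sketch; D is the periodicity of the images and ΔD its measurement accuracy, as in the text.

```latex
\begin{aligned}
D &= p\left(1 + \frac{S_1}{S_0}\right), \qquad
\frac{1}{S_0} + \frac{1}{S_1} = \frac{1}{f}
\;\;\Rightarrow\;\;
D = \frac{p\,S_0}{S_0 - f}, \qquad S_0 = \frac{f\,D}{D - p}, \\[4pt]
\left|\Delta S_0\right| &\approx \left|\frac{\partial S_0}{\partial D}\right|\Delta D
= \frac{f\,p}{(D - p)^2}\,\Delta D
= \frac{(S_0 - f)^2}{p\,f}\,\Delta D .
\end{aligned}
```

In this model, for a fixed ΔD the absolute uncertainty of S_0 grows roughly quadratically with the object distance, which is in line with the loss of positioning accuracy at large object distances.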

NUMERICAL SIMULATIONS
The focusing properties of the metalens were determined using the finite-difference time-domain method with a Gaussian beam incident at different angles. The refractive indices of the TiO2 nanopillars and the fused silica substrate were obtained from Palik's handbook [5]. We simulated the electric field distribution in a plane 2 μm above the metalens and employed far-field extraction to obtain the light distribution at the focal plane. To reduce the computational cost, we simulated a smaller metalens with a diameter of 20 μm but with the same numerical aperture (NA = 0.50) as the fabricated metalens. The smaller metalens is still dozens of times larger than the operating wavelength, which means that near-field effects are negligible and diffraction effects dominate the focusing properties (as is also the case for the metalens used in our scheme).
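The geometry of the simulated metalens can be checked with a short calculation. The sketch below assumes that the lens focuses in air, so that NA = sin θ, and it takes the 532 nm filter wavelength given in the measurement section as the operating wavelength. The function names are hypothetical, and the focal length it returns is implied by the stated diameter and NA rather than quoted in the text.

```python
import math


def focal_length_from_na(diameter_um, na, n_medium=1.0):
    """Focal length of an ideal lens from its diameter and NA = n * sin(theta)."""
    theta = math.asin(na / n_medium)
    return (diameter_um / 2.0) / math.tan(theta)


def abbe_limit_um(wavelength_um, na):
    """Abbe diffraction limit, lambda / (2 NA)."""
    return wavelength_um / (2.0 * na)


wavelength_um = 0.532  # assumed operating wavelength (filter center wavelength)
f_sim_um = focal_length_from_na(20.0, 0.50)
spot_um = abbe_limit_um(wavelength_um, 0.50)
size_ratio = 20.0 / wavelength_um

print(f"focal length      = {f_sim_um:.2f} um")
print(f"Abbe limit        = {spot_um:.3f} um")
print(f"diameter / lambda = {size_ratio:.1f}")
```

Under these assumptions the simulated lens has a focal length of roughly 17 μm and a diameter of about 38 wavelengths, which is consistent with the statement that it is dozens of times larger than the operating wavelength.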

MEASUREMENT PROCEDURE
The light source in the experiments was a mercury-xenon lamp (Thorlabs, SLS402) with a liquid light guide (Thorlabs, LLG5-4Z) as the output. After a 10-nm bandpass filter centered at 532 nm, the polarization of the incident light was controlled by a linear polarizer and quarter-wave plate pair to obtain left circularly polarized (LCP) incidence. An objective (Obj1, Sigmakoki EPL-5, 5×, NA = 0.13, WD = 11.6 mm) was employed to focus the incident light onto the target object S1. In our setup, S1 was a drawing of a bee written on a thin Cr film with the laser direct writing method [6]; the design places no specific restrictions on the target sample. The image of S1 was demagnified by another objective (Obj2, Sigmakoki EPLE-50, 50×, NA = 0.55, WD = 8.2 mm) and then imaged by the metalens array. In our setup, the actual object located by the metalens array (S2) is therefore the demagnified image behind Obj2.
The filter, P1, QW1, diffuser, Obj1, S1, and Obj2 were mounted on an electric translation stage. The optical positioning setup can be adjusted through the following steps. First, the metalens sample S2 was imaged by the scanning system. By adjusting TS1 along the vertical direction (z-axis), the scanning system on TS2 can simultaneously image the target object and S2 in the field of view (FOV) (see Supplementary Fig. 6 for details of the captured image); this is the initial state for positioning. At this step, sample S2 is located at the working planes of both Obj2 and Obj3, and the object distance of S2 is zero. Then, by moving TS2 along the +z direction and TS1 along the −z direction, one can image the target object again when the Gaussian lens formula, 1/S_0 + 1/S_1 = 1/f_metalens, is satisfied. At this step, the captured image is the image of S2.
In realistic implementations, the optical setup can be simplified in the following aspects. In Fig. 4, we demonstrated that P1 and QW1 could be removed while maintaining comparable imaging and positioning performance. The filter could be integrated on S2, for example as a bandpass coating. The lamp, diffuser, and S1 could be replaced by a diffusely reflecting object. Objective Obj1 was used to collect the incident light and to increase the signal-to-noise ratio, and is also not fundamentally necessary. Furthermore, if one does not need to measure the object distance directly, Obj2 and TS1 can also be removed for larger metalenses with longer focal lengths and object distances.
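The Gaussian lens formula used in the alignment procedure, 1/S_0 + 1/S_1 = 1/f_metalens, can be evaluated with a few lines of Python. This is a minimal sketch under the thin-lens assumption; the function names and the numerical values are hypothetical and are not taken from the experiment.

```python
def image_distance(s0, f):
    """Image distance S1 from the Gaussian lens formula 1/S0 + 1/S1 = 1/f."""
    if s0 == f:
        raise ValueError("object at the focal plane: image forms at infinity")
    return s0 * f / (s0 - f)


def lateral_magnification(s0, f):
    """Lateral magnification -S1/S0 of a thin lens (negative means inverted)."""
    return -image_distance(s0, f) / s0


# Hypothetical values in arbitrary but consistent length units.
s0, f = 30.0, 10.0
s1 = image_distance(s0, f)
m = lateral_magnification(s0, f)
print(s1, m)  # prints 15.0 -0.5
```

For a fixed focal length, moving the object farther away brings the image plane closer to the focal plane, which is why both translation stages have to be moved to recover a sharp image.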