Whale Optimization Algorithm-Based Parallel Computing for Accelerating Misalignment Estimation of Reflective Fourier Ptychography Microscopy

Reflective Fourier ptychographic microscopy has much potential for industrial surface inspection due to the ability to overcome the physical limits of the numerical aperture of the optical microscopy. However, the time cost for misalignment calibration and Fourier ptychography (FP) recovery has been a big issue for industrial applications which require fast output. Here, we introduce a misalignment estimation method which is accelerated through the whale optimization algorithm by running in parallel on Central Processing Units (CPUs), named pWOA, to reduce computing time. The proposed method shows more accurate and faster calibration compared to other population-based algorithms, including the parallel genetic algorithm and the parallel particle swarm optimization, and much faster than that of the exhaustive search both in simulations and in real experiments. In addition, this cost-effective technique can address global non-convex optimization problems with heavy loss functions including reflective FP.


I. INTRODUCTION
F OURIER ptychographic microscopy (FPM) is an emerging computational phase retrieval technique which can achieve a large field of view (FOV) mage and a high-resolution enhancement simultaneously. Since it was first proposed in 2013 [1], a variety of research has been studied including resolution enhancement [2], [3], noise reduction [4], [5] misalignment calibration [6], [7], [8], [9], accelerating data acquisition speed [10], image reconstruction speed enhancement [11], etc. This research is mostly based on transmission-type FPM which focuses on measuring transparent and biological samples.
Unlike transmission FPM (tFPM), reflective FPM (rFPM) is able to measure reflective samples such as metallic surfaces like the silicon wafers that are widely used in the semiconductor and display industries. rFPM has two illuminators: a bright-field illuminator which has an illuminating angle that is smaller than the numerical aperture (NA) of the objective lens (NAobj) and a dark-field illuminator having an illuminating angle larger than the NAobj. Because of the two illuminators, rFPM has more complexity in terms of alignment compared to the tFPM. Until 2018, only a few papers [12], [13], [14] have been published about reflective FPM since it has difficulties in optical alignment and low signal to noise ratio (SNR). In 2019 [13], the low SNR problem was overcome by using a parabolic mirror which helps beams from the dark-field LED illuminator to effectively focus on the sample. Due to the complex design of the bright-field and dark-field illuminators, the reflective FPM suffers from misalignment issues and needs to be well-calibrated. If not, the FP synthesized image would be distorted or degraded. However, the problem is that it takes time to correct the misalignment [13]. Therefore, it is imperative to optimize the misalignment correction procedure to apply the reflective FPM as a real-time surface inspection tool in the industry. The FP recovery framework of rFPM is shown in Fig. 1.
The early misalignment error estimation method, named the position correction FPM (pcFPM) method, was introduced in 2016 by Sun et al. [6]. The simulated annealing (SA) optimization solver was conducted to find the optimal parameters that can minimize the error between the calculated low-resolution intensity images and the captured ones. However, due to processing over an entire image without segmentation, pcFPM was This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ a time-consuming method and was replaced by a faster and more robust misalignment correction method called mcFPM [7]. This method first introduced a global misalignment model and solved it with a different iteration searching strategy, which helped to cut the computing time by tenths. In 2021 [15], the segment-independent misalignment calibration method for rFPM modifying the mcFPM of tFPM was suggested. According to the method, once finding the misalignment error of the finest segmented image, it can directly apply it to all of the segmented images. They used an exhaustive search method [16], known as brute force method, which uses a fixed step-size as the control parameter of the search range. However, there is a problem in that the method cannot guarantee the exact global optimization. In 2022, Zhu et al. proposed a space-based correction (SBC) method for a tFPM which utilized the particle swarm optimization algorithm to search the four misalignment parameters in the space domain [17]. Nevertheless, the SBC method required a lot of populations and iterations for searching the global optimization, taking a lot of time.
Inspired by the bubble-net hunting strategy of one of the smartest animals in the world, Mirjalili et al. developed a swarmbased meta-heuristic optimization algorithm, called whale optimization algorithm (WOA) in 2016 [18]. Similar to evolutionbased algorithms like genetic algorithm (GA), or swarm-based algorithms such as PSO, the WOA algorithm generates several populations and updates the locations of them based on the exploitation phase and exploration phase. On the one hand, the exploitation phase deals with local optima by creating the populations at the new locations around the optimal solution of the previous iteration based on the shrinking encircling mechanism or spiral update mechanism. On the other hand, the exploration phase assures the global optimum by randomly relocating the populations distanced from the populations of the previous iteration. Because of its ability to escape the local minima, the WOA has outstanding performance for multi-modal and composite objective functions. It also requires only a few operators while having a simple structure, a fast convergence speed, and a better balance between exploration and exploitation. Since revealed in 2015, the WOA has been improved and applied in many applications such as electrical engineering, computer science, applied mathematics, construction engineering, and aeronautical engineering [19], [20], [21].
Parallel computing is a time-saving technique thanks to the development of multi-core computation processing units (CPUs) and graphics processing units (GPUs). When applying parallel computing in GPUs to the misalignment and FP recovery problem of the FPM, the processing speed is increased by about 11 times [22], [23], [24]. The evolution-based GA and the swarm-based PSO also have been searched in parallel on the CPU in order to save time by running 100% efficiency of the multi-core CPUs [23], [24]. Applying the WOA to find the optimal misalignment parameters is a promising method but its parallel computing has not been well developed yet. Here, we proposed to optimize the misalignment computation problem while reducing the time by running the WOA in parallel on the CPU. In the following section, we first discuss the misalignment model of the reflective FPM and the parallel computing of the whale optimization algorithm (pWOA) method. To verify the performance of the pWOA algorithm, a simulation of the misalignment estimation using pWOA is performed and analyzed in Section III-A. And then in Section III-B, the proposed method is experimentally demonstrated with USAF and Siemens star-target, respectively. Finally, Section IV discusses the results and the achievement of the pWOA method. The list of symbols is presented in Table V.

A. Reflective FPM Mechanism
The reflective FPM setup consists of a bright-field illuminator (BI), a dark-field illuminator (DI), an objective lens, and a CCD camera, as shown in Fig. 2(a). The BI combines with a 4f imaging system to image the LED arrays on the back focal lens of an objective lens. The light waves travel from LED arrays of DI, reflecting in a parabolic mirror before reaching the sample's surface. The advantage of utilizing the parabolic mirror is that it increases the signal-to-noise ratio (SNR) [25].
For more efficient illumination [26], a ring-type arrangement of LEDs has been adopted for both BI and DI setups. Here, the BI consists of a center LED and two LED rings with 5 and 10 LEDs, respectively. The 72 LEDs of DI are distributed in three concentric rings. To ensure the convergence of object function during the phase retrieval process, the spectrum of an LED in the Fourier domain should have at least 40% overlap with the adjacent LED [27]. The rings' radius and the number of LEDs in each ring have been designed to satisfy the overlap requirement. Following the setup in Table III, the designed NA synth is 1.06, which is the sum of NA obj and the maximum NA illum .
The BI and DI position errors or the decentralizing of the parabolic mirror are the main factors of the misalignment error, which affects the reconstructed image quality. To model the misalignment, we consider that there are translational and rotational errors in the spatial domain of the BI (Δx B , Δy B , Δθ B ) and the DI (and Δx D , Δy D , Δθ D ) as shown in Fig. 2 Because of the aligning error, the spectral error vector Where Δk xjB and Δk yjB are the spectral error of the BI; Δk xjD and Δk yjD are the spectral error of the DI. Following the Fourier transform and translation and rotation formula, the location of the j th LED in the Fourier domain should be shifted due to the spectral error. The BI spectral error is defined as below: The DI spectral error is modified because the light is reflected by the parabolic mirror: The corresponding object function and pupil function of the FP recovery follow the phase retrieval process. In this research, we conducted the EPRY algorithm [28] to iteratively update the object and pupil function. Ψ i j is the updated segment spectrum based on the phase retrieval process: The goal of our work is to compensate for the misalignment error and to minimize the difference between the calculated intensity and the captured one. Most of the current algorithms for positional correction-such as the pcFPM [6] and mcFPM [7], [15] employed the mean squared error (MSE) as the loss functions. The optimization algorithm will find the misalignment values that minimize MSE as in (9), defined by ε j . The formula of the misalignment model is shown in Table I.

B. Whale Optimization Algorithm-Based Parallel Computing Method
The whale optimization algorithm was invented in 2016 by Mirjalili et al. to solve complex optimization problems and has shown its potential in various engineering fields. The algorithm requires a single control parameter, a time interval, to be fine-tuned. The population of humpback whales will search for food in the multi-dimensional space, which corresponds to the optimal values of the bright-field misalignment error Different misalignment estimations are used to present the locations of humpback whales, and objective costs of the WOA optimization are determined by the distance between food and humpback whales. The time-dependent locations of the whale individual are determined by three processes named encircling prey, bubble-net attacking (exploitation phase) and searching for prey (exploration phase) [18]. Seyed improved the mutation and crossover operations of WOA by hybridizing it with differential evolution to improve the exploration ability [20]. Recently, Sun has introduced nonlinear parameter and feed-back mechanism to enhance the global search process [21]. They all verified the performance of WOA by simulation.
Sauber et al. first introduced a parallel whale optimization algorithm based on the OpenMP library of the SMP approach in 2018 [29]. Afterwards, Alsayeji et al. prompted the Apache spark-based parallel processing WOA (SBWOA) to overcome the disk overheads by using resilient distributed datasets and showed higher computing performance [24]. However, in SB-WOA the population is divided into several smaller populations and then applying them to WOA. Due to the small population size, the global optimal value will not be assured. To overcome this issue, Sauber and Alsyeji selected a large total population size of 1000, which caused long calculation time. Moreover, they only tested the algorithm in a simulation. In reality, it is not practical for a misalignment estimation problem because one individual computation costs too much time. It is summarized in Table II as follows [18]: In this research, we conducted the parallel computation of WOA (pWOA) designed for the misalignment estimation problem. Multiple subtasks are performed simultaneously as part of one main task. The diagram of pWOA in MATLAB is shown in Fig. 3. The process consists of two main parts: parallel updating of the loss function of individual population, and updating the new location based on the whale optimization. On the one hand, in the t th iteration, the calculating loss function of population in a misalignment problem will take more than 98% of the calculating time due to bulk image processing. Therefore, it is critical to compute them in parallel to take advantage of CPU multi-cores. On the other hand, updating the new location of the (t+1) th iteration should be processed serially because it is not an independent process, but must follow the completion of the determination of the optimal location based on the minimum lost value in the t th iteration. Consequently, this approach can guarantee global convergence even if the total population size is small. This framework is highly applicable to the image processing problem where the calculating loss function is heavy.

A. Camera Simulation
To show the outstanding performance of the pWOA algorithm compared to the two state-of-the-art methods, a swarm-based optimization named the particle swarm, and the evolution-based genetic algorithm, we performed simulations with the wellknown "cameraman" image and the setup parameters listed in Table III. Firstly, we set the position error of BI and DI with a specific number to see the accuracy. Secondly, the simulation was added some amount of defocus aberration to see the stability of pWOA in rFPM. As mentioned by Lee et al. [10], the misalignment method has a tolerance for a small amount of defocus aberration while the positional misalignment is corrected at the same time. The calculating step begins with estimating the misalignment of the bright-field (BF) illumination, and then estimating the misalignment of the DF illumination. The parallel genetic algorithm (pGA), parallel particle swarm optimization (pPSO), and parallel whale optimization algorithm (pWOA) were set with the same parameters; the population size was 5, 10, 15. To see the performance of pWOA, we continue iterating the optimization until 100 iterations. The simulation was run in a MATLAB environment.
After estimating the misalignment error, the simulation was performed via the adaptive step-size FPM (ASFPM) based FP recovery. This method accelerates the full FOV HR image reconstruction by combining GPU computation and a smart optimization strategy which uses the incremental gradient decent with adaptive step-size [22]. In digital image processing and analysis, structural similarity (SSIM) is used as a measure that estimates how similar two images are in terms of their spatial structural information [30].
where x and y are two images that are aligned with each other, and μ x , μ y represent the mean and δ x , δ y are standard deviation of the image vector x, y, respectively. To avoid the appearance of instability when these statistics are close to zero, constants C 1 and C 2 are included. The SSIM index ranges from 0 to 1, with a higher value indicating more structural similarity between two images. As shown in Fig. 4(a) and (b), the pWOA algorithm outperforms when a small number of population sizes, five and ten, are used. After about 30 iterations, the SSIM convergence value of pWOA is bigger than those of pGA and pPSO. In addition, because of the small population size required, the pWOA finds the misalignment in much shorter time. Fig. 4(c) depicts the pPSO's outperformance with a population size of 15. Comparing the SSIMs of pWOA in Fig. 4(a) and (b) with the SSIM of pPSO in Fig. 4(c), we concluded that using a small population size of five with pWOA saves the calculation time of 3.79 s to reach the global convergence value of the SSIM of 0.8461. On the other hand, using a large population size of 15, pPSO needed 7.3 s to achieve the same SSIM convergence value. Fig. 5 shows that the objective function of the misalignment calculation problem has a multi-mode function. Mirjalili et al. [18] proved that WOA works more effectively for multi-mode objective function than

B. Experiment Setup
To observe the performance of our method with real experimental data, we equipped the reflective FPM with BI, DI, camera, objective lens, and a parabolic mirror as shown in Fig. 6. The BI and DI were fabricated with the same design parameters mentioned in the Section II-A. The BI includes a BF LED array, 4f imaging system, and a beam splitter. The beams from the BF LED array are imaged on the back focal plane of the objective lens through a 4f imaging system, which consists of two convex lenses and a field stop. The specification of every component used in the reflective FPM is listed in Table IV. The LED numbers in each ring and the placement of LED arrays can assure the overlaps of 40% of the spectral of the adjacent LED region in the Fourier domain [25]. The central wavelength of the LED is 515 nm.
We first utilized the USAF target to the real experiment to achieve the 88 raw images. The input of the misalignment model was 34 images with a 54 × 54 pixel size which are 16 images obtained from the BF LEDs and 18 images obtained from the DF LEDs. After estimating the misalignment error with the pWOA, pPSO, pGA optimization methods, the FP recovery images were achieved by applying the ASFPM method to the 88 total raw images. We set up the hyper-parameter of 16 population size and 100 iterations for three population-based estimators to check their global optimization.
The recovered FP high resolution (HR) images are displayed in Fig. 7. Fig. 7(a) is the full field-of-view (FOV) high-resolution amplitude of the pWOA method, and Fig. 7(b) is the enlarged segment of the red region in the full FOV on-axis raw image that is used to calculate the misalignment of the LED arrays. Fig. 7(c), (d), and (e) are the enlargements of the red region in Fig. 7(a), which correspond to the results of the pWOA, pGA, and pPSO, respectively. Fig. 7(f), (g), and (h) are the phase images in the same red region in Fig. 7(a), corresponding to the pWOA, the exhaustive search, the pGA, and the pPSO, respectively. In Fig. 7(c), we can see the pattern in Group 11 and Element 1, which means the 488 nm full pitch is resolved. However, compared to Fig. 7(c), the regions pointed out by the red arrows in Fig. 7(d) and (e) are blurrier and less clear. The problem can be explained by the fact that the misalignments estimated by pGA and pPSO are located at a local optimal point or have not yet converged. Normally, the GA and the PSO require many population sizes and iterations to reach the global optimum. In this experiment, pWOA needs only 16 populations to reach the global convergence, but pGA and pPSO do not reach the global convergence with the same populations with the same setup. It is in good agreement with the convergence plot in the simulation. Furthermore, the recovered FP-phase images show In the case of applying the pWOA algorithm in Fig. 7(c), it took 7 seconds to reach convergence with the hyper-parameters of 16 population size and 30 iterations. The exhaustive search took 27 seconds to produce a comparable result [22], which is approximately 3.8 times longer than the time required by pWOA to estimate the misalignment error. In conclusion, using the proposed method is the best choice in terms of the recovered FP image quality and time consumption among the other optimizers.
Because the USAF target can only show discrete performance related to resolution, we could not see the exact resolution of each misalignment estimator. Therefore, we conducted an additional measurement with a Siemens star-target, which consists of spoke patterns, to compare the exact resolving capability between the different misalignment estimation methods. The reflective FP microscopy design was modified from the original design in order to examine the performance under different experiment conditions. The LED arrays were reduced to 9 BF illuminators and 16 DF illuminators with a central wavelength of 515 nm. The objective lens was changed to a magnification of 50× and 0.75NA.
The reflective FPM produced 25 raw images of the star-target. The pWOA, pGA, and pPSO estimators used 25 segments cropped with a size of 200 × 200 pixels from the center of raw images as inputs. The size of the population was selected as five. After estimating the misalignment error by iterating 100 times with the different methods, the FP recovery images were achieved by applying the AdamML [30] method to the 25 total raw images. Enlargements of the star target's center amplitude, corresponding to pWOA, pGA, and pPSO results, are shown in Fig. 8(a), (b), and (c), respectively. Fig. 8(d) depicts a plot of the amplitude along the red circle segment in Fig. 8(a) at the 169 nm spoke period. Fig. 8(e) and (f) are the plots of the amplitude along the red circle in Fig. 8(b) at 176 nm spoke period and in Fig. 8(c) at 200 nm spoke period, respectively. As a result, we can see that the resolution of the FP recover image using the pWOA misalignment method is better than those obtained using the pGA and the pPSO estimators. Because of the incorrect misalignment value obtained using the pPSO method with a small population size, the quality of the amplitude image in Fig. 8(c) was far worse than that in Fig. 8(a). When we stopped after 30 iterations, the performance of pWOA was not degraded because it reached the global optimum within 6.7 s.

IV. CONCLUSION
To achieve a full FOV HR image while dealing with the bulk data of low resolution (LR) images, the FPM needs a lot of time to calibrate the misalignment of LED arrays. Youquiang et al. [17] utilized the space-based correction method based on the PSO algorithm that took 56 seconds for estimating the misalignment of BI consisting of 225 LEDs in the transmission FPM. In our previous work, we reduced the time by using an exhaustive search run in GPU to 27 seconds in the reflective FPM with two illuminators consisting of 88 LEDs [22]. Nonetheless, about 30 seconds for the misalignment correction was still a major hurdle for FPM to be employed as an in-line measuring instrument in the industry.
In this paper, we introduced a novel method, which accelerates the misalignment calibration of illuminators in a reflective FPM system while maintaining the resolution and quality of the reconstructed image. The novel method consists of three main points. First, the calibrating misalignment is re-constructed to the optimization problem with the LR images as the input, the BF and DF illuminators' misalignment values as the output. The objective function is to minimize the total error of computed LR images and the real ones. The second point is to apply a new invented swarm-based optimization method, named whale optimization algorithm, to find the global optimal parameters. On the final point of applying the WOA algorithm to the FPM calibration problem, the parallel processing of WOA on the CPU is designed to accelerate the finding of an optimal solution. After quickly estimating the misalignment values by the proposed method, the full FOV HR image is reconstructed by the ASFPM method [22].
Compared to pGA, and pPSO in the cameraman simulation, pWOA showed the ability to quickly find out the global optimal misalignment values and achieved the best reconstruction image quality, similar to the theoretical prediction [16]. This ability of the pWOA algorithm was also successfully verified by the USAF target experiment, which could achieve a half pitch resolution of less than 250 nm with seven seconds of the misalignment calibration time. This result showed that the misalignment estimating time of the reflective FPM was remarkably reduced from 27 seconds [22] to seven seconds, about 3.8 times faster, while keeping the same HR image quality. Finally, pWOA got the highest resolvability among the other population-based methods in the measurement of the Siemens star-target, a continuous spoke pattern. This result facilitates opening the doors to a bright future of applying the reflective FPM to real-time surface inspections of semiconductors in the industry. In addition, the proposed method is not restricted to reflective FPM, but can also be applied to the transmission FPM and further to any other non-convex global optimization problems while saving computing time.