Fourier ptychographic microscopy using a generalized Anscombe transform approximation of the mixed Poisson-Gaussian likelihood

Fourier ptychographic microscopy (FPM) is a novel computational microscopy technique that provides intensity images with both a wide field-of-view (FOV) and high resolution (HR). By combining ideas from synthetic aperture and phase retrieval, FPM iteratively stitches together a number of variably illuminated, low-resolution (LR) intensity images in Fourier space to reconstruct an HR complex sample image. In practice, however, FPM reconstruction is sensitive to input noise, including Gaussian noise, Poisson shot noise, and mixed Poisson-Gaussian noise. To efficiently address these noise types, we developed a novel FPM reconstruction method termed generalized Anscombe transform approximation Fourier ptychographic (GATFP) reconstruction. The method utilizes the generalized Anscombe transform (GAT) approximation for the noise model, and maximum likelihood theory is employed to formulate the FPM optimization problem. We validated the proposed method with both simulated data, for quantitative evaluation, and real experimental data captured using an FPM setup. The results show that the proposed method achieves state-of-the-art performance in comparison with other approaches.


Introduction
In microscopy, a large space-bandwidth product (SBP), which characterizes the total number of resolvable pixels of an imaging system, is highly desirable for biomedical applications. Naturally, we can increase the SBP by scaling up the size of the lens. However, this solution leads to a compromise between achievable resolution and field of view (FOV). To address this problem, Zheng et al. [1] proposed Fourier ptychographic microscopy (FPM), which is capable of providing a scalable SBP for most existing microscopes without this compromise. In this method, illumination angles are scanned sequentially with a programmable LED array source, and a low-resolution (LR) image is taken at each angle. Since illuminating a thin sample by an oblique plane wave is equivalent to shifting the centre of the sample's spectrum in the Fourier domain, each off-axis LED shifts a different amount of high-spatial-frequency information, diffracted from the sample, into the acceptance angle of the objective lens [2]. By capturing a stack of LR images that cover a wide region of the Fourier domain and stitching them together coherently, one can achieve spatial resolution beyond the objective's diffraction limit, corresponding to the sum of the illumination and objective numerical apertures (NA) [3].
Theoretically, FPM shares its roots with aperture synthesis [4-8] and phase retrieval [9,10]. The aperture synthesis technique was originally developed for radio astronomy, aiming to bypass the resolution limit of a single radio telescope. Likewise, FPM is able to bypass the frequency limit of the employed objective lens. The desirable properties of FPM are largely attributable to the successful exploitation of phase retrieval, which recovers the lost phase information from intensity-only measurements. Conventional FPM applications [1,3,11] utilize the alternating projection (AP) algorithm [9,10], a widely used method for phase retrieval, to implement the reconstruction process. However, the AP algorithm is sensitive to input noise [12]. To tackle measurement noise, Bian et al. [13] proposed a Wirtinger flow [14] optimization for Fourier ptychography (WFP), which incorporates priors on the measurement noise and employs a gradient-descent scheme to minimize a cost function describing the differences between the actual and predicted measurements. WFP obtains satisfying results compared to AP, and realizes FPM reconstruction from low signal-to-noise-ratio (SNR) images. However, it needs careful initialization, since the optimization is non-convex, and it is not effective at high noise levels. To improve the robustness of the reconstruction, Ou et al. [15] proposed an embedded pupil function recovery method to correct the aberration of the imaging system. Based on a nonlinear optimization algorithm for phase retrieval [16], Zhang et al. [17] applied a nonlinear scheme to FPM, in which the pupil aberration of the imaging system can be corrected without a complicated calibration process, and consequently achieved better performance under system uncertainties. However, it is difficult to determine the best nonlinear factor for the reconstruction procedure.
As reported in [18], the Gauss-Newton method is applied to reconstruct the high-resolution (HR) image, and the data acquisition efficiency is enhanced significantly owing to the multiplexed coding strategy. Besides, Tian et al. [18] added a procedure for background estimation and regularization to tune the noise performance. Based on the cost functions of existing FPM algorithms, Yeh et al. [3] tested amplitude-based and intensity-based algorithms; the results show that the amplitude-based Newton's method gives better reconstructions but needs much more running time. Horstmeyer et al. [19] formulated a convex program for FPM reconstruction, which always attains a globally optimal solution; however, its limited running efficiency makes it impractical for real applications. Recently, Bian et al. [20] proposed an FPM reconstruction method termed truncated Poisson Wirtinger Fourier ptychographic (TPWFP) reconstruction, which incorporates a Poisson maximum likelihood objective function and the truncated Wirtinger gradient [21] into a gradient-descent optimization framework. However, this method targets Poisson noise and neglects more general noise distributions, e.g., Gaussian and mixed Poisson-Gaussian noise.
Based on the above introduction, we can conclude that either a Gaussian or a Poisson noise model is typically adopted for FPM datasets. However, many imaging devices, such as CCDs and photographic plates, capture images through successive photon-to-electron, electron-to-voltage, and voltage-to-digit conversions [22]. This capturing process is subject to various signal-dependent errors, and a standard way to model these errors is as Poisson-Gaussian noise [22]. Specifically, photon emission and sensing are inherently random physical processes, which substantially contribute to the randomness of the sensor output; a noise model with a Poisson component accounts for this signal-dependent uncertainty. Complementarily, the additive Gaussian component accounts for the signal-independent noise sources involved in the capturing chain, such as thermal noise [22]. Consequently, it is desirable to consider both Poisson and Gaussian noise simultaneously. Regretfully, no existing FPM algorithm accounts for both sources of noise. To address this problem, this paper proposes a noise model that considers both sources, which is a more robust and more exact choice for noise-sensitive FPM algorithms. We obtain a noise model taking both Gaussian and Poisson noise into account (see Eq. (6) in Sec. 2.3 for details). However, it would be rather complicated to deal directly with the mixed Poisson-Gaussian model, since every sample exhibits an infinite Gaussian mixture distribution. To tackle this problem, we introduce the generalized Anscombe transform (GAT) into the noise model and obtain a tractable solution. We term the proposed method generalized Anscombe transform approximation Fourier ptychographic reconstruction (GATFP); it incorporates a gradient-based likelihood objective function and the GAT approximation into the FPM optimization framework.
A robust reconstruction method is strongly desired for the practical application of FPM and related Fourier ptychographic techniques, such as digital pathology and haematology. The proposed method therefore promotes the practical application of FPM.
In summary, the noise model proposed for FPM is new, and the solution method is significant for the practical application of FPM. GATFP makes several practical contributions:
• The mixed Poisson-Gaussian (PG) model is more appropriate for modeling the capturing process in real imaging systems. Based on the PG model, we can obtain better results in practical applications.
• GAT is used to stabilize the noise variance and to generalize across various experimental noise types, which plays an integral part in ensuring accurate results.
• The proposed method largely promotes the practical application of FPM.
To illustrate the effectiveness of GATFP, we run GATFP as well as the aforementioned state-of-the-art algorithms on both simulated and real captured data. The results show that GATFP produces more accurate reconstructions from noise-deteriorated inputs involving Poisson, Gaussian and mixed Poisson-Gaussian noise, and is more robust than the other state-of-the-art algorithms.

Fourier ptychographic image formation
In this section, we briefly review the image formation of FPM from an optical standpoint. We assume that each LED acts as an effective point source and illuminates a sample s(r), where r = (x, y) represents the 2D spatial coordinates in the sample plane, by a plane wave from a different angle, defined by exp(i2πk_j · r), with k_j = (k_{j,x}, k_{j,y}) being the spatial frequency corresponding to the j-th LED. Additionally assuming the sample is thin, we may express the exit wave as the product of the sample and illumination complex fields, s(r)exp(i2πk_j · r). The tilted plane wave illumination means that the Fourier transform of this exit wave is just a shifted version of the Fourier spectrum of the object, S(k − k_j), where S(k) = F{s(r)} and F is the 2D Fourier transform [1,3]. This exit wave is then modulated by the imaging system's pupil function P(k), which acts as a circular low-pass filter [19]. Finally, the wave propagates to the detector imaging plane and forms a limited-resolution image, such that

I_j(r) = |F^{-1}{P(k) S(k − k_j)}|^2,    (1)

where I_j is the intensity of the image captured under the illumination of the j-th LED and F^{-1} is the 2D inverse Fourier transform. For multivariate problems such as Eq. (1), it is convenient to reformulate the problem using linear algebra. Similar to Yeh [3], we develop a linear algebraic framework to illustrate the Fourier ptychographic image formation process. First, we represent each of the captured images, I_j(r), having m × m pixels, as a vector I_j of size m² × 1. Since the estimated object transmission function will have a higher SBP than the raw images, the estimated object should have n × n pixels, with n > m. For convenience, we actually solve for the Fourier spectrum of the object, S(k), which is a vector S of size n² × 1. Before multiplication with the pupil function, the Fourier spectrum of the object is downsampled by an m² × n² matrix Q_j. The matrix Q_j transforms an n² × 1 vector into an m² × 1 vector by selecting values out of the original vector.
The entries of Q_j are therefore either 1 or 0, and each row contains at most one nonzero element. Second, the pupil function P(k) becomes a vector P of size m² × 1. Finally, we introduce an m² × m² matrix F as the discrete Fourier transform operator and F^{-1} as the inverse transform operator. The image formation model (1) can then be vectorized as

I_j = |F^{-1} diag(P) Q_j S|^2,    (2)

where diag(·) denotes a diagonal matrix. In real applications, FPM acquires a series of LR images I_j, 1 ≤ j ≤ N, where N is the number of captured images. Following the work of Horstmeyer et al. [19], we combine the image set into a single vector by stacking all images in Eq. (2) as

b = |F̄^{-1} P̄ Q S|^2,    (3)

where all the ideal captured images are stacked into a vector b of size N·m² × 1, F̄^{-1} is an (N·m² × N·m²) block-diagonal matrix containing N copies of F^{-1} in its diagonal blocks, and Q is formed by vertically stacking the matrices Q_j, with size N·m² × n². Likewise, P̄ is an (N·m² × N·m²) block-diagonal matrix built from diag(P). The formation of b, F̄^{-1}, P̄, Q and S is illustrated in Fig. 1 for easy understanding. In addition, we define the sampling matrix D = F̄^{-1} P̄ Q, with size N·m² × n².
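The forward model above can be sketched numerically. The following is a minimal simulation, assuming a centered HR spectrum and square sub-regions selected per LED; all function and variable names are hypothetical:

```python
import numpy as np

def forward_model(S_hr, pupil, centers, m):
    """Simulate low-resolution FPM intensity images, Eq. (2) style.

    S_hr    : (n, n) complex, centered Fourier spectrum of the HR object
    pupil   : (m, m) pupil function (circular low-pass filter)
    centers : list of (cy, cx) sub-region centers, one per LED angle
    m       : side length of each low-resolution image
    """
    images = []
    h = m // 2
    for cy, cx in centers:
        sub = S_hr[cy - h:cy + h, cx - h:cx + h]             # Q_j: crop shifted spectrum
        field = np.fft.ifft2(np.fft.ifftshift(sub * pupil))  # F^{-1} diag(P) Q_j S
        images.append(np.abs(field) ** 2)                    # intensity-only detection
    return images
```

Stacking the returned images into one long vector corresponds to the vector b above.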

Variance stabilization with GAT
Let z be the observed image obtained through an image acquisition device, in vector form. We model z as an independent Poisson-distributed signal corrupted by additive Gaussian noise. In other words,

z_q = b̃_q + n_q,    (4)

where z_q is the q-th measurement and b̃_q represents the Poisson-distributed signal related to b_q (the q-th block of b). Specifically, b̃_q ∼ Poisson(b_q) and n_q ∼ N(0, σ²) are assumed to be mutually independent random vectors. Then we can apply the GAT [23],

f(z) = 2√(z + 3/8 + σ² − μ)  if z > −3/8 − σ² + μ;  f(z) = 0  otherwise,    (5)

to z_q in order to (approximately) stabilize its variance to unity, where μ denotes the mean of the Gaussian component (μ = 0 in our model). Note that for the pure Poisson case (σ = 0, μ = 0), this coincides with the traditional Anscombe transformation f(z) = 2√(z + 3/8) [24], used for stabilizing data corrupted by Poisson noise [25].
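A minimal implementation of the GAT, written in its common form with a detector gain α in addition to the Gaussian parameters μ and σ (α = 1 recovers Eq. (5); the clipping to zero handles the second branch):

```python
import numpy as np

def gat(z, sigma, mu=0.0, alpha=1.0):
    """Generalized Anscombe transform for Poisson-Gaussian data.

    Maps z ~ alpha*Poisson(b) + N(mu, sigma^2) to data whose variance is
    approximately one.  With sigma = mu = 0 and alpha = 1 this reduces to
    the classical Anscombe transform 2*sqrt(z + 3/8).
    """
    arg = alpha * z + (3.0 / 8.0) * alpha**2 + sigma**2 - alpha * mu
    return (2.0 / alpha) * np.sqrt(np.maximum(arg, 0.0))  # second branch -> 0
```

As a sanity check, simulated Poisson-Gaussian samples with a moderate mean have variance close to 1 after the transform.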

Poisson-Gaussian maximum likelihood estimation
According to the analysis above, the likelihood function of z_q is given by [26]

p(z_q) = ∏_{i=1}^{m²} Σ_{n=0}^{∞} ( [b_q]_i^n e^{−[b_q]_i} / n! ) · ( 1/√(2πσ²) ) exp( −([z_q]_i − n)² / (2σ²) ),    (6)

where, for every i ∈ {1, · · · , m²}, [b_q]_i and [z_q]_i denote the i-th components of b_q and z_q, respectively. In addition, we can rewrite [b_q]_i as

[b_q]_i = |[D_q]_i S|²,    (7)

where [D_q]_i is the i-th row of D_q, the q-th block of rows of D. However, if Eq. (6) is used directly, the gradient of the likelihood function becomes very complicated, since every sample exhibits an infinite Gaussian mixture distribution. Even if we derive this complicated gradient painstakingly, it inevitably increases the processing time of FPM due to the iterative procedure, which strongly limits its practical application. To derive a tractable solution, we employ the GAT to Gaussianize the data, inspired by the variance-stabilizing transform in [27]. After applying the GAT, each sample is near-normally distributed with an asymptotically constant variance, and we can easily obtain an approximate solution.
Using the GAT approximation, the likelihood of the transformed measurement z̃_q = f(z_q) is approximately given by [25]

p([z̃_q]_i) ≈ (1/√(2π)) exp( −( [z̃_q]_i − 2√([b_q]_i + 3/8 + σ²) )² / 2 ).    (8)

Assuming that the measurements from all pixels are independent, the overall likelihood is the product over q and i. In maximum likelihood estimation, the goal is to maximize this likelihood:

max_S ∏_q ∏_i p([z̃_q]_i).    (9)

It is easier to solve this problem by turning the likelihood into a negative log-likelihood, which can be written as

L(S) = Σ_q Σ_i ( [z̃_q]_i − 2√([b_q]_i + 3/8 + σ²) )² / 2 + const.    (10)

Clearly, z̃_q is related to z_q through Eq. (5) and is thus unchanged in the optimization process. Additionally, the components of z_q represent measured pixel values and thus meet the first case in Eq. (5). By omitting the constant terms and replacing z̃_q with Eq. (5), we obtain the optimization model of GATFP for FPM reconstruction:

min_S f(S) = Σ_q Σ_i ( 2√([z_q]_i + 3/8 + σ²) − 2√(|[D_q]_i S|² + 3/8 + σ²) )².    (11)
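The GATFP data-fidelity term of Eq. (11) can be sketched as follows, for a dense sampling matrix and a flattened measurement vector (names hypothetical; σ is the std of the Gaussian component):

```python
import numpy as np

def gatfp_loss(S, D, z, sigma):
    """GAT-approximated negative log-likelihood, constants dropped (Eq. (11)).

    S     : (n,) complex sample-spectrum estimate
    D     : (N, n) complex sampling matrix (stacked rows [D_q]_i)
    z     : (N,) noisy intensity measurements
    sigma : std of the Gaussian noise component
    """
    c = 3.0 / 8.0 + sigma**2
    pred = 2.0 * np.sqrt(np.abs(D @ S) ** 2 + c)   # stabilized model mean
    obs = 2.0 * np.sqrt(np.maximum(z + c, 0.0))    # GAT of the measured data
    return np.sum((obs - pred) ** 2)
```

The loss vanishes exactly at noiseless data, which is a quick consistency check on the derivation.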

Robust update procedure
As stated before, we aim to minimize Eq. (11) by estimating S so that the overall probability is maximized. To minimize Eq. (11), the gradient of the cost function needs to be calculated.
Inspired by previous works [14,20,21], the Wirtinger gradient of f(S) can be written as

∇f(S) = Σ_q Σ_i w_{q,i}(S) [D_q]_i^H [D_q]_i S,  with  w_{q,i}(S) = 2( 2√(|[D_q]_i S|² + 3/8 + σ²) − 2√([z_q]_i + 3/8 + σ²) ) / √(|[D_q]_i S|² + 3/8 + σ²),    (12)

where [D_q]_i^H denotes the conjugate transpose of [D_q]_i. As analyzed in [20,21], a gradient of this form is heavy-tailed and hence hard to control, so the iterates might not come close to the true S. Truncated Wirtinger Flow addresses this challenge by introducing appropriate truncation criteria, specified by

[ζ_q]_i(S) = 1  if  | [z_q]_i − |[D_q]_i S|² | ≤ α_h · ( ||z − |DS|²||₁ / N ) · ( |[D_q]_i S| / ||S|| ),  and  [ζ_q]_i(S) = 0  otherwise,    (13)

where α_h is a predetermined parameter specified by the user, |[z_q]_i − |[D_q]_i S|²| is the difference between a measurement and its predicted value, ||z − |DS|²||₁/N is the mean of all such differences, and |[D_q]_i S|/||S|| represents the relative magnitude of the transformed estimate [20]. The value of [ζ_q]_i(S) is one for measurements that satisfy Eq. (13) and zero otherwise. We then define the truncated gradient as [21]

∇f_tr(S) = Σ_q Σ_i [ζ_q]_i(S) w_{q,i}(S) [D_q]_i^H [D_q]_i S.    (14)

With the gradient given by Eq. (14), the gradient terms at positions where [ζ_q]_i(S) = 1 are retained, while all others are set to zero.
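A sketch of the truncated gradient under the conventions above, using a TWF-style adaptive bound for the truncation; the sampling matrix is again taken dense and the measurement vector flat, and names are hypothetical:

```python
import numpy as np

def truncated_gradient(S, D, z, sigma, alpha_h=25.0):
    """Wirtinger gradient of the GATFP loss with TWF-style truncation.

    Rows whose residual |z - |D S|^2| is anomalously large relative to
    the mean residual are excluded from the update.
    """
    c = 3.0 / 8.0 + sigma**2
    u = D @ S                          # [D_q]_i S for every measurement row
    g = np.abs(u) ** 2                 # predicted intensities
    resid = np.abs(z - g)
    # adaptive truncation bound; mean residual times relative magnitude
    bound = alpha_h * resid.mean() * np.abs(u) / (np.linalg.norm(S) + 1e-12)
    keep = resid <= bound
    # per-row weight from differentiating (2*sqrt(z+c) - 2*sqrt(g+c))^2
    w = 2.0 * (2.0 * np.sqrt(g + c)
               - 2.0 * np.sqrt(np.maximum(z + c, 0.0))) / np.sqrt(g + c)
    return D.conj().T @ (keep * w * u)
```

At noiseless data the weights, and therefore the gradient, vanish, as expected for a stationary point of the loss.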
The reconstruction procedure starts with an initial guess of the pupil function P, set as a circularly shaped low-pass filter, and of the sample spectrum S, set as the spectrum of the upsampled low-resolution image. We then update S iteratively with the gradient-descent routine

S^{(t+1)} = S^{(t)} − β_t ∇f_tr(S^{(t)}),    (15)

where β_t is the gradient step size. When the difference of the quantitative metric between two adjacent iterations falls below 0.01, the algorithm is considered to have converged. Similar to [13,14], β_t is assigned as

β_t = min(1 − e^{−t/k₀}, β_max),    (16)

where we set k₀ = 500 and β_max = 0.5. This type of schedule suggests that in the early iterations we should use a small step size, since we are not yet close to the solution and the effective noise is large; as the iteration count increases and we make progress, the effective noise decreases and larger step sizes can be used [14]. In addition, we note that step sizes generated by a traditional line-search algorithm [15,18] can also yield good reconstructions in some special cases: when the step size is large and the noise is small, line search performs well, but the robustness of the reconstruction declines for large step sizes, while a step size that is too small leads to poor reconstruction quality and slow convergence. Taking all factors into consideration, we choose the step size as in Eq. (16). Note that Eq. (15) only renews the sample spectrum while keeping the pupil function unchanged. In numerical simulations we know the true pupil function, so the iterative phase retrieval of Eq. (15) can be used to find an S that satisfies Eq. (11). In real captured datasets, however, a precise pupil function is not available beforehand, and an imprecisely estimated pupil function results in a poor recovery. Thus, we recover both S(k) and P(k) for all N measured images as follows:

S^{(t+1)} = S^{(t)} − β_S ∇_S f(S^{(t)}, P^{(t)}),    (17)

P^{(t+1)} = P^{(t)} − β_P ∇_P f(S^{(t)}, P^{(t)}),    (18)

where β_S and β_P are the step sizes for the spectrum and the pupil function, respectively. Note that the two step sizes need not be equal. We repeat the iterative updating rules in Eqs. (17)-(18) until convergence in the real experimental reconstruction.
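The step-size schedule of Eq. (16) and the plain spectrum update of Eq. (15) can be sketched as follows; `grad_fn` stands for the truncated gradient evaluated at the current estimate, and all names are hypothetical:

```python
import numpy as np

def step_size(t, k0=500, beta_max=0.5):
    """Schedule of Eq. (16): small steps early, saturating at beta_max."""
    return min(1.0 - np.exp(-t / k0), beta_max)

def reconstruct(S0, grad_fn, n_iter=300):
    """Gradient descent over the sample spectrum only (Eq. (15));
    the joint pupil update of Eqs. (17)-(18) is analogous."""
    S = S0.astype(complex).copy()
    for t in range(1, n_iter + 1):
        S = S - step_size(t) * grad_fn(S)
    return S
```

On a toy quadratic objective this loop contracts toward the minimizer, which illustrates the schedule's behavior without any FPM-specific machinery.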

Results
In this section, we conduct a series of experiments on both simulated and real captured data.

Quantitative metric and parameter settings
In order to quantify performance under various experimental settings, we introduce the relative error (RE) [20,21] of the reconstruction, defined as

RE = min_{φ∈[0,2π)} || e^{iφ} S − S_t ||² / || S_t ||²,    (19)

where S_t is the true sample spectrum, S denotes the reconstructed spectrum, and the minimization over the global phase φ accounts for the inherent phase ambiguity of phase retrieval.
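The RE metric can be computed in closed form, since the optimal global phase alignment is the angle of the inner product between the two spectra (a standard result for this class of metric; names hypothetical):

```python
import numpy as np

def relative_error(S, S_true):
    """Relative error up to a global phase:
    RE = min_phi ||exp(i*phi)*S - S_true|| / ||S_true||."""
    phi = np.angle(np.vdot(S_true, S))  # optimal global phase alignment
    return np.linalg.norm(S - np.exp(1j * phi) * S_true) / np.linalg.norm(S_true)
```

A globally phase-rotated copy of the truth yields zero error, confirming the phase invariance.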
In the simulation experiments, we use the 'Mandrill' image as the HR amplitude, and the 'Aerial' image (512×512 pixels) from the USC-SIPI image database [28] as the phase. The parameters in the simulation are chosen to realistically model a light-microscope experiment, with an incident wavelength of 630 nm, a CCD pixel size of 1.845 µm and an objective NA of 0.1. We use a 15 × 15 LED matrix as the light source to provide angle-varied illumination. The distance between adjacent LED elements is 4 mm, and the distance between the sample and the LED matrix is 90.88 mm. The simulated thin object is illuminated with plane waves from 225 different incident angles. Based on the Fourier ptychographic image formation, we simulated the ideal noise-free data by the following steps:
• We apply the FFT to the original HR image, and select different sub-regions in the Fourier domain by multiplying the HR spectrum with a circularly shaped low-pass filter (128×128).
• We perform an inverse Fourier transform to obtain the LR complex images in the spatial domain, and retain only the intensities of the LR images corresponding to the different incident angles.
Then we simulated three different noise types: 1. Gaussian noise: we added Gaussian white noise to the ideal data.
2. Poisson noise: the ideal data is corrupted by Poisson-distributed noise at each pixel.
3. Mixed Poisson-Gaussian noise: First, we simulated the Poisson noise data. Then we added Gaussian white noise to the Poisson noise data.
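The three corruption types listed above can be generated as follows (a sketch; `sigma_rel` is the Gaussian std expressed as a fraction of the maximum of the ideal data, matching the normalization used in the simulations, and all names are hypothetical):

```python
import numpy as np

def add_noise(ideal, kind, sigma_rel=0.0, rng=None):
    """Corrupt ideal intensities with one of the three noise types."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = sigma_rel * ideal.max()
    if kind == "gaussian":
        return ideal + rng.normal(0.0, sigma, ideal.shape)
    if kind == "poisson":
        return rng.poisson(ideal).astype(float)
    if kind == "mixed":  # Poisson first, then additive Gaussian
        return rng.poisson(ideal) + rng.normal(0.0, sigma, ideal.shape)
    raise ValueError(kind)
```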
For Gaussian noise and the Gaussian component of the mixed Poisson-Gaussian noise, the reported standard deviation is the ratio between the actual standard deviation and the maximum of the ideal data. We then compared GATFP with two state-of-the-art methods, i.e., TPWFP and the Newton method. The code of TPWFP is available at http://www.sites.google.com/site/lihengbian. The code of the Newton method is adapted from the code samples at http://sites.bu.edu/tianlab/open-source/. As in TPWFP, an important parameter of the truncation criterion is α_h. As stated in [20], α_h = 25 works well, so we use the same setting in our algorithm. Another important parameter for all algorithms is the number of iterations. For TPWFP, 300 iterations are sufficient, as shown in [20]. We set 1000 iterations for the Newton method and for ours.

Simulation results
First, we compare GATFP with the two aforementioned state-of-the-art methods to show their pros and cons. We apply each algorithm to the simulated data with Poisson noise, Gaussian noise and mixed Poisson-Gaussian noise, respectively. The Poisson noise describes the statistics of the incoming photons at each pixel, which follow a discrete probability distribution [3]. The Gaussian noise is mostly caused in the capturing chain, e.g., by thermal noise. Thermal noise is associated with the rapid and random motion of electrons within a conductor due to thermal agitation. Because the number of electrons in a conductor is very large and their random motions are statistically independent, the central limit theorem indicates that thermal noise is Gaussian distributed with zero mean. A more realistic model of the noise is a mixed Poisson-Gaussian distribution. Note that the standard deviation (std) for the Poisson-Gaussian noise denotes the level of the Gaussian component. The results are shown in Fig. 2. The images in Fig. 2 show the reconstructions of the different methods under Gaussian noise (or the Gaussian component) with a standard deviation of 3×10⁻³. The ground-truth noise variance is used in the synthetic experiments.
From the results, we can see that TPWFP performs well under Poisson noise, which benefits from its accurate Poisson signal model. In contrast, the Newton method merely minimizes the square of the difference between the actual and estimated measurements. Our method achieves a successful reconstruction using the GAT approximation without degrading reconstruction quality. For Gaussian noise, GATFP outperforms the other two methods, because we explicitly consider the Gaussian noise of the measurement, as in Eq. (4). TPWFP obtains better results than the other competing methods under small Gaussian noise, which benefits from its thresholding constraint. However, at high noise levels TPWFP may not be effective. The Newton method instead estimates the background of each image and subtracts it to produce a corrected intensity image [18]; consequently, it provides better results than TPWFP at high noise levels. For mixed Poisson-Gaussian noise, GATFP obtains better results than the other methods, which is largely attributable to its Poisson-Gaussian assumption. To conclude, the type of noise strongly influences the reconstruction quality of the TPWFP and Newton methods, while GATFP is more robust under different types of noise. This behavior is well explained by the fact that our model can be treated as a generalized version of these noise types.

Experimental results
In this sub-section, we demonstrate the performance of our method on experimental datasets. The parameters of the real FPM imaging system are the same as those in the simulation. We obtain the estimated noise standard deviation using the Median of Absolute Deviations (MAD) technique. However, the standard deviation of the Gaussian component of mixed Poisson-Gaussian noise might not be exact, since the noise also contains the Poisson component; we therefore adjust the standard deviation starting from the value obtained by MAD. In the experiment, we employ a USAF target and a blood smear as samples, and capture a sequence of 225 images for each. The reconstruction results are shown in Figs. 3-4. Note that the pupil function, set as a circularly shaped low-pass filter, remains unchanged in the TPWFP procedure. In addition, we perform the background subtraction step [18] before running the Newton method. As shown in Fig. 3, TPWFP and the Newton method produce fluctuations in the object phase, whereas GATFP achieves superior performance and produces results with easily distinguishable blood cells. To further validate the proposed approach, we also applied our method to the dataset of a USAF target. The reconstruction results are shown in Fig. 4. From the results we can see that TPWFP suffers from noise corruption. If we apply the Newton method to the dataset, most of the noise is removed by the background subtraction step, but many image details are subtracted as well (see group 8, element 6). In contrast, the proposed GATFP achieves higher reconstruction performance, with less noise and more details. In all, GATFP offers a novel way for FPM to reconstruct highly accurate results from noise-deteriorated inputs.
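The MAD-based noise estimate mentioned above can be sketched as follows. This is a common variant that works on first differences of the image to suppress the underlying structure; the exact detail-extraction step used in practice may differ, and names are hypothetical:

```python
import numpy as np

def mad_sigma(image):
    """Robust Gaussian-noise std estimate via the Median of Absolute
    Deviations of horizontal pixel differences.

    The 0.6745 factor makes the MAD consistent for Gaussian data; the
    sqrt(2) accounts for differencing two independently noisy pixels.
    """
    d = np.diff(image, axis=1).ravel()
    return np.median(np.abs(d - np.median(d))) / (0.6745 * np.sqrt(2.0))
```

On a flat image plus Gaussian noise the estimate recovers the injected std, which is the property exploited when initializing σ for the real datasets.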

Discussion
In this paper, we develop and test a novel reconstruction method for FPM, termed GATFP. Based on the GAT approximation, GATFP formulates FPM recovery as a maximum likelihood problem and presents a solution utilizing the Truncated Wirtinger Flow algorithm. We compare the reconstruction quality of three algorithms under different types of noise; both simulation and experimental results demonstrate the validity of our method. One extension of GATFP is to handle more complex noise by modeling the data noise as a mixture of Gaussians, which is a universal approximator of distributions and is able to fit a wide range of noises. In addition, prior information on the object could be introduced into GATFP to improve the convergence speed and effectiveness of the algorithm.
A limitation of our method is that the model is non-convex. Although we utilize appropriate truncation thresholds to choose a desirable search direction, our method might converge to an incorrect local minimum. In addition, while GAT is able to stabilize the noise variance, the tails of the distribution of variance-stabilized coefficients are empirically heavier than Gaussian, as evidenced in [29]. Therefore, incorporating a convex program into GATFP to obtain a minimum-cost solution will be a research emphasis in the near future.