Deep Phase Decoder: Self-calibrating phase microscopy with an untrained deep neural network

Deep neural networks have emerged as effective tools for computational imaging, including quantitative phase microscopy of transparent samples. To reconstruct phase from intensity, current approaches rely on supervised learning with training examples; consequently, their performance is sensitive to any mismatch between training and imaging settings. Here we propose a new approach to phase microscopy that uses an untrained deep neural network to model measurement formation, encapsulating the image prior and the imaging physics. Our approach does not require any training data and simultaneously reconstructs the sought phase and pupil-plane aberrations by fitting the weights of the network to the captured images. As an experimental demonstration, we reconstruct quantitative phase from through-focus images blindly (i.e., with no explicit knowledge of the aberrations).


Introduction
Quantitative phase microscopy (QPM) enables label-free imaging of transparent samples such as unstained cells and tissues [1,2], and non-absorbing microelements [3]. QPM can use partially coherent beams (in lieu of coherent ones [4]) to increase spatial resolution and light throughput with reduced speckle. Examples include through-focus [5,6], interferometric [7,8], and angle-scanning [9,10] microscopes. The common design theme is to image distinct non-linear renderings of phase as intensity, from which quantitative phase is numerically recovered. For a given method, the performance and image quality are intrinsically governed by the phase reconstruction step [11].
Figure 1: The deep phase decoder algorithm aims to minimize the Euclidean distance between the measured intensity images (raw, intensity-only measurements) and the hypothetical ones generated by our untrained deep network. The optimization problem, which is nonlinear and nonconvex, is stated in terms of the network's weights and is solved iteratively using a gradient-based procedure. Once the weights are optimized, the sought phase image is retrieved as the output of the deep decoder part of the network.
Traditionally, phase reconstruction is posed as an inverse problem and solved by minimizing a least-squares loss that is based on the physics of the problem. The approach is fundamental to phase imaging [12] and has been practically employed in various systems. An immediate advantage is that prior assumptions on the images can be directly integrated through regularization. An example is to constrain the phase image to admit a sparse representation in the wavelet domain [13]. Such regularizers work well for various phase microscopes and improve the reconstruction quality [8,14]. Another major advantage of the physics-based formulation is the possibility to incorporate algorithmic self-calibration [15,16]. It involves minimizing the least-squares loss over unknown or partially known system parameters, such as pupil aberrations, in alternation with the phase retrieval step [11]. The concept hence accounts for model mismatch in the imaging pipeline. This provides great flexibility and allows phase reconstruction from measurements that are not fully characterized. In designing self-calibrating algorithms, the need for regularization (i.e., prior models for phase) is emphasized [17], since one must simultaneously decouple the individual contributions of phase and aberrations to the measured images. However, typical regularization techniques are hand-crafted and require manual tuning of parameters even after the model is constructed.
More recently, deep neural networks (DNNs), typically trained in an end-to-end fashion on large datasets to directly map given intensities back to phase, have been used to obtain efficient phase retrieval algorithms. For phase microscopy, trained DNNs give state-of-the-art performance in holographic [18], lensless [19], ptychographic [20], and through-scattering-media [21,22] configurations, among others [23]. The results have validated the efficiency of properly trained DNNs in solving non-linear inverse problems and shifted the computational paradigm in QPM towards predominantly data-driven frameworks.
However, for deep networks to work well, the proximity of training and experiment settings is critical, as the performance is susceptible to variations in sample features, instrumentation, and acquisition parameters [23]. Although improved DNN architectures have been proposed [24-27], training-based approaches fundamentally rely on the reconstructed phase image coming from a distribution that is close to that of the training images and are thus sensitive to misfits.
In this paper, we propose a new QPM algorithm that is based on a deep network but requires no ground-truth training data. Our approach is inspired by the idea of employing untrained generative DNNs as prior models for images, a concept pioneered by the so-called deep image prior [28]. Specifically, Ulyanov et al. [28] fitted a noisy image by optimizing over the weights of a randomly initialized, over-parameterized autoencoder (i.e., an autoencoder with more weights than the number of image pixels), and observed that stopping the optimization early yields good denoising performance, an effect theoretically explained in [29]. For denoising, regularization through early stopping is critical, since the network can in principle fit the noisy data perfectly. Subsequently, an under-parameterized (i.e., fewer parameters than the number of image pixels) image-generating network, named the deep decoder, that does not need early stopping or any other further regularization has been proposed [30]. The framework acts as a concise image model that provides a lower-dimensional description of images, akin to sparse wavelet representations, and thus regularizes through its architecture alone. Unfortunately, a naive application of the method to the problem at hand would not account for practical issues such as drift and sample-induced aberrations [17], which points to the need to properly incorporate our knowledge about optical physics to achieve self-calibration.
The key contribution of this paper is a DNN-based self-calibrating reconstruction algorithm for QPM that is training-free and recovers quantitative phase from raw images recorded without explicit knowledge of the aberrations. We specify the entire measurement formation as an untrained DNN whose weights are fitted to the recorded images. Leveraging the well-characterized system physics and non-linear forward model, our network combines a fully-connected layer that synthesizes aberrations from Zernike polynomials with the deep decoder that is used to generate phase. The proposed algorithm hence describes both the image and the aberrations by a few weight coefficients, and as a consequence enables us to jointly retrieve the phase and the individual aberration profile of each measurement without requiring any training data. We term our algorithm the deep phase decoder (DPD) and demonstrate it on a commercial widefield microscope.

Methods
Next, we describe the image formation process in our optical setup (Fig. 2) and then our reconstruction approach in more detail. We consider an optically thin and transparent sample that is placed at the focal plane of the microscope's objective. The sample's complex-valued image (i.e., its transmission function) is characterized as
$$o(r) = e^{i\varphi(r)}, \tag{1}$$
where $\varphi$ represents the spatial distribution of phase over 2D coordinates $r$. The LEDs are placed sufficiently far away that their illumination can be modeled as a monochromatic plane wave at the sample plane. Thus, the irradiance of the beam impinging on the camera is given by
$$I(r) = \left| (c_{\mathrm{psf}} * o)(r) \right|^2, \tag{2}$$
where $*$ denotes spatial convolution and $c_{\mathrm{psf}}$ is the coherent point-spread function of the microscope. The sensor then measures the sampled irradiance, $y \in \mathbb{R}^p$, where $p$ is the total number of pixels on the camera. In matrix form,
$$y = \left| F^{-1} \operatorname{diag}(P_{\mathrm{circ}})\, F o \right|^2, \tag{3}$$
where $o \in \mathbb{C}^p$ is the vectorized sample image, $|\cdot|^2$ is applied element-wise, $F$ is the discrete Fourier transform matrix, and $P_{\mathrm{circ}}$ is the ideal and space-invariant exit pupil function, which is a circle with its radius determined by the numerical aperture (NA) of the objective and the wavelength $\lambda$.
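The forward model described above (a pure-phase object, low-pass filtered by a circular pupil in the Fourier domain, then detected as intensity) can be illustrated with a minimal NumPy sketch. The function names, the 64 × 64 grid, and the 0.08 µm effective pixel size are illustrative assumptions and are not taken from the paper; only the NA and wavelength values come from the experiment reported later.

```python
import numpy as np

def circular_pupil(n, pixel_size_um, wavelength_um, na):
    """Ideal pupil: 1 inside the cutoff frequency NA/wavelength, 0 outside."""
    f = np.fft.fftfreq(n, d=pixel_size_um)      # spatial frequencies (cycles/um)
    fx, fy = np.meshgrid(f, f, indexing="xy")
    return (np.hypot(fx, fy) <= na / wavelength_um).astype(float)

def forward_intensity(phase, pupil):
    """Intensity |F^{-1} diag(P) F o|^2 for the pure-phase object o = exp(i*phase)."""
    obj = np.exp(1j * phase)                    # transmission function
    field = np.fft.ifft2(pupil * np.fft.fft2(obj))  # pupil filtering in Fourier space
    return np.abs(field) ** 2                   # intensity-only measurement

n = 64                                          # assumed grid size
pupil = circular_pupil(n, pixel_size_um=0.08, wavelength_um=0.514, na=0.65)
flat = forward_intensity(np.zeros((n, n)), pupil)   # featureless sample
```

A featureless (zero-phase) sample yields a uniform intensity image, which is why some form of data diversity is needed before phase becomes visible in the measurements.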
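Phase is recovered based on multiple images with some type of data diversity that translates phase information into intensity (e.g., defocus [5], illumination coding [31], pupil coding [32]). Here we adopt a pupil-coding scheme where the wavefront at the exit pupil [12] is differently aberrated for each measurement. The pupil aberration is modeled as a weighted sum of Zernike polynomials, so it is parameterized by a small number of coefficients:
$$P_n = P_{\mathrm{circ}} \odot \exp\Big( i \sum_{k=1}^{K} c_{n,k}\, z_k \Big), \tag{4}$$
where $z_k$ is the $k$-th Zernike polynomial, $c_{n,k}$ is its coefficient for the $n$-th measurement, and $\odot$ denotes the element-wise product. Given $N$ measurements $y_1, \dots, y_N$ with known pupils, conventional phase recovery is posed as the least-squares problem
$$\hat{o} = \arg\min_{o} \sum_{n=1}^{N} \left\| \sqrt{y_n} - \left| F^{-1} \operatorname{diag}(P_n)\, F o \right| \right\|_2^2. \tag{5}$$
This can be solved by gradient descent (or an accelerated variant), which is closely related to the well-known Gerchberg-Saxton method [11]. After solving for the complex-valued $\hat{o}$, the phase image is obtained as its argument. The conventional phase recovery in (5) does not necessarily impose any regularization on the recovered phase, and the aberrations must be known a priori. To address both issues without needing any training data, we introduce a deep network into the derived formulation.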
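As an illustration of the Zernike parameterization of the pupil aberration, the sketch below builds an aberrated pupil from three low-order modes (tip, tilt, defocus) on normalized pupil coordinates. The choice of modes, their normalization, the grid size, and the pixel size are assumptions made for the example; the experiments in the paper use the first 9 polynomials after piston.

```python
import numpy as np

def zernike_basis(n, pixel_size_um, wavelength_um, na):
    """Pupil support and three low-order Zernike modes on an n x n frequency grid."""
    f = np.fft.fftfreq(n, d=pixel_size_um)
    fx, fy = np.meshgrid(f, f, indexing="xy")
    rho = np.hypot(fx, fy) / (na / wavelength_um)   # radius normalized to the cutoff
    theta = np.arctan2(fy, fx)
    inside = (rho <= 1.0).astype(float)             # circular pupil support
    modes = np.stack([
        2.0 * rho * np.cos(theta),                  # tip
        2.0 * rho * np.sin(theta),                  # tilt
        np.sqrt(3.0) * (2.0 * rho**2 - 1.0),        # defocus
    ])
    return inside, modes * inside

def aberrated_pupil(coeffs, inside, modes):
    """Pupil support times exp(i * sum_k c_k Z_k), i.e. a linear layer then a phasor."""
    phase = np.tensordot(coeffs, modes, axes=1)     # weighted sum of the modes
    return inside * np.exp(1j * phase)

inside, modes = zernike_basis(32, pixel_size_um=0.08, wavelength_um=0.514, na=0.65)
pupil = aberrated_pupil(np.array([0.0, 0.0, 0.5]), inside, modes)  # defocus only
```

Because the aberration enters only through the phase of the pupil, its magnitude stays equal to the circular support regardless of the coefficients.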
At the core of our approach, we use a DNN that generates $N$ intensity images. The network, denoted by $G(W)$, reparameterizes the measurement formation in (3) in terms of a weight tensor $W$ rather than pixels in complex image space as in (5). The network is untrained and the weights, which are randomly initialized, are optimized by solving the following problem:
$$\hat{W} = \arg\min_{W} \left\| Y - G(W) \right\|_F^2, \tag{6}$$
where $Y = [\sqrt{y_1} \, \dots \, \sqrt{y_N}] \in \mathbb{R}^{p \times N}$ accommodates all the measured data. Once the optimal weights $\hat{W}$ are obtained, phase is retrieved as the output of an appropriate layer in the network. The remarkable aspect, in terms of data requirement, is that the process is solely driven by the acquired images and does not involve any training data. The main reason is that the generative network's weights (and hence its output image) are adjusted on the fly in (6), rather than the network being trained to represent a certain class of images. We design the network $G$ to encapsulate two sub-generators, $G_p$ and $G_a$, that synthesize a phase image and the pupil aberration of each individual measurement, respectively (see Fig. 1). For the phase-generating network $G_p$, we use a deep decoder [30], which transforms a randomly chosen and fixed tensor $B_0 \in \mathbb{R}^{n_0 \times k}$ consisting of $k$ many $n_0$-dimensional channels to an $n_d \times 1$ dimensional (i.e., gray-scale) image. In transforming the random tensor to a phase image, $G_p$ applies i) a pixel-wise linear combination of the channels, ii) upsampling, iii) rectified linear units (ReLUs), and iv) channel normalization. Specifically, the update at the $(i + 1)$-th layer is given by
$$B_{i+1} = \mathrm{cn}\big( \mathrm{relu}( U_i B_i W^p_i ) \big), \qquad i = 0, \dots, d - 1. \tag{7}$$
Here $W^p_i \in \mathbb{R}^{k \times k}$ contains the coefficients for the linear combination of the channels and the operator $U_i \in \mathbb{R}^{n_{i+1} \times n_i}$ performs bi-linear upsampling. This is followed by a channel normalization operation, $\mathrm{cn}(\cdot)$, which is equivalent to normalizing each channel individually to zero mean and unit variance, plus a bias term.
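The four operations of a deep decoder layer (channel mixing, bi-linear upsampling, ReLU, channel normalization) can be sketched in NumPy as a forward pass. This is an illustrative sketch under several assumptions: channels are stored as an (height, width, k) array rather than a vectorized $n_i \times k$ matrix, the interpolation uses a particular border handling, the bias term of the channel normalization is omitted, and the sizes are far smaller than those used in the experiments. The helper names are hypothetical, and no optimization of the weights is shown.

```python
import numpy as np

def upsample_matrix(m):
    """(2m x m) matrix performing 1D linear interpolation by a factor of 2."""
    pos = np.clip((np.arange(2 * m) + 0.5) / 2.0 - 0.5, 0, m - 1)  # source positions
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, m - 1)
    w = pos - lo
    U = np.zeros((2 * m, m))
    U[np.arange(2 * m), lo] += 1.0 - w
    U[np.arange(2 * m), hi] += w
    return U

def decoder_layer(B, W, eps=1e-5):
    """One layer: cn(relu(U B W)) applied to an (h, w, k) channel tensor."""
    h, w, _ = B.shape
    X = B @ W                                             # i) mix channels per pixel
    X = np.einsum("ph,hwk->pwk", upsample_matrix(h), X)   # ii) upsample rows
    X = np.einsum("qw,pwk->pqk", upsample_matrix(w), X)   #     and columns
    X = np.maximum(X, 0.0)                                # iii) ReLU
    mean = X.mean(axis=(0, 1), keepdims=True)             # iv) normalize each channel
    std = X.std(axis=(0, 1), keepdims=True)
    return (X - mean) / (std + eps)

rng = np.random.default_rng(0)
k, n0, n_up = 8, 4, 3                          # assumed toy sizes
B = rng.uniform(size=(n0, n0, k))              # fixed random input tensor
weights = [rng.normal(size=(k, k)) for _ in range(n_up)]
w_out = rng.normal(size=k)                     # final channel-combination weights
for W in weights:
    B = decoder_layer(B, W)
phase = B @ w_out                              # single-channel output image
```

In this toy configuration a 4 × 4 input with 8 channels is expanded to a 32 × 32 image using only 3·64 + 8 = 200 weights, which illustrates how the architecture itself restricts the set of images that can be generated.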
A phase image, which is the output of the $d$-layer network, is then formed, with $W^p_d \in \mathbb{R}^k$, as
$$\varphi = B_d W^p_d. \tag{8}$$
The aberration-generating network, $G_a(W_a)$, relies on the parameterization in (4) represented as a fully-connected layer (i.e., a linear combination of Zernike modes), and the matrix $W_a$ contains the Zernike coefficients for all measurements. In combining the outputs of $G_p$ and $G_a$, we reproduce the physical image formation using (1), (3), and (4) in the network's architecture. The framework is implemented in PyTorch, allowing us to solve (6) using gradient-based algorithms thanks to auto-differentiation with respect to $W = \{W_p, W_a\}$. Once the optimal weights $\hat{W}$ are obtained, the reconstructed phase is given by $G_p(\hat{W}_p)$, where $W_p = \{W^p_0, \dots, W^p_d\}$. We now explain some implicit aspects of our method. First, we see from (6) that $G(W)$ replicates the recorded intensities as closely as possible in the least-squares sense. Therefore, regularization of phase is governed by the generative network's architecture, since the images have to lie in its range. Specifically, both $G_p$ and $G_a$ under-parameterize their corresponding outputs (fewer weights than the number of pixels in the generated images), so DPD imposes regularization on phase and aberrations. Moreover, once $G$ is constructed, the strength of regularization is not hand-tuned, as is typically done (such as adjusting the sparsity level for wavelet-based methods). It is also noteworthy that DPD performs the phase reconstruction from randomly initialized (as $G$ is untrained) aberrations, as opposed to other self-calibrating schemes that use theoretical pupils as initialization [11,17].

Results
To experimentally corroborate our method, we use a commercial brightfield microscope (Nikon TE300) with LED illumination (λ = 0.514 µm) [10]. A phase target (Benchmark Technologies) is imaged by a 40×, 0.65 NA objective lens, and intensity images are captured by a PCO.edge 5.5 sCMOS camera that is placed on the front port of the microscope, adding 2× magnification. To realize pupil coding, we capture a through-focus stack of 8 images that are exponentially spaced (at 0, 1, 2, 4, 8, 16, 32, and 64 µm defocus) [34]. As a baseline for our method, we reconstruct reference phase images with the accelerated Wirtinger flow algorithm [33] using (5) for 8 and 4 (defocus of 4, 8, 16, and 32 µm) measurements. We then use the same 4 measurements to solve the DPD optimization in (6) using the RMSProp algorithm with $5 \times 10^4$ iterations. The network is constructed with the following parameters: $k = 32$, $n_0 = 16 \times 16$, and $n_d = 512 \times 512$. Bi-linear upsampling is fixed to a factor of 2, making $G_p$ a 6-layer network. We use the first 9 Zernike polynomials after piston for $G_a$. The reconstructions from both methods show good agreement with each other (Fig. 3). Also, DPD jointly recovers defocus-like pupil functions, as expected. This experimentally validates our algorithm's ability to blindly reconstruct a reliable phase image from the measured intensities.
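To give a sense of the degree of under-parameterization implied by these settings, the short calculation below counts weights under one plausible reading of the architecture: five $k \times k$ channel-mixing matrices (one per factor-of-2 upsampling from 16 to 512 pixels per side) plus a final $k$-vector, ignoring any channel-normalization parameters. This counting convention is an assumption and may differ from the authors' exact implementation.

```python
import math

k, n0_side, nd_side = 32, 16, 512          # values reported in the text
n_measurements, n_zernike = 4, 9           # measurements used by DPD, modes per pupil

# Number of factor-of-2 upsampling steps needed to go from 16 to 512 pixels per side.
n_up = int(math.log2(nd_side // n0_side))

# Assumed count: one k x k matrix per upsampling step plus the final k-vector.
decoder_weights = n_up * k * k + k
pixels = nd_side ** 2
zernike_weights = n_measurements * n_zernike

print(n_up, decoder_weights, pixels, zernike_weights)
print(round(pixels / decoder_weights, 1))  # pixels per decoder weight
```

Under these assumptions, the phase generator has roughly fifty times fewer weights than the number of pixels it produces, and the aberrations of all four measurements are described by 36 coefficients.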

Conclusion
In summary, we derived a new phase imaging algorithm that uses an untrained neural network and demonstrated it on a phase-from-defocus dataset. Our DPD method, unlike its supervised deep learning counterparts, is training-free and does not rely on closely matching training and experiment conditions. Moreover, our method is self-calibrating, allowing us to directly reconstruct high-quality phase without a priori knowledge of the system's aberrations.