Self-calibrated 3D differential phase contrast microscopy with optimized illumination

: 3D phase imaging recovers an object’s volumetric refractive index from intensity and/or holographic measurements. Partially coherent methods, such as illumination-based differential phase contrast (DPC), are particularly simple to implement in a commercial brightfield microscope. 3D DPC acquires images at multiple focus positions and with different illumination source patterns in order to reconstruct 3D refractive index. Here, we present a practical extension of the 3D DPC method that does not require a precise motion stage for scanning the focus and uses optimized illumination patterns for improved performance. The user scans the focus by hand, using the microscope’s focus knob, and the algorithm self-calibrates the axial position to solve for the 3D refractive index of the sample through a computational inverse problem. We further show that the illumination patterns can be optimized by an end-to-end learning procedure. Combining these two, we demonstrate improved 3D DPC with a commercial microscope whose only hardware modification is LED array illumination.


Introduction
Quantitative phase imaging (QPI) recovers a sample's phase delay, which is directly related to optical path length. Since phase cannot be measured directly, QPI methods use various phase contrast mechanisms to encode phase information into the captured intensity measurements, and then solve for phase indirectly. Traditionally, coherent light is used [1]; however, methods that use partially spatially coherent light [2,3] can have higher spatial resolution [4], less coherence-induced speckle [5], and are often less expensive. Differential phase contrast (DPC) microscopy is a practical QPI method that recovers quantitative phase from multiple images with different illumination source patterns [2,6]. The illumination source diversity can be conveniently achieved with a programmable LED array [2,7,8]; thus, QPI is enabled by a simple and inexpensive modification to a commercial microscope. The LED array microscope is, in general, a powerful platform for computational illumination microscopy, enabling not only QPI, but also multi-contrast [7,9], super-resolution [10,11] and 3D imaging [12].
Since phase is a projected quantity related to both the refractive index (RI) and thickness of the sample, 3D phase imaging amounts to volumetric reconstruction of the sample's RI [4,13]. Interferometric and diffraction tomography techniques [12,[14][15][16][17][18][19][20], as well as 3D Fourier Ptychography [12,21], reconstruct 3D RI from projection measurements captured at different illumination angles with spatially coherent light. 3D DPC [22,23], on the other hand, uses partially coherent illumination (e.g. LED array illumination with many LEDs on) to create strong depth sectioning effects [24] that blur out-of-focus planes [2]. The sample is then scanned axially to image the third (z) dimension. This approach is practical because it gives good signal and does not require well-aligned illumination [18,20,25]; however, it does require an axial motion stage, which increases hardware complexity and cost.
Axial scanning for 3D DPC is usually performed by an automated motion stage which stops at each defocus plane [22]. This "stop-and-stare" strategy limits the overall speed of capture [26] because the camera must wait for the motion stage to move and settle before capturing each frame. Fast focusing mechanisms like focus-tunable lenses [23] can improve capture speed, but are expensive to implement. For high-NA systems, motion stages are particularly expensive since the depth-of-field (DoF) is short, so high-precision axial motion is required [23].
Here, we present an extension of 3D DPC that enables fast axial scanning without a dedicated motion stage. Instead, we hand-turn the focus knob of the microscope to scan through focus while capturing video measurements, and then use an algorithmic self-calibration procedure to solve for the defocus positions in post-processing. The system capture speed is increased because the images are taken during the scanning motion, and the overall cost of the system decreases significantly without the need for an axial motion stage.
We further improve the practicality of 3D DPC by optimizing the illumination source patterns. Previous work [6,22] used four half-circular illumination patterns at each focal plane, though later works have shown the ability to reduce the number of images captured [23,27]. The half-circle designs were developed heuristically for use with analytical inversion methods. However, since arbitrary patterns can be used in our system, we aim to optimize for illumination patterns that best encode the 3D phase information in the raw images. To enable a systematic design of the LED patterns, Hugonnet et al. [28] defined an objective function to evaluate illumination patterns. However, the optimized design often becomes very sensitive to the choice of objective function (e.g., to balance high vs. low frequency, sensitivity vs. signal-to-noise ratio (SNR)). Recently, a class of data-driven methods called physics-based learning have been used to optimize the illumination design end-to-end to improve the final reconstruction without a crafted objective function [29,30]. Here, we employ physics-based learning to optimize the illumination patterns for our 3D DPC setup, ensuring efficient and robust capture strategies. Combining this with our self-calibrated axial motion, we demonstrate practical high-quality refractive index reconstruction on a commercial microscope with LED array illumination.

Background: 3D differential phase contrast imaging
An inhomogeneous 3D volume can be written as a scattering potential V (r) = k 2 0 (︁ n 2 0 − n 2 (r) )︁ , where r denotes the 3D spatial coordinate, k 0 is the wave number, n 0 is the RI of the surrounding medium, and n (r) is the complex RI of the sample (real part for phase and imaginary part for absorption). When coherent light propagates through the 3D volume, we can write the evolution of the electric field using the Lippmann-Schwinger equation: where U in , U scat are the incident and scattered light and G is the 3D Green's function [4]. For a partially coherent source, we calculate the intensity distribution at the sensor by treating the source as a collection of different spatially-coherent sources and summing the intensity generated by each after coherent propagation: where S represents the 2D angular distribution of the incoherent source (assuming Kohler geometry) and u ′ describes the spatial frequency of each spatially coherent source (e.g. each LED). The 3D spatial coordinates, r, is dropped in future expressions for simplicity. Previous work [22] simplified Eq. (2) by taking the first Born approximation [4] in order to obtain a linear model directly relating the intensity measurements to the scattering potential. The Born approximation assumes U ≈ U in in the integral term of Eq. (1) and is valid for weakly-scattering objects where U in ≫ U scat . The weak object approximation applies when the auto-correlation of U scat is negligible due to weak scattering, |U scat | 2 ≈ 0 [31]. If we separate the scattering potential into its real and imaginary parts,Ṽ =Ṽ Re + i ·Ṽ Im , the background subtracted 3D image stack under the ith illumination pattern, I ′ i , can be written as where· denotes Fourier transform and H (i) Re and H (i) Im are the real and imaginary parts of the transfer functions corresponding to phase and absorption contrast, respectively, for the ith pattern S i [22]: where F z denotes Fourier transform along the z axis, and ⋆ denotes cross-correlation. S ′ i is the flipped source distribution of S i , P z is the pupil function with defocus kernel, and Γ (u) = 1 4π √ n 2 o λ −2 − |u | 2 for lateral spatial frequency u. Given defocus image stacks for each of M different source patterns, the scattering potentials can be found by solving the following inverse problem: where R (·) is the regularization term. When Tikhonov regularization (L2 norm of V Re and V Im ) is used, we can find an analytical estimator for the scattering potential: Thus, the 3D absorption and refractive index distributions can be recovered from the raw data.

Axial scanning and defocus self-calibration
In previous work, axial through-focus scanning was performed by a motion z-stage with closedloop control. With the "stop-and-stare" strategy, the user moves the stage to each desired focal plane and acquires images with known defocus positions. Here, we instead hand-turn the built-in focus knob on a standard microscope while continuously updating the illumination patterns and capturing images at a fast enough frame rate such that there is negligible motion blur in each frame. This enables fast axial scanning without hardware dependency; however, the user can no longer specify the focal planes and the defocus position of each image is unknown. We seek an algorithmic way to infer the unknown defocus positions in post-processing, known as self-calibration. With partially coherent illumination (many LEDs turned on at once), the system will have strong optical sectioning [24], meaning that images taken at a particular focal plane will have little information from other focal planes. Thus, the problem of jointly solving for the defocus positions and the 3D sample becomes very ill-posed. If spatially coherent illumination (a single LED) is used instead, there is no optical sectioning and changing focus of the microscope only changes the defocus kernel. This makes 3D reconstruction difficult but gives good information to solve for defocus positions. Hence, we choose to use both partially coherent illumination patterns designed for DPC and a self-calibration pattern with only a single LED on. While the focus knob is swept by hand, the LED array quickly alternates between these patterns, with each lasting for one exposure.
Our capture strategy thus collects images for the single-LED illumination at different defocus positions, which are used for self-calibration. The LED array is placed sufficiently far from the sample such that an LED illuminates the sample with a plane wave (spatially coherent light), defined as exp (i2πu ′ · x), where u ′ corresponds to the plane wave angle. Because the light is spatially coherent, we can model the single-LED self-calibration images as where o is the 2D complex-field, and u denotes the lateral spatial frequency. P z is the pupil function with a defocus distance z, modeled by angular spectrum propagation [4], . This single-LED illumination can be from any angle; we choose an off-axis LED because the image will shift laterally with defocus, providing stronger defocus contrast.

Joint optimization for self-calibration
Once the dataset (with N images) is captured, we then need to jointly solve for the field and the defocus positions. The problem can be written in a joint optimization form, This formulation takes N intensity images, I calib (z i ), to solve for one 2D complex-field, o, and defocus positions, z i for i = 1, . . . , N, and thus is well-constrained even with only a few defocus planes [32]. The optimization problem is, however, non-convex, and the defocus positions z i and complex-field o are dependent on each other. As a result, when we use gradient descent-based methods to optimize them, their gradients will be affected by one another, making it difficult to reach convergence. However, if the defocus positions are known, the field can be solved for; similarly, if the field is known, the defocus position can also be determined for each image. For a non-divergence solution, we alternate the optimization for these two unknowns, such that only one variable is updated at each time [33,34].
To perform the optimization, we first initialize the defocus positions with a guess of the total range of defocus and equal spacing between images. Then, we use gradient descent, implemented with Adam [35] for fast convergence, to optimize the complex-field in Eq. (8) with z fixed. Next, we fix the complex-field at the current estimate and update the defocus positions, z. We check the loss after each iteration and stop the update if it converges earlier. These alternating updates continue until the convergence of both variables. To ensure a unique solution of defocus positions, a non-decreasing condition is assumed, such that the defocus motion is only in one direction. This condition is enforced by projecting the updated defocus positions into a non-decreasing sequence.
After joint optimization, we know the defocus position for each single-LED image. Then the defocus positions for images using other illumination patterns are bi-linearly interpolated by the nearest known defocus positions from single-LED images. We also note that, since image planes are no longer equally spaced, during the 3D reconstruction as in Eq. (3), a non-uniform discrete Fourier transform needs to be computed in the z direction.

Physics-based learning for illumination pattern optimization
The choice of illumination patterns will largely dictate the quality of the results. As described above, we time-interleave a single-LED illumination with multiple DPC patterns. Instead of using the traditional half-circle DPC patterns, we use new techniques in physics-based learning [29] to design better partially coherent DPC illumination patterns. The forward model of image formation for 3D DPC 'encodes' the sample's scattering potential into 2D intensity measurements, and the reconstruction 'decodes' the 3D information from these measurements. As described in Section 2., the encoding and decoding processes are described by the system's transfer functions, which specify what information can be encoded into the measurements as well as how much of the encoded information can be recovered (without being overwhelmed by noise, etc.).
Since both the forward model and the reconstruction are differentiable in simulation, physicsbased learning can optimize the patterns by forming the encoding-decoding pipeline, as in Fig. 1(b), then defining a loss function to measure the discrepancy between the the true scattering potential of a simulated sample and its reconstruction. The simulated samples, which the optimized illumination patterns will be tailored to, are expected to have a spatial frequency distribution similar to the experimental samples. The illumination patterns are updated iteratively to minimize the loss function. Our loss function consists of an object consistency loss and a source physical constraint term: where S denotes a set of illumination patterns. The real and imaginary components of the transfer function and scattering potential are written together for conciseness. The object consistency loss measures the L2-distance between the reconstruction and the true scattering potential, V, where fwd and rec are the forward model and reconstruction as described in Eq. (3) and Eq. (5), respectively. The source physical constraint term, c, enforces non-negative light intensity and limits the maximum intensity of each LED to one: This term will give a reverse gradient when S goes below 0 or above 1. The overall weight of this term is set to be large, so that its gradient also prevails over the gradient of object consistency when the light intensity goes beyond the range. We find this term effective to eliminate trivial solutions with very large or negative values for light sources, and the optimized patterns do not require a normalization or clipping at the end.

Practical considerations
We discuss a few practical considerations that we include in the optimization. First, it is important to simulate the noise in the forward model to discourage solutions that bring good phase contrast but sacrifice the overall SNR. Hence, we model the readout noise and the signal noise in the forward model. The readout noise is from a Gaussian distribution by its nature, and the signal noise is from a Poisson distribution, which is also approximated as Gaussian for the relatively high light levels in our system. During physics-based learning, both the readout and the signal noise are sampled from standard normal distributions and scaled with the total illumination intensity, ∑︁ S, before being added to the simulated intensity images at the end of the forward model. The total noise, I ′ noise , added to the normalized intensity images can be written as where t exp is the exposure time, and α, β are coefficients we obtained from experimentallycaptured images. Since the additive noise carries no information about the sample, it will only negatively affect the reconstruction and raise the loss value. In this way, the physics-based learning optimization will have an incentive to use higher total illumination power. Second, binary-valued illumination patterns (i.e., each LED is either on or off) are easier to implement in practice because: 1) the hardware delay time of binary-valued pattern updates are shorter, and 2) binary-valued patterns do not require per-LED illumination intensity calibration. However, this binary-value constraint requires a combinatorial optimization, which is difficult to solve in practice. Instead, we still use gradient descent to optimize for continuous-valued patterns while promoting a binarized LED intensity value distribution (values to be close to either 0 (off) or 1 (on)) as follows. At each iteration, we feed in the binarized patterns to the forward model while keeping the continuous-valued patterns for the reconstruction. Since only the continuous-valued patterns are updated during the gradient descent optimization, the final optimized patterns will have more intensity values close to 0 or 1 to minimize the mismatch due to the binarized patterns in forward model. After the optimization, the binary-valued patterns can be obtained by thresholding the continuous-valued optimized patterns.
Third, we take into account the slight misalignment of the LED array to improve the robustness of the optimized patterns. We randomly add a small lateral shift to the source patterns during the forward model while assuming the original, not-shifted patterns in the reconstruction. This mismatch will deteriorate the reconstruction for illumination patterns sensitive to misalignment, and we find that the optimized patterns with this consideration are denser and more connected.

Results
Experiments were performed on a commercial inverted microscope (Nikon TE2000-U) with a 40× 0.65NA objective lens (Nikon). A customized LED quasi-dome array (SCI Microscopy) [36] was installed on the microscope to replace the conventional illumination unit. The top panel of the LED array was positioned 65mm above the focal plane and only the 'brightfield' LEDs were used (those that illuminate the sample from an angle within the NA of the objective lens). Green LEDs (λ centered at 525nm) were used throughout the experiment. We used a sCMOS sensor (PCO Edge 5.5 monochromatic) to capture intensity images in global shutter mode. Each exposure was hardware triggered by the LED array's controller (Tenseey 3.2) after every illumination pattern update. We used an automated piezoelectric z-stage (Thorlabs MZS500-E) to defocus the sample for the comparison case of controlled axial scanning. For hand-turning defocus, the fine focus knob of the microscope was spun, and an off-axis LED at NA x = 0.36, NA y = 0.0 was used as the self-calibration single-LED pattern.
In the rest of this section, we first validate our defocus self-calibration. Then, we detail the physics-based learning optimization setup in simulation and validate with controlled axial scanning. In the end, we combine together the self-calibration and optimized patterns as our final stage-free 3D DPC system and show experimental results.

Defocus self-calibration validation
We first validate the self-calibration algorithm on experimental data from single-LED illumination. We use the precision z-stage to defocus and acquire an evenly-spaced image stack with 140 planes with a step size of 1µm. This image stack is acquired with the same optics and exposure setting as in the rest of study. Then, we randomly choose a number of images with defocus positions in a monotonic order, and those chosen images become an unevenly-spaced image stack with known defocus positions, which can be used to validate our self-calibration algorithm. We repeat this process to randomly generate 20 image stacks for each of 10 different average defocus spacing between images, from 1µm to 10µm. The self-calibration algorithm, blind to the knowledge of their ground truth positions, is performed on each image stack as follows. The self-calibration is initialized with a linear defocus estimation. Then, the joint updates are performed for 100 iterations, each of which contains 50 gradient descent steps to update the field and 10 steps to update the defocus positions.
The error of the self-calibration is quantified by comparing with the ground truth defocus positions and results are plotted in Fig. 2. With average defocus spacing of 1µm, the defocus position error was 91nm, which is close to the resolution of the z-stage (50nm). When the average defocus spacing is 5µm, the defocus position error is 0.29µm while the error of linear guessing jumps to 1.82µm.

DPC Illumination pattern optimization
To determine the optimal number of DPC illumination patterns, we compare the final object consistency losses for the optimizations with different numbers of patterns. When an additional pattern does not further reduce the loss, it is considered unnecessary. Figure 3 shows the relationship between the object consistency loss and the number of DPC patterns. When noise is incorporated into the simulation (Fig. 3(b)), having more patterns in a fixed total acquisition time reduces the SNR. We find 2-4 DPC patterns gives the minimal loss value, and thus we choose to have four illumination patterns, as in the case of half-circular DPC illumination patterns.  Next, we use end-to-end optimization to find the patterns for our four DPC illuminations. The patterns are randomly initialized from a uniform distribution and then optimized by an Adam optimizer [35] for 250 iterations. The optimization is implemented in Tensorflow (Google), and we use a single GPU (Nvidia Titan X) to accelerate the computing. The optimized patterns are shown in the first two rows of Fig. 4(c). The typical DPC half-circle patterns (Fig. 4(a)) and the optimized patterns without applying the practical considerations in Sec. 4.1 (Fig. 4(b)) are shown for comparison. The patterns without the practical considerations ( Fig. 4(b)) have more emphasis on high-angle illumination; the patterns optimized with the practical considerations ( Fig. 4(c)) have dense and connected patterns to cover more low-angle LEDs, which presumably give a better SNR and are more robust for misalignment.
Throughout the pattern optimization, we use ground truth simulated objects that are somewhat similar to the types of samples we expect in experimental applications. We simulate spherical objects with smaller high-RI spheres inside to mimic simple cells. A small Gaussian noise with mean 0 and standard deviation 0.25 (which is about 3% of the real part of the maximum scattering potential value) is also added to each object's scattering potential to increase the spatial frequency diversity of simulated objects and to avoid over-fitting to particular spatial frequencies.

RI reconstruction with optimized DPC patterns
The optimized DPC patterns were programmed on the experimental system for validation. We imaged 10µm polystyrene-based microsphere beads (Sigma-Aldrich) with RI 1.6 [37] in RI 1.584 index-matching oil (Cargille; RI 1.592 at λ = 525nm). Some microspheres were greater than 10µm in diameter, possibly due to their reaction to the index-matching oil or degradation during storage. We acquired an image stack for each illumination pattern, defocused by the precision z-stage with 1µm spacing between planes.
As a comparison, the RI reconstructions for the half-circular DPC patterns used in [22], patterns optimized without practical considerations, and patterns optimized with practical considerations are shown in the third row of Fig. 4 respectively. The patterns optimized with practical considerations (Fig. 4(c)), give more accurate RI values and reduce elongation artifacts [22] in the axial direction compared with the half-circular baseline ( Fig. 4(a) and patterns optimized without the practical considerations ( Fig. 4(b)). Therefore, the practical considerations helps to improve experimental robustness and the patterns optimized with the practical considerations (Fig. 4(c)) are used as our final optimized DPC patterns. A 3D reconstruction of human embryonic stem cells using the final optimized DPC patterns is shown in Fig. 5.
To further investigate the optimized DPC patterns, we visualize the 3D phase transfer functions (H Re in Eq. (3)) corresponding to the patterns (Fig. 6). Note that zeros in the transfer function indicate that no phase information is encoded in the intensity measurements for that spatial frequency. Therefore, a good set of illumination patterns should have as many non-zero regions as possible to recover 3D information. Dotted circles in 2D slices indicate the region of missing cones (also illustrated in 3D in Fig. 6(c)), in which the phase information cannot be encoded due to the limited NA [31,38]. Figure 6(b) shows the 3D transfer function for a traditional DPC half-circular pattern, for comparison. The transfer function of DPC pattern has good coverage at the zero axial frequency plane, but misses the low-frequency content (indicated by the arrows in Fig. 6(b)) outside of the missing cones at non-zero axial frequencies. This explains the RI underestimation of the DPC patterns in Fig. 4(a); the transfer functions for the optimized patterns have good coverage of low-frequency content across different axial frequencies and thus recover a more accurate quantitative RI value.

Experimental validation with self-calibration + optimized patterns
We combined hand-turning axial scanning with the optimized illumination patterns into our final 3D DPC system. We imaged 10µm borosilicate glass (RI 1.56) microspheres (Duke Standards, Thermo Fisher) in RI 1.54 index-matching oil (Cargille; RI 1.546 at λ = 525nm). The illumination sequence consisted of the 4 optimized patterns in Fig. 4(c) and one single-LED illumination for defocus self-calibration. During the acquisition, we continuously turned the fine focus knob by hand at an approximately steady speed, while the LED array looped through the illumination patterns and sent triggers to the camera after each update. We acquired 375 frames (75 frames for each illumination) in about 25 seconds, and the total defocus was about one rotation of the fine focus knob, roughly 100µm of defocus. With single-LED illumination measurements, we performed defocus self-calibration and recovered the defocus positions shown in Fig. 7(a). The self-calibration optimization took about 7 minutes to finish 100 iterations using a single GPU (Nvidia Titan X). The self-calibration update reached its convergence after around 50 iterations (see Fig. 7(b)). The reconstructed 3D RI volume is visualized in Fig. 7(c), with zoom-in lateral and axial sections for the two insets. Many detailed features within microspheres (presumably due to the fabrication imperfection) can be observed with clear contrast in both insets, showing the efficacy of the optimized patterns. The quantitative RI values also match well to the labeled value of the glass microspheres (RI 1.56), with the exception of a negative relative RI and a zero relative RI microsphere in inset 2 of Fig. 7(c). We believe these two microspheres have different RI inherently after checking the measured image contrast. The halo artifact for phase imaging described in [39,40] can also be observed for our 3D reconstruction on the in-focus plane of an object; a non-negativity constraint can be used during the Tikhonov reconstruction to suppress the halo artifact as described in [22].

Conclusion
We demonstrated an extension of 3D differential phase contrast (DPC) imaging with improved reconstruction and without a motion z-stage. We introduced a practical stage-free axial scanning by spinning the built-in focus knob while taking measurements and then self-calibrating for the actual defocus positions later using a joint update algorithm. We also showed the illumination patterns can be optimized for a better refractive index reconstruction.
Future work could focus on tailoring the loss function for the optimization based on the application. For example, low-frequency contrast will be more important if we are interested in segmenting densely placed cells. It is also worthwhile to investigate the role of different regularization of the reconstruction. We used Tikhonov regularizer in this paper for its generality, while other regularizers, such as total variation (TV) and deep learning-related ones [41,42], might bring other insights in particular imaging scenarios.