Illumination coding meets uncertainty learning: toward reliable AI-augmented phase imaging

We develop a new Bayesian convolutional neural network (BNN) based technique for achieving large space-bandwidth product phase imaging that is both scalable and reliable. The scalability of our technique is enabled by a novel coded illumination scheme designed according to the physical principles of asymmetric illumination-based phase contrast and synthetic aperture imaging. The system takes highly multiplexed intensity measurements to encode phase and high-resolution information across a wide field-of-view (FOV). Recovering the phase from these intensity measurements requires solving a highly ill-posed inverse problem, which we show can be overcome by a deep learning (DL) algorithm. The reliability of our technique is quantitatively assessed by a novel uncertainty learning framework. Unlike existing DL-based reconstruction algorithms, whose prediction errors can only be discovered in hindsight, our BNN framework allows uncertainty quantification of the DL predictions. Specifically, we show that the BNN-predicted uncertainty maps can be used as surrogates for the true error, which is typically unknown in many real-world applications. Furthermore, we complement the BNN with a statistical data analysis procedure that relates the network outputs to credibility quantification metrics. We apply our technique to both static and dynamic biological samples, and show that the illumination scheme achieves 5× resolution enhancement across a 4× FOV using only five multiplexed measurements. In addition, we show that the uncertainty quantification procedure allows evaluating the effects of several common experimental imperfections, including noise, model errors, incomplete training data, and out-of-distribution testing data. Finally, we illustrate the utility of the predicted uncertainty maps as a possible way to identify spatially and temporally rare biological phenomena.


Introduction
The imaging throughput of traditional techniques is fundamentally limited by the intrinsic trade-off between field-of-view (FOV), resolution, and acquisition speed. It is well known that the space-bandwidth product (SBP) of an optical system is invariant under any linear canonical transform [1,2]. For super-resolution techniques that require multiple measurements, the acquisition time scales linearly with the expanded bandwidth in a single dimension, and quadratically for 2D isotropic resolution enhancement [3,4]. The same scaling law also applies to scanning-based systems for enlarging the FOV. Accordingly, the 3D trade-space spanned by the FOV, resolution, and acquisition speed can be visualized as in Fig. 1(a), where a hyperplane defines the achievable imaging attributes and highlights the linear trade-off among them (for a 1D problem). The imaging techniques of interest here belong to the classical phase retrieval problem. Despite the extra complexity from the intensity-only, nonlinear measurements, the general scaling law for the achievable imaging attributes follows the same trade-space, as studied both theoretically [5] and experimentally [6,7]. Our first goal here is to investigate the feasibility of bypassing the classical limit imposed by the linear trade-space by combining non-conventional multiplexed measurement schemes and deep learning (DL). By doing so, our technique opens up an expanded design space that allows a combination of FOV, resolution, and acquisition speed beyond those achievable using conventional phase retrieval techniques [as illustrated in Fig. 1(a)].
Our work is inspired by the recent demonstration of several DL-based phase retrieval techniques [8-17], which can be categorized into two classes. The first class focuses on solving the phase retrieval problem alone using a convolutional neural network (CNN); no modification to the measurement procedure is made [8-13]. As a result, these techniques generally do not improve the imaging throughput. Nevertheless, several benefits of using the CNN-based algorithm have been reported, including its robustness to noise, scattering, and experimental errors [8-13]. The second class focuses on introducing the physical model into the construction of the CNN. This is done by modeling the image formation process as the initial layers of the CNN [14-17]. As a result, training the CNN jointly optimizes the physical parameters used in the acquisition alongside its computational parameters. However, the effectiveness of this approach relies on the accurate modeling of the image formation process [14], which can be difficult in practice due to the presence of uncalibrated aberrations and other experimental imperfections.
Differing from these two classes, we propose to solve the large-SBP phase retrieval problem using a physics-guided DL approach that consists of two complementary components. The first component is a highly measurement-efficient illumination multiplexing strategy designed according to two physical principles. First, we exploit asymmetric illumination to encode the phase information into the intensity measurements based on the principle of differential phase contrast (DPC) [18]. Second, we enhance the resolution following the principles of synthetic aperture [19] and Fourier ptychographic microscopy (FPM) [6] by using oblique illumination to introduce into the measurements high-frequency information that is beyond the objective lens's native passband. Most importantly, our method uses only five coded measurements regardless of the final resolution [Fig. 1(b)], making our technique highly flexible and scalable for large-SBP phase retrieval problems. As a result, our proposed technique avoids the need to quadratically increase the number of measurements to achieve a higher resolution, a limitation that is imposed by conventional FPM techniques. Such multiplexed measurements have not been used previously because of the severe ill-posedness of the resulting inverse problem [7,20-22], which produces undesirable phase artifacts in the reconstruction from existing multiplexed FPM (mFPM) algorithms. The second component uses DL to overcome the ill-posedness of the inverse problem and complements the new measurement strategy. Specifically, we show that our DL algorithm robustly inverts the physical model and recovers large-SBP phase information from highly multiplexed nonlinear measurements, which would otherwise not be possible.
An important feature of our DL technique is its ability to quantitatively assess its own reliability. In particular, we aim to address a common criticism of DL that the error of the prediction cannot be easily evaluated unless the ground truth is known. To address this issue, we develop an uncertainty learning (UL) framework based on the Bayesian convolutional neural network (BNN) [23] [Fig. 1(c)]. We show that the reliability of the BNN prediction can be quantified by two predictive uncertainties, the model uncertainty and the data uncertainty, akin to the epistemic and aleatoric uncertainty, respectively, in Bayesian analysis [24]. In particular, we show that the model uncertainty allows us to characterize the robustness of our physics-guided DL technique. By training and testing on an ensemble of CNNs, the BNN quantifies the variabilities intrinsic to the model without "cherry-picking" the results [23]. In addition, we show that the data uncertainty allows assessing the randomness of the predictions that originates from data imperfections [23], including noise, incompleteness in the training data, and the error due to out-of-distribution testing data.
In order to rigorously quantify the reliability of the BNN predictions, an important step is to perform statistical data analysis. We develop a procedure to relate the BNN output to Bayesian statistical metrics, including credibility, credible interval, and reliability diagram. By doing so, our work establishes a comprehensive procedure for evaluating the reliability of our artificial intelligence (AI) augmented phase retrieval technique.
By capturing experimental data on two different computational microscopy platforms, we support our claim that the technique is applicable to different experimental setups. First, we demonstrate 5× resolution enhancement on the setup in [25]. Next, we demonstrate the scalability of our technique by synthesizing multiplexed measurements on both static and dynamic biological data from [7] and achieve 4× resolution improvement. In addition, the robustness of our technique to common experimental factors, including spatially varying aberrations, illumination misalignment, and phase wrapping artifacts, is quantified by evaluating the BNN-predicted uncertainties. Most importantly, the results show that the selection of the training data indeed affects the confidence of the prediction, an effect that can be quantified by our UL framework. Specifically, we investigate the effect of limited training data due to spatial and temporal constraints and biological sample types. Furthermore, the BNN is shown to be reliable when trained and tested on different sample types and under different experimental configurations. The BNN-predicted uncertainties are shown to be indicative of the true error. Finally, a potential utility of our UL framework is explored in a time-series experiment to identify rare biological structures and phenomena.

Multiplexed illumination for large-SBP phase imaging
Our illumination multiplexing scheme combines the physical principles of DPC [18] and FPM [6] to encode high-resolution phase information across a wide FOV using a small number of intensity measurements. DPC is a phase microscopy technique that involves taking intensity measurements using asymmetric illumination [26]. Under the first Born approximation, the brightfield intensity measurement is linearly related to the sample's permittivity contrast by a weak phase transfer function [18]. The distribution of the transfer function affects the quality of the phase retrieval and can be tuned by adjusting the illumination pattern. Most importantly, the transfer function contains missing frequencies along the axis of asymmetry for a given illumination pattern [18]. As a result, illumination patterns containing at least two axes of asymmetry are commonly used to ensure complete Fourier coverage. Several studies on the choice of illumination patterns have been performed based on the linear model [18,27]. A CNN-based technique has also been developed to optimize the illumination patterns using a data-driven framework [17]. It should be noted that the validity of the DPC model relies on the presence of a strong reference wave, as in the brightfield measurements; the model no longer holds for darkfield measurements. Accordingly, the maximum achievable resolution by DPC is limited to 2× the objective NA.
To extend the resolution by more than 2×, our technique adapts the principle of FPM. In FPM, intensities are measured with asymmetric illumination in both brightfield and darkfield. Next, an iterative algorithm simultaneously retrieves the phase information and performs synthetic aperture. As a result, this method can increase the resolution up to the sum of the illumination and objective NAs [6]. A major advantage of FPM is its ability to achieve both a wide FOV and high resolution, i.e. a large SBP. However, its imaging throughput is limited by the long acquisition time imposed by the large data requirement. Specifically, the original sequential FPM (sFPM) requires taking hundreds of images since it scans through all the controllable illumination angles one by one [6] [Fig. 1(b)]. The acquisition time can be shortened by illumination multiplexing in mFPM. In [20], a random multiplexing scheme is shown to achieve up to 8× data reduction. A hybrid multiplexing scheme that combines DPC in the brightfield with random multiplexing in the darkfield is shown to provide improved robustness in solving the mFPM phase retrieval problem [7]. However, all these FPM schemes are fundamentally limited by the conventional trade-off, which results in an undesirable quadratic increase in the data requirement as the resolution increases [7].
Here, we develop an artificial intelligence (AI)-augmented illumination multiplexing scheme that uses only five asymmetric illumination patterns [Fig. 1(b)]. First, we design two brightfield patterns based on the DPC model with, in total, two axes of asymmetry (every 90°) to provide complete Fourier coverage within the brightfield limit. Next, we design three darkfield patterns with, in total, three axes of asymmetry (every 120°) to extend the Fourier coverage up to the limit set by the sum of the illumination and objective NAs, the same as in FPM. A notable feature of the proposed scheme is that extending the resolution simply requires modifying the illumination scheme to use a larger darkfield pattern, without the need for additional measurements. This means that the data requirement remains the same when the resolution increases, bypassing the limitation imposed by conventional techniques. By doing so, we improve the throughput of the data acquisition process at the cost of increased computational complexity. Specifically, the multiplexed measurements cannot be robustly inverted by existing model-based mFPM algorithms due to the severe ill-posedness of the inverse problem. We show that our proposed BNN-based algorithm overcomes this issue through its nonlinear multilayer structure.

Uncertainty learning framework
Our UL framework is built on the probabilistic view of neural networks [28]. The learned neural network differs from one training run to another, which in turn results in varied predictions. The variability stems from several stochastic processes involved in the training, such as random weight initialization [29], dropout [30], and stochastic gradient descent type algorithms [31]. There are two ways to quantify the variabilities in the neural network, namely the Bayesian [23] and frequentist [32] approaches. We outline both approaches, provide the mathematical foundations for the Bayesian analysis, and then quantify uncertainties using both Monte Carlo dropout [33] and Deep Ensembles [32].
The BNN replaces the deterministic network weights with probability distributions over them [as illustrated in Fig. 1]. The predictive distribution of the output $y$ for a test input $x^*$ is obtained by marginalization over all the possible network weights $w$ that were learned from the training data $(X, Y) = \{x^t, y^t\}_{t=1}^{T}$:

$$p(y \mid x^*, X, Y) = \int p(y \mid x^*, w)\, p(w \mid X, Y)\, dw, \tag{1}$$

where Eq. (1) applies the conditional independence between the training and testing data, and can be visualized by a graphical model. To quantify the data uncertainty, we describe the probability distribution of the kth N-pixel random output $y^k$ of the BNN (given the input $x^k$) by a multivariate Laplacian likelihood function:

$$p(y^k \mid x^k, w) = \prod_{i=1}^{N} \frac{1}{2\sigma_i^k} \exp\left(-\frac{|y_i^k - \mu_i^k|}{\sigma_i^k}\right),$$

where the output pixels (indexed by $i$) are assumed to be independent, and $\mu_i^k$ and $\sigma_i^k$ denote the pixel-wise mean and standard deviation, respectively. It can be shown that the widely used mean absolute error (MAE) corresponds to this Laplacian model with a constant standard deviation assumed for the entire output [23]. By incorporating spatially varying standard deviations in our model, our BNN accounts for inhomogeneous noise and shift-variant model errors.
At the training stage, the network weights are learned by minimizing the normalized negative log-likelihood function, i.e. the loss function $L(w \mid x^t, y^t)$, given the training data $(x^t, y^t)$:

$$L(w \mid x^t, y^t) = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{|y_i^t - \mu_i^t|}{\sigma_i^t} + \log(2\sigma_i^t) \right].$$

$L(w \mid x^t, y^t)$ consists of two parts: the first, residual term resembles the MAE loss normalized by the pixel-wise standard deviation; the second is the data uncertainty regularization term. Most importantly, one needs neither the ground-truth mean ($\mu_i^t$) nor the ground-truth standard deviation ($\sigma_i^t$) for learning the uncertainty; minimizing $L(w \mid x^t, y^t)$ allows learning both using the sample pairs $(X, Y)$ taken from the random process. This is achieved by the structure of the loss function. Specifically, a large residual error $|y_i^t - \mu_i^t|$ will be regulated by a large standard deviation, which in turn increases the $\log(2\sigma_i^t)$ term; the optimum can only be reached when the two terms are balanced. Training the BNN not only finds the optimal weights that explain all the data, but also quantifies the individual mismatch between the data and the model as measured by the spread ($\sigma_i^t$) in the network's output. At the prediction stage, the BNN estimates both the mean and the standard deviation given the testing input, as illustrated in Fig. 1(c).
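The balance between the residual term and the log term described above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' training code: the function names are hypothetical, the pixels are passed as flat lists, and `sigma` is treated as the Laplace scale parameter that the text calls the standard deviation.

```python
import math

def laplacian_nll(y, mu, sigma):
    """Mean Laplacian negative log-likelihood over the pixels of one image.

    y, mu, sigma: equal-length sequences holding, per pixel, the reference
    value, the predicted mean, and the predicted scale (assumed > 0).
    """
    if not (len(y) == len(mu) == len(sigma)):
        raise ValueError("y, mu and sigma must have the same length")
    total = 0.0
    for y_i, mu_i, s_i in zip(y, mu, sigma):
        if s_i <= 0:
            raise ValueError("sigma must be positive")
        # residual term normalized by sigma, plus the uncertainty regularizer
        total += abs(y_i - mu_i) / s_i + math.log(2.0 * s_i)
    return total / len(y)

def best_sigma(residual, candidates):
    """Return the candidate sigma with the lowest loss for a fixed residual."""
    return min(candidates, key=lambda s: laplacian_nll([residual], [0.0], [s]))
```

For a fixed residual $r$, the per-pixel loss $r/\sigma + \log(2\sigma)$ is minimized at $\sigma = r$, so a grid search such as `best_sigma(0.3, [0.1, 0.2, 0.3, 0.4, 0.5])` selects the candidate equal to the residual. This is the sense in which the predicted spread tracks the local mismatch between data and model.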
One approach to assess the model uncertainty is to use the dropout network [33]. Briefly, with dropout applied before every weight layer, a simple distribution $q(w)$ is learned to provide a variational Bayesian approximation to the posterior $p(w \mid X, Y)$.
At the prediction stage, the model uncertainty is calculated by Monte Carlo dropout [33]. By using Monte Carlo integration over $P$ samples satisfying $w^{(p)} \sim q(w)$, we can approximate the predictive distribution by a Laplacian mixture model:

$$p(y \mid x^*, X, Y) \approx \frac{1}{P} \sum_{p=1}^{P} p(y \mid x^*, w^{(p)}). \tag{5}$$

The variations in the distributions $p(y \mid x^*, w^{(p)})$ from the network ensembles are the consequence of the model uncertainty [Fig. 3(a) Bottom].
The predicted mean $\hat{\mu}_i$ of the ith pixel can be estimated by the unbiased minimum mean squared error estimator:

$$\hat{\mu}_i = E[y_i \mid x^*, X, Y] \approx \frac{1}{P} \sum_{p=1}^{P} \mu_i^{(p)}, \tag{6}$$

where $\mu_i^{(p)}$ is the predicted mean from the pth network, and $E$ denotes taking the expectation.
To provide a single, holistic measure of the uncertainty of the entire process, we quantify the overall uncertainty $\hat{\sigma}_i$ by computing the pixel-wise variance (Var):

$$\hat{\sigma}_i^2 = \mathrm{Var}(y_i \mid x^*, X, Y) = E[\mathrm{Var}(y_i \mid x^*, w)] + \mathrm{Var}(E[y_i \mid x^*, w]) \approx \frac{1}{P} \sum_{p=1}^{P} 2\left(\sigma_i^{(p)}\right)^2 + \frac{1}{P} \sum_{p=1}^{P} \left(\mu_i^{(p)} - \hat{\mu}_i\right)^2, \tag{7}$$

where the first equality follows the law of total variance; the second is derived from Eq. (5) and the Laplacian mixture model; $\sigma_i^{(p)}$ denotes the pixel-wise standard deviation predicted by the pth network in the ensemble. The factor of 2 in the first sum arises because a Laplace density of the form $\frac{1}{2\sigma} \exp(-|y - \mu|/\sigma)$ has variance $2\sigma^2$. Eq. (7) shows that the overall data uncertainty $\sigma_i^{(D)}$ is measured by the mean of the predicted variance (the first sum), whereas the model uncertainty $\sigma_i^{(M)}$ is quantified by the variance of the predicted mean (the second sum).

The second approach to quantify the uncertainties is Deep Ensembles [32], in which multiple identical networks are trained under the same condition. With a sufficient number of trained networks, the ensemble captures the variabilities of the model. We train eight networks to quantify the uncertainties. The model uncertainty is quantified by the same procedures in Eqs. (6,7). Some examples of the predicted mean phase map, data uncertainty map, and model uncertainty map are shown in Fig. 3(b). The comparisons between Monte Carlo dropout and Deep Ensembles are provided in the supplementary material.
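The decomposition into data and model uncertainty can be sketched for a single pixel as follows. This is an illustrative plain-Python example rather than the authors' implementation: `combine_predictions` is a hypothetical name, and it assumes each network's `sigma` is a Laplace scale, so that the variance predicted by one network is `2 * sigma**2`.

```python
def combine_predictions(mus, sigmas):
    """Combine the outputs of P networks (or dropout samples) for one pixel.

    mus[p], sigmas[p]: mean and Laplace scale predicted by the p-th network.
    Returns (mean, data_var, model_var, total_var).
    """
    P = len(mus)
    if P == 0 or len(sigmas) != P:
        raise ValueError("mus and sigmas must be non-empty and equally long")
    mean = sum(mus) / P
    # data uncertainty: mean of the per-network predicted variances
    data_var = sum(2.0 * s * s for s in sigmas) / P
    # model uncertainty: variance of the per-network predicted means
    model_var = sum((m - mean) ** 2 for m in mus) / P
    return mean, data_var, model_var, data_var + model_var
```

For example, two networks predicting means 0.4 and 0.6 with scale 0.1 each give a combined mean of 0.5, a data variance of 0.02, and a model variance of 0.01. A practical implementation would apply the same arithmetic to whole uncertainty maps at once.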

BNN structure
Our BNN follows the U-Net architecture owing to its versatility in solving image-to-image problems [34]. It takes the encoder-decoder structure with skip connections to preserve high-frequency features, as shown in Fig. 4. We made several modifications to perform uncertainty quantification. Most importantly, the output of the BNN contains two channels: the predicted (mean) phase map and the data uncertainty (standard deviation) map. To achieve high resolution enhancement, we further adapt the generative adversarial network (GAN) [35]. We found that this GAN approach is needed to achieve 5× resolution improvement for data from our setup. To achieve 4× resolution improvement, we do not need to use the GAN. The impact of the GAN on the reliability of the prediction is analyzed in Sec. 3.3. Additional details about the network structure and training procedures are provided in the supplementary material. We have also made our implementation open source, along with pre-trained weights and test sample data, on our GitHub project page [36].

Data acquisition
Our technique is tested on two LED array based computational microscope setups detailed in [7,25] and five different types of biological samples. First, we collect data on unstained HeLa cells prepared with two fixation conditions, ethanol and formalin, on the setup in [25]. Depending on the fixation, unique morphologies can be observed in each sample, specifically in the plasma membrane and nuclei regions. All images are captured using a 4×, 0.1 NA objective (Nikon CFI Plan Achromat). Each dataset consists of the multiplexed data (2 brightfield and 3 darkfield images) and the corresponding sFPM data (185 images). Both the multiplexed and sFPM data are captured with the same 0.41 illumination NA, providing 0.51 NA final resolution. Next, we validate our technique on the data from [7]. The multiplexed measurements are synthesized by summing the single-LED images. We experimentally validate this procedure on the setup in [25] and find that the numerically synthesized multiplexed intensity closely matches the physically captured measurement, since the LEDs are spatially and temporally incoherent. We test our method on fixed U2OS and MCF10A cells and on dynamic live HeLa cell samples. The images were captured with a 4×, 0.2 NA objective (CFI Plan Apo Lambda) and either 0.5 or 0.6 illumination NA, providing a final NA of 0.7 to 0.8. Each dataset contains synthesized multiplexed data and the corresponding sFPM data. More details are provided in the supplementary material.
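Because the LEDs are mutually incoherent, the synthesis step amounts to adding single-LED intensity images pixel by pixel. The sketch below illustrates this in plain Python with nested lists; the function name, the container layout, and the LED indices are hypothetical and are not taken from the datasets in [7,25].

```python
def synthesize_multiplexed(single_led_images, led_indices):
    """Sum the single-LED intensity images for the LEDs lit in one pattern.

    single_led_images: mapping (dict or list) from LED index to a 2D image
    stored as a list of rows. led_indices: LEDs switched on together.
    """
    if not led_indices:
        raise ValueError("at least one LED must be lit")
    first = single_led_images[led_indices[0]]
    rows, cols = len(first), len(first[0])
    out = [[0.0] * cols for _ in range(rows)]
    for k in led_indices:
        img = single_led_images[k]
        for r in range(rows):
            for c in range(cols):
                # intensities (not fields) add for incoherent sources
                out[r][c] += img[r][c]
    return out
```

One such sum per illumination pattern would yield the five multiplexed images (two brightfield, three darkfield) from an existing sequential sFPM stack.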

Training and test data configuration
We design three different training and testing data configurations in order to fully investigate the robustness of the BNN subject to different types of "limited data", including unseen biological sample types, limited FOV, and inaccessible temporal data.
In the first set of experiments, the training data is taken from a single cell type; testing is then performed on several different cell types. In practice, different cells can produce out-of-distribution measurements that are not statistically "similar" to the training set. Unlike classification networks, which are prone to testing errors from unseen object types, our network solves the inverse problem of an imaging model. As such, a properly trained network should be able to perform high-quality phase predictions and be robust to sample variations. We investigate how well the BNN can detect and quantify such abnormalities. In addition, we also study the network's robustness to experimental setup variations.
In the second set of experiments, the training data is taken from a limited FOV region, and the testing data is from the entire FOV. This task is of practical importance because wide-field systems like FPM often suffer from spatially variant aberrations [37] and illumination misalignment [38]. Because of the interference effect, these variations in the imaging path can change the intensity measurements significantly, for example through contrast reversal, even when the measurements are taken from the same sample. As a result, intensity measurements taken from FOV regions outside of the training region can produce out-of-distribution data due to the limited training FOV. Unlike the model-based FPM approach, our data-driven BNN algorithm does not directly take any calibration information when constructing the network. Instead, the BNN needs to learn the spatially varying imaging model from the measurements and the ground truth phase. We will investigate the reliability of the BNN under these model variations.
In the final set of experiments, the training data is taken from a limited observation time window of a time-series experiment. Dynamic biological processes can result in sample variations that in turn change the statistics of the intensity measurements, which may then be inconsistent with the training set. We will assess the BNN's ability to make temporal predictions and quantify the uncertainty induced by the limited temporal data.

Data preprocessing
To obtain the ground truth phase for training, we first perform phase reconstruction using the sFPM algorithm [20]. To minimize errors induced by model mismatch, we further perform algorithmic angle calibration using the algorithm in [38], and digitally correct for the aberrations using the algorithm in [20]. Additional preprocessing follows to remove the residual phase artifacts, including phase wrapping, slowly varying background, and large dynamic range. First, we perform phase unwrapping using the algorithm in [39]. Examples from this procedure are given in the supplementary material. Next, the slowly varying background artifact is removed by a morphological opening based algorithm. Third, we perform phase dynamic range correction that clips the 0.1% of pixels having extreme values to a constant. Finally, the phase is linearly normalized to [0, 1]. The processed phase map is then cropped into small patches for training. Still, the unwrapped phase contains residual isolated errors, typically around large-phase or complex cellular features. This results in incorrect "phase labels" in the training data that later affect the prediction. The impact of incorrect labels and phase clipping on the uncertainties of the phase predictions is analyzed in detail in Sec. 3.1.
To facilitate later credibility analysis of the BNN output, we further quantify the noise present in the ground truth phase. Following [7], we measure the standard deviation in the background region and treat it as the intrinsic phase noise. We assume that the same noise level also applies uniformly across the sample (e.g. cell) regions. This noise level sets the tightest credible interval our BNN can provide; the detailed analysis is presented in Sec. 3.3.
To preprocess the intensity measurements, background removal is first performed based on [20,40], followed by the dynamic range correction as in the ground truth phase preprocessing. Next, the full FOV is divided into small patches, which are resized by cubic interpolation to match the input image size with the ground truth phase. For training, the matching phase and intensity patches are fed into the BNN. For testing, we apply an additional mean equalization to intensity patches taken from the untrained FOV region to alleviate the out-of-distribution effect. We find that this procedure is essential to improve the BNN's generalization. Additional details about the preprocessing are provided in the supplementary material.

Data analysis
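We develop data analysis procedures to quantitatively relate the BNN predictions to Bayesian statistical reliability measures. Typical neural networks can only evaluate errors based on the ground truth, which is not available in many practical problems. Here, we derive a set of metrics that do not require knowing the ground truth. Our analysis is based on the predictive Laplacian mixture model [Eq. (5)]. The probability density of the ith pixel to take the value $y$ is

$$p(y_i = y \mid x^*, X, Y) \approx \frac{1}{P} \sum_{p=1}^{P} \frac{1}{2\sigma_i^{(p)}} \exp\left(-\frac{|y - \mu_i^{(p)}|}{\sigma_i^{(p)}}\right). \tag{8}$$

Accordingly, we define the credible interval $A_i = [\hat{\mu}_i - \epsilon, \hat{\mu}_i + \epsilon]$, centered on the predicted mean, and its bound $\epsilon$. The corresponding credibility $p_i^{\epsilon}$ is the predicted probability that the true mean $\mu_i^*$ falls within $A_i$:

$$p_i^{\epsilon} = g_i(\epsilon) = \frac{1}{P} \sum_{p=1}^{P} \left[ F_p(\hat{\mu}_i + \epsilon) - F_p(\hat{\mu}_i - \epsilon) \right], \tag{9}$$

where $F_p(\cdot)$ is the cumulative distribution function (CDF) of the pth predicted Laplace distribution from the neural network ensembles, and $g_i(\epsilon)$ denotes the right-hand side viewed as a function of the bound. Another way to quantify the reliability is to calculate the bound $\epsilon_i^{p}$ given a targeted credibility $p$ and the predictive Laplacian mixture model, which can be computed by using the inverse function $g_i^{-1}(\cdot)$:

$$\epsilon_i^{p} = g_i^{-1}(p). \tag{10}$$

Because $g_i^{-1}(\cdot)$ cannot be expressed in elementary functions, it is evaluated numerically by the bisection method.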
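The two credibility metrics for one pixel can be sketched as below. This is a minimal plain-Python illustration under stated assumptions, not the authors' code: the interval half-width is called `eps`, the interval is centered on the average of the per-network means, each `sigma` is a Laplace scale, and the function names and tolerance are hypothetical.

```python
import math

def laplace_cdf(y, mu, sigma):
    """CDF of a Laplace distribution with location mu and scale sigma."""
    z = (y - mu) / sigma
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def credibility(eps, mus, sigmas):
    """Mixture probability mass inside [mu_hat - eps, mu_hat + eps]."""
    mu_hat = sum(mus) / len(mus)
    mass = 0.0
    for mu, s in zip(mus, sigmas):
        mass += laplace_cdf(mu_hat + eps, mu, s) - laplace_cdf(mu_hat - eps, mu, s)
    return mass / len(mus)

def bound_for_credibility(p, mus, sigmas, tol=1e-9):
    """Invert credibility() by bisection: the eps whose credibility equals p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, max(sigmas)
    # credibility() increases monotonically with eps, so grow hi until it brackets p
    while credibility(hi, mus, sigmas) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if credibility(mid, mus, sigmas) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is a natural fit here because the credibility is a monotonically increasing function of the bound, rising from 0 at `eps = 0` toward 1, so any target value in between is bracketed and the search cannot miss it.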
To ensure that the predictive metrics in Eqs. (9,10) are indicative, we further characterize how well they are calibrated [41]. To quantify this, a standard procedure is to compute the reliability diagram, which compares the accuracy, i.e. the empirical probability of the ground truth matching the predicted value, with the credibility [42]. Well-calibrated metrics should predict a credibility similar to the accuracy, in which case the reliability diagram is diagonal. For a regression problem like ours, we adapt the modified reliability diagram [43] that compares the averaged credibility and the empirical accuracy. To generate a reliability diagram with $M$ probability bins, we define the bin interval $\Delta p = 1/M$ and the mth bin $P_m$ bounded by $p_{m-1}$ and $p_m$. The averaged credibility $\mathrm{Cred}(P_m, \epsilon)$ takes the mean over the set of pixels $S_m$ having similar credibility within $(p_{m-1}, p_m]$:

$$\mathrm{Cred}(P_m, \epsilon) = \frac{1}{|S_m|} \sum_{i \in S_m} p_i^{\epsilon}, \tag{11}$$

where $|S_m|$ measures the total number of pixels within the set. The empirical accuracy $\mathrm{Acc}(P_m, \epsilon)$ is defined as the fraction (empirical probability) of the pixels in set $S_m$ for which the ground truth mean $\mu_i^*$ is within the corresponding credible interval $A_i$:

$$\mathrm{Acc}(P_m, \epsilon) = \frac{1}{|S_m|} \sum_{i \in S_m} \mathbb{1}\left(\mu_i^* \in A_i\right), \tag{12}$$

where $\mathbb{1}(\cdot)$ is the indicator function. In practice, the ground truth mean is unknown and can only be approximated by the sFPM phase, which is "noisy", so $\mathrm{Acc}(P_m, \epsilon)$ is influenced by the quality of the sFPM reconstruction. The bin interval sets the sampling interval in the reliability diagram, and also affects the sample size in $S_m$. We use the minimum interval that still ensures a sufficient sample size for reliable statistical calculation. Both the averaged credibility and the empirical accuracy depend on the credible interval bound $\epsilon$. We assess our model using different values of $\epsilon$ in Sec. 3.3.
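The binning behind the reliability diagram can be sketched as follows. This is an illustrative plain-Python version with hypothetical names: it takes, per pixel, a predicted credibility and a flag saying whether the reference phase fell inside that pixel's credible interval, and it assumes half-open bins of equal width.

```python
import math

def reliability_diagram(credibility, hit, num_bins):
    """Return per-bin (averaged credibility, empirical accuracy, pixel count).

    credibility[i]: predicted credibility of pixel i, in [0, 1].
    hit[i]: True if the reference value lies in pixel i's credible interval.
    Bin m covers ((m) / num_bins, (m + 1) / num_bins]; empty bins give None.
    """
    if len(credibility) != len(hit):
        raise ValueError("credibility and hit must have the same length")
    sums = [0.0] * num_bins
    hits = [0] * num_bins
    counts = [0] * num_bins
    for c, h in zip(credibility, hit):
        m = min(num_bins - 1, max(0, math.ceil(c * num_bins) - 1))
        sums[m] += c
        hits[m] += 1 if h else 0
        counts[m] += 1
    rows = []
    for m in range(num_bins):
        if counts[m] == 0:
            rows.append(None)
        else:
            rows.append((sums[m] / counts[m], hits[m] / counts[m], counts[m]))
    return rows
```

Plotting the accuracy against the averaged credibility for the non-empty bins gives the diagram; points near the diagonal indicate well-calibrated predictions. Returning the pixel count per bin makes it easy to see when a bin is too sparsely populated for its accuracy to be trusted.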

Results
Our results are presented in the following order. First, we show that our technique provides high-resolution phase predictions, and that the uncertainty maps are highly indicative of the true error. In addition, the method is scalable to different sample types, and is applicable to experimental setups with varying final resolution. Second, we present large-SBP phase prediction and show that the uncertainty maps allow quantifying the effect of out-of-distribution data due to limited FOV. Third, we establish the reliability of our technique by performing statistical analysis. Finally, we demonstrate time-series predictions and show that UL can facilitate the discovery of spatially and temporally rare biological features and events.

Scalable illumination coding based DL phase imaging
Our illumination coding scheme is highly scalable to large-SBP applications since it always uses five multiplexed measurement for achieving different resolution.Experiments are performed on five cell types capture from two microscope setups and achieving three different resolutions.Specifically, Fig. 5(i) and (ii) are obtained on the setup in [25] and achieves resolution enhancement from 0.1 NA to 0.51 NA; Fig. 5(iiiv) are from the setup in [7]; (iii) and (iv) enhances resolution from 0.2 NA to 0.8 NA, and (v) from 0.2 NA to 0.7 NA.First, we present results from training individual network for each cell type.Without any hyper-parameter tuning, the same network structure is applicable to different samples captured on different setups.Next, we show that the BNN trained with a single cell type is generalizable to other "unseen" cell types.
Example multiplexed intensity measurements are shown in Fig. 5(b).Our BNN is able to consistently provide high-quality phase predictions, as shown in Fig. 5(c).To evaluate the BNN predicted phase, we first compare it with the phase from sFPM in Fig. 5(a) and compute the pixel-wise absolute error map in Fig. 5(f).Adding the additional uncertainty prediction in the BNN does not degrade the phase predictions as compared to the CNN approach (see supplementary material).To demonstrate the need for using DL method to overcome the ill-posedness of the phase retrieval problem, we compare our results from those from two state-of-the-art model-based algorithms using the same multiplexed measurements.The linear DPC model [18] can only recover phase with limited resolution; the mFPM algorithm [7] results in high-frequency artifacts in the recovered phase (see supplementary material).
Next, we inspect the BNN predicted data uncertainty [Fig.5(d)] and model uncertainty maps [Fig.5(e)].The regions where the BNN potentially makes larger errors are marked with higher uncertainties.We observe that the uncertainty maps generally match well with the corresponding absolute error map.In addition, the predicted uncertainty values are about 1/3 of the absolute error.This is because for Laplace distribution "3σ" closely approximates the credible interval bound with 95% credibility.This demonstrates the utility of the uncertainty maps as a direct measure to the accuracy of the neural network predictions.Further quantitative reliability analysis are discussed in Sec.3.3.In addition, we observe that the data uncertainty is the dominant term in our experiments, which suggests that the incompleteness in the training data is the main source of error in the prediction.Indeed, our training data are only taken from a small region of the FOV, as further discussed in Sec.3.2.The low model uncertainty indicates that the predicted phase (i.e.pixel-wise mean) does not vary much across different neural network ensembles.This suggests that phase predictions based on the multiplexed measurements can be performed consistentlythe stochastic training process does not lead to unstable inference results.Furthermore, the high uncertainty regions consistently correspond to the cellular features with large phase values.We attribute this to two primary sources of error.First, the phase clipping inevitably introduces unwanted saturation artifacts in the ground truth phase.Second, although we correct for phase wrapping artifacts when generating the ground truth, residual errors still exist.Due to these inconsistencies present in the training data associated with the large-phase features, the trained BNN tends to flag such "abnormal" regions in the uncertainty output.
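The "3σ" statement can be checked with a short worked calculation. It assumes the Laplacian likelihood used for the data uncertainty, with density $\frac{1}{2\sigma}\exp(-|y-\mu|/\sigma)$, so that $\sigma$ acts as the scale of the distribution; a single network is considered for simplicity.

```latex
\begin{aligned}
P\left(|y - \mu| \le \epsilon\right)
  &= \int_{\mu - \epsilon}^{\mu + \epsilon} \frac{1}{2\sigma}
     \exp\left(-\frac{|y - \mu|}{\sigma}\right) dy
   = 1 - e^{-\epsilon/\sigma}, \\
P\left(|y - \mu| \le 3\sigma\right)
  &= 1 - e^{-3} \approx 0.950 .
\end{aligned}
```

Under this assumption an interval of half-width $3\sigma$ carries about 95% of the probability mass, which is why an uncertainty value near one third of the absolute error is the expected scale.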
Our BNN is trained to solve an inverse problem. As such, a properly trained network learns to invert the physical model, which is independent of the type of objects used in the training. To justify this proposition, we compare results from the BNN trained on the same cell type and on a different cell type in Fig. 6. In general, the BNN is able to make high-quality phase predictions and is robust to the selection of the sample type. Nevertheless, a slight degradation is observed in the phase predicted by the network trained on a different cell type. This is because different cell types have distinct morphological features that can result in different intensity measurements. If the training data do not fully capture the statistical variations in the measurements, less accurate phase predictions are produced when the network input contains "out-of-distribution" measurements. Most importantly, the uncertainty map from the BNN can automatically detect such abnormalities in the data. As highlighted in Fig. 6, the uncertainty map remains highly indicative of the true absolute error regardless of the cell types being used for training and testing. Additional results demonstrating the robustness of our BNN to both sample and setup variations are provided in the supplementary material.

Large-SBP phase prediction and uncertainty quantification
Next, we present large-SBP phase prediction across a wide FOV. Our BNN is trained on small image patches. We perform phase and uncertainty predictions patch-by-patch. The full-FOV predictions in Fig. 7(a-c) are obtained by stitching the patches using the alpha blending algorithm.
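The patch stitching step can be illustrated with a small sketch. It uses linear-ramp weights in the overlap regions followed by normalization, which is one common form of alpha blending; the exact blending profile, patch size, and overlap used in this work are not stated in this section, so `ramp_weights`, `stitch`, and all parameter values below are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def ramp_weights(size: int, overlap: int) -> List[float]:
    """1D blending weights: linear ramps over the overlap, 1.0 in the interior."""
    w = [1.0] * size
    for i in range(overlap):
        a = (i + 1) / (overlap + 1)
        w[i] = min(w[i], a)
        w[size - 1 - i] = min(w[size - 1 - i], a)
    return w


def stitch(patches: Sequence[Image], corners: Sequence[Tuple[int, int]],
           patch_size: int, overlap: int, out_shape: Tuple[int, int]) -> Image:
    """Blend square patches placed at (row, col) corners into one image."""
    rows, cols = out_shape
    acc = [[0.0] * cols for _ in range(rows)]
    wsum = [[0.0] * cols for _ in range(rows)]
    w1d = ramp_weights(patch_size, overlap)
    for patch, (r0, c0) in zip(patches, corners):
        for i in range(patch_size):
            for j in range(patch_size):
                w = w1d[i] * w1d[j]  # separable 2D weight
                acc[r0 + i][c0 + j] += w * patch[i][j]
                wsum[r0 + i][c0 + j] += w
    # Normalize by the accumulated weight; uncovered pixels are set to 0.
    return [[acc[r][c] / wsum[r][c] if wsum[r][c] > 0 else 0.0
             for c in range(cols)] for r in range(rows)]


# Hypothetical example: two 4x4 patches overlapping by two columns.
left = [[1.0] * 4 for _ in range(4)]
right = [[3.0] * 4 for _ in range(4)]
full = stitch([left, right], [(0, 0), (0, 2)], 4, 2, (4, 6))
print([round(v, 3) for v in full[0]])
```

The same routine can be applied unchanged to the predicted phase, data uncertainty, and model uncertainty patches, since each is a single-channel map.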
The full-FOV model uncertainty [Fig. 7(c)] allows us to critically assess the robustness of our technique. We observe that the model uncertainty is low across the FOV except for small regions around the boundary. This verifies that the BNN can reliably make high-resolution phase predictions from the multiplexed measurements: the predicted mean does not vary much across different network ensembles. At the boundary regions, the measurements suffer from severe experimental errors that lead to higher variations in the predicted means.
The effect of the out-of-distribution data due to limited FOV is studied as follows. Our training data is taken from a small central region (0.4 × 0.4 mm² of the full 3.5 × 4.2 mm² FOV), as shown in Fig. 7(d). In general, aberrations worsen as the field angle increases (i.e., with the distance from the center). In addition, the LED illumination produces greater angle miscalibration [44] and background non-uniformity as the field angle increases. Both effects imply a greater degree of out-of-distribution as compared to the training data. Importantly, our UL approach allows predicting the potential errors induced by the out-of-distribution data: the data uncertainty map predicts a higher standard deviation at the peripheral FOV regions [Fig. 7(b)].
Identifying such data incompleteness a posteriori provides important feedback to improve the data pipeline in DL. Intuitively, introducing previously out-of-distribution data to the training can reduce the data uncertainty. In our case, more credible predictions can be made by training on more examples encompassing aberrations and angle miscalibration at other FOV regions, as verified by additional experiments detailed in the supplementary material.

Quantitative reliability analysis
To provide a quantitative assessment of our prediction, we first calculate the credibility map from the predicted pixel-wise distribution. Given the bound ε and the predicted mean µ_i (at pixel i), the credibility p_i [Eq. (9)] measures the BNN-predicted probability that the true mean falls into the credible interval A_i = [µ_i − ε, µ_i + ε]. To properly choose ε, we consider the intrinsic noise in the sFPM reconstructed phase by measuring the background standard deviation σ_background. We take this sFPM noise level as the credible interval bound (ε = σ_background) and compute the credibility pixel-by-pixel. The credibility map provides a direct quantification of how much one can trust the BNN-predicted phase. The credibility maps for the five samples and the credible interval bounds are shown in Fig. 8(b). As expected, less credible regions point to the "abnormal" regions where phase clipping or wrapping artifacts are likely present in the training data.
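As an illustration of the credibility computation, the sketch below assumes that the predicted pixel-wise distribution is Laplace with mean µ_i and scale σ_i, in which case the probability mass inside [µ_i − ε, µ_i + ε] has the closed form 1 − exp(−ε/σ_i). Eq. (9) is not reproduced in this section, so this closed form, the function names, and the sample values are assumptions rather than the exact procedure used here.

```python
import math
from statistics import pstdev
from typing import List


def credibility(sigma: float, eps: float) -> float:
    """Probability mass of a Laplace(mu, sigma) within [mu - eps, mu + eps]."""
    return 1.0 - math.exp(-eps / sigma)


def credibility_map(sigma_map: List[List[float]], eps: float) -> List[List[float]]:
    """Apply the credibility formula pixel-by-pixel to a map of scales."""
    return [[credibility(s, eps) for s in row] for row in sigma_map]


# Hypothetical background pixels from the reference phase (radians).
background = [0.01, -0.02, 0.03, 0.00, -0.02]
eps = pstdev(background)  # credible interval bound set to the background std

# Hypothetical predicted uncertainty map (radians).
sigma_map = [[0.01, 0.05], [0.20, 0.02]]
cred = credibility_map(sigma_map, eps)
for row in cred:
    print([round(c, 3) for c in row])
```

Pixels with a larger predicted σ_i receive a lower credibility for the same bound, which matches the observation that less credible regions coincide with high-uncertainty features.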
Alternatively, we evaluate the credible interval bound that gives a desired credibility. The bound ε_i^p (at pixel i) is computed using Eq. (10). By setting a constant p = 0.95  and use six credible interval bounds ( ). The first two cases [Fig. 8(i-ii)], with GAN included, both show slightly over-confident predictions, as indicated by the curves below the diagonal. The other three cases [Fig. 8(iii-v)], without GAN, provide better-calibrated predictions since the curves closely follow the diagonal. Besides the difference in the BNN structures, the first two cases have ∼3× stronger phase, resulting in more phase-clipping-induced errors, and ∼2× higher intrinsic noise in the ground truth. Since the estimated empirical accuracy is also influenced by the quality of the ground truth, the lower-quality ground truth phase in the first two cases could also contribute to the less calibrated predictions. Improving the calibration of BNNs is an active area of research [41] and will be addressed in our future work.
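The two quantities discussed above can be sketched as follows, again assuming a Laplace predictive distribution with scale σ_i (an assumption, since Eq. (10) is not reproduced in this section). `bound_for_credibility` inverts the credibility formula for a target p, and `reliability_point` returns one point of a calibration curve by comparing the mean predicted credibility with the empirical fraction of pixels whose error is within the bound. All names and sample values are hypothetical.

```python
import math
from typing import List, Tuple


def bound_for_credibility(sigma: float, p: float = 0.95) -> float:
    """Half-width eps such that a Laplace with scale sigma has mass p in [mu - eps, mu + eps]."""
    return -sigma * math.log(1.0 - p)


def reliability_point(mu: List[float], sigma: List[float],
                      truth: List[float], eps: float) -> Tuple[float, float]:
    """Return (mean predicted credibility, empirical accuracy) for one bound eps."""
    predicted = [1.0 - math.exp(-eps / s) for s in sigma]
    hits = [abs(m - t) <= eps for m, t in zip(mu, truth)]
    return sum(predicted) / len(predicted), sum(hits) / len(hits)


# Hypothetical flattened pixel values (radians).
mu = [0.0, 0.0, 0.0, 0.0]
sigma = [1.0, 1.0, 1.0, 1.0]
truth = [0.5, 1.5, 0.2, 3.0]

print(round(bound_for_credibility(1.0, 0.95), 3))  # 2.996
pred, acc = reliability_point(mu, sigma, truth, eps=1.0)
print(round(pred, 3), acc)  # 0.632 0.5
```

In this toy example the empirical accuracy (0.5) is lower than the mean predicted credibility (about 0.632), so the point lies below the diagonal of the calibration plot, which corresponds to an over-confident prediction.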

Time-series large-SBP phase and credibility prediction
Our technique is also applicable to imaging dynamic samples. Fig. 9 shows time-series predictions made by training the BNN using data only from a single time frame. We train the BNN using the upper 3/4 of the FOV at the 26 min frame and perform full-FOV predictions on the rest of the time frames. An example full-FOV phase prediction is shown in Fig. 9(a). The reliability of the temporal predictions is further quantified by calculating the credibility maps over time. An example credibility map is shown in Fig. 9(b). As expected, the BNN is credible across the entire trained FOV region and less credible over the untrained region, matching our previous observations. To quantify the reliability over time, we calculate the averaged credibility over the full FOV, the cell regions, and the background regions [Fig. 9(c)]. The averaged credibility fluctuates within a small range. The credibility for the cell regions slowly decays over time, which can be explained by the temporal dynamics gradually inducing more "dissimilar" out-of-distribution data. Our BNN enables quantifying such "temporal decorrelation".
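The region-averaged credibility per frame can be sketched as below. How the cell and background regions are segmented is not described in this section; the sketch assumes a simple threshold on the predicted phase, and `region_means`, the threshold, and the sample frames are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]


def _mean(values: List[float]) -> float:
    """Arithmetic mean, NaN for an empty region."""
    return sum(values) / len(values) if values else float("nan")


def region_means(cred: Frame, phase: Frame, threshold: float) -> Tuple[float, float, float]:
    """Mean credibility over the full frame, cell pixels, and background pixels.

    Assumption: a pixel belongs to a cell if its predicted phase exceeds `threshold`.
    """
    full, cell, background = [], [], []
    for cred_row, phase_row in zip(cred, phase):
        for c, p in zip(cred_row, phase_row):
            full.append(c)
            (cell if p > threshold else background).append(c)
    return _mean(full), _mean(cell), _mean(background)


# Hypothetical credibility and phase maps for one time frame.
cred_frame = [[0.9, 0.5], [0.8, 0.4]]
phase_frame = [[0.0, 1.0], [0.1, 2.0]]
full_avg, cell_avg, bg_avg = region_means(cred_frame, phase_frame, threshold=0.5)
print(round(full_avg, 2), round(cell_avg, 2), round(bg_avg, 2))  # 0.65 0.45 0.85
```

Applying the function to every frame of a time series yields three curves of the kind plotted against time, from which a gradual decay in the cell-region credibility can be read off.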
Next, we zoom in on two small regions where cells undergo division over time [Fig. 9(d-e)]. In both cases, the credibility drops when the cells present significant morphological changes during mitosis, and returns to the "normal" level immediately after the process is over. More examples are shown in the movie in Visualization 1. As cells become more globular during mitosis, the phase values grow significantly and often result in phase wrapping errors in the training phase data. In Fig. 9(e), a cell undergoes apoptosis and presents distinct morphological structures. Similar to our previous observations, the BNN consistently identifies these spatially and temporally rare features by "flagging" them as being less credible.

Conclusion
We have presented a physics-guided AI framework for large-SBP phase imaging. Our technique enables high-resolution phase inference across a wide FOV using only five asymmetric illumination coded intensity measurements. Our results show that this BNN-based technique can effectively learn the underlying physical model. Once trained, the BNN can robustly solve the phase retrieval problem and is generalizable to different samples. Further, we have developed an uncertainty quantification framework that allows critically assessing the reliability of the BNN predictions. Specifically, we have applied our UL approach to evaluate the robustness of our illumination coding and DL phase estimation model. In addition, we have quantified the effect of common experimental errors using the predicted uncertainties. Furthermore, we have shown that applying the UL enables discovering the incompleteness in the training data and quantifying the associated out-of-distribution testing errors. Finally, the predicted credibility map has been shown to be useful in identifying spatially and temporally rare biological phenomena and characterizing the "temporal decorrelation" in dynamic processes. We believe this UL framework is widely applicable to many emerging AI-based scientific and biomedical imaging applications where critical assessment of the DL inference is essential.

Figure 1: Overview of our deep learning framework for reliable AI-augmented phase imaging. (a) Our technique opens up an expanded imaging attribute space, bypassing the conventional tradeoff between FOV, resolution, and acquisition speed. (b) It uses five asymmetric illumination coded intensities to encode large-SBP phase information. (c) A BNN is developed to make phase predictions and quantify the uncertainties of the model.

Figure 2: The graphical model of our UL framework that considers randomness in both the network weights w and the predicted output y.

The posterior distribution p(w|X, Y) describes all the possible network weights given the training data. The predictive distribution p(y|x*, w) describes all the possible predictions given the network weights w and the testing input x* [Fig. 3(a), top]. By modeling p(w|X, Y) and p(y|x*, w), we can evaluate the model and data uncertainties, respectively.
[Figure 3(a) panel text: Uncertainty quantification. Data uncertainty: quantified by the standard deviation (std) of the prediction for a given data. Model uncertainty: quantified by the std of the network ensembles.]

Figure 3: Overview of our UL framework. (a) The data uncertainty quantifies the effect of incomplete training data and is estimated via an uncertainty regularized loss function. The model uncertainty evaluates the stochasticity of neural network training and is estimated by network ensembles. (b) During testing, the direct output from the BNN consists of an ensemble of mean and standard deviation maps. Through statistical modeling, we obtain the final estimated phase, data and model uncertainty maps.
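The aggregation of the ensemble outputs described in this caption can be sketched for a single pixel as follows. The sketch assumes the common ensemble decomposition in which the final phase is the average of the member means, the data uncertainty is the root of the averaged member variances, and the model uncertainty is the standard deviation of the member means. The exact statistical model used in this work is not reproduced in this section, so `aggregate_ensemble` and the sample values are illustrative assumptions, and the member uncertainties are treated as standard deviations.

```python
import math
from statistics import fmean, pstdev
from typing import List, Tuple


def aggregate_ensemble(means: List[float], sigmas: List[float]) -> Tuple[float, float, float]:
    """Combine per-member (mean, std) predictions for one pixel.

    Returns (phase, data_uncertainty, model_uncertainty).
    """
    phase = fmean(means)                                    # ensemble-averaged prediction
    data_unc = math.sqrt(fmean([s * s for s in sigmas]))    # average predicted variance
    model_unc = pstdev(means)                               # spread of the member means
    return phase, data_unc, model_unc


# Hypothetical outputs of a three-member ensemble at one pixel (radians).
member_means = [1.0, 1.2, 0.8]
member_sigmas = [0.1, 0.1, 0.1]
phase, data_unc, model_unc = aggregate_ensemble(member_means, member_sigmas)
print(round(phase, 3), round(data_unc, 3), round(model_unc, 3))  # 1.0 0.1 0.163
```

A low model uncertainty in this sketch means that independently trained members agree on the predicted mean, which is the sense in which the text describes the inference as stable with respect to the stochastic training process.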

Figure 4: The BNN structure to perform UL. The main network takes the U-Net structure. The input takes the five low-resolution multiplexed intensity images. The output predicts two-channel high-resolution phase and uncertainty maps.

Figure 5: High-resolution phase estimation from DL-augmented coded measurements. (a) The ground truth phase obtained from the sFPM. (b) The input to the neural network consists of five low-resolution intensity images, including two brightfield and three darkfield images. Our BNN prediction includes (c) phase, (d) data uncertainty, and (e) model uncertainty. (f) The absolute error is calculated between the predicted and the ground truth phase. The uncertainty maps are highly correlated with the error maps, demonstrating the predictive power of our UL framework. Unlike existing FPM techniques, our method requires the same number of measurements when the final resolution increases. HeLa cells fixed in (i) ethanol and (ii) formalin are imaged with a 4× 0.1 NA objective and reconstructed with 0.5 NA resolution. (iii) Live HeLa and (iv) fixed MCF10A cells are imaged with a 4× 0.2 NA objective and reconstructed with 0.8 NA resolution. (v) Fixed U2OS cells are imaged with a 4× 0.2 NA objective and reconstructed with 0.7 NA resolution.

Figure 6: BNN predictions under different training and testing data configurations. The network can robustly perform phase retrieval against variations in the sample type. The uncertainty map can reliably detect potential errors in the phase predictions and is consistent with the true error.

Figure 7: Large-SBP phase prediction and uncertainty quantification. (a) Full-FOV phase prediction achieving 0.51 NA resolution across a 4× FOV. (b) The data uncertainty map reliably identifies the out-of-distribution data corresponding to the peripheral FOV regions. (c) The model uncertainty is consistently low across the FOV except around the boundary, validating the robustness of our model. (d) The training data is taken only from the central 0.4 × 0.4 mm² region. Zoom-in of the predicted phase, data uncertainty, and model uncertainty of the region from (e) the central FOV and (f) the outer FOV.

Figure 9: Time-series phase and credibility prediction. A representative frame from (a) the full-FOV phase prediction achieving 0.8 NA across a 4× FOV and (b) the credibility map with a credible interval bound ε = 0.047 rad. (c) The training data is taken from the upper 3/4 of the FOV at the 26 min frame. The averaged credibility is calculated over time on the whole FOV (red), the cell region (green), and the background (blue). This allows quantifying the "temporal decorrelation" induced by the temporal dynamics. (d-e) Spatially and/or temporally rare events, including cell mitosis and apoptosis, result in out-of-distribution data during prediction and are automatically discovered by our BNN. The full time-series prediction is provided in the movie in Visualization 1.