No-search focus prediction at the single cell level in digital holographic imaging with deep convolutional neural network

Digital propagation of an off-axis hologram can provide a quantitative phase-contrast image if the exact distance between the sensor plane (such as a CCD) and the reconstruction plane is provided. In this paper, we present a deep-learning convolutional neural network with a regression layer as the top layer to estimate the best reconstruction distance. The experimental results obtained using microsphere beads and red blood cells show that the proposed method can accurately predict the propagation distance from a filtered hologram. The result is compared with a conventional automatic focus-evaluation function. Additionally, our approach can be utilized at the single-cell level, which is useful for cell-to-cell depth measurement and cell adherence studies.


Introduction
The first holography scheme to be proposed was in-line, or Gabor, holography [1]. In digital holographic applications, it is particularly useful in conjunction with a reconstruction algorithm for particle image analysis, 3-D tracking, and the study of cells swimming in a liquid flow [2][3][4][5]. The main drawback of the in-line configuration is the zero-order noise. In addition, the real and twin images are superposed and cannot be separated easily. Phase-shifting methods can help separate the zero-order noise, real image, and twin image in an in-line setup [6][7][8][9][10], but they require capturing multiple holograms corresponding to various phase differences between the object and reference light. The sample must therefore remain immobile during hologram recording, which is impossible when studying moving, dynamic objects such as biological cells or when tracking micro-organisms. Several other digital holography schemes have also been proposed for different applications and studies [11][12][13][14].
The off-axis configuration (see Fig. 1) was shown to be very useful in different studies of biological samples [15][16][17][18][19][20][21][22] and dynamic topography [23,24]. Off-axis configuration for studying biological samples is called digital holographic microscopy (DHM) [17]. DHM enables nondestructive investigations of biological samples as well as marker-free and time-resolved analysis of cell biological processes. More specifically, the interpretation of a quantitative phase-contrast image with DHM (QP-DHM) gives access to quantitative measurements of both cellular morphology and the content of a sample with only one shot. It is a real-time method, and in the absence of mechanical focus adjustment, it can be used for time-lapse studies of biological samples. To obtain the phase-contrast image for the phase objects and an amplitude-contrast image, the determination of the propagation distance is particularly important.
The distance between the hologram plane (CCD plane) and the observation plane (the reconstruction plane) is defined as the reconstruction distance "d". In digital holographic reconstruction, an in-focus reconstructed image occurs when the reconstruction distance is equal to the distance between the CCD and the image during the hologram recording (Fig. 1(b)). Several automated approaches have been proposed to determine the best focus plane for amplitude-contrast or phase-contrast images in related applications [25][26][27][28][29][30]. Generally, multiple images at different focus planes are numerically reconstructed, and a function evaluates the quality of each image. The focus-evaluation function measures whether the image is in focus by assessing the sharpness of either the amplitude-contrast or the phase-contrast image. For example, pure phase objects appear with minimum visibility in amplitude-contrast images when in-focus reconstruction is achieved [27]. All these methods require several propagations at different distances, which is time-consuming since each propagation requires applying a Fourier transform. The interval of the propagation distance for multiple reconstructions can be roughly estimated to avoid unnecessary reconstructions, but this is still not a single-reconstruction approach (the number of reconstructed images is reduced from a few thousand to a few hundred). The cost becomes more apparent when time-lapse imaging is considered, because long-term biological studies require permanent focus readjustment to maintain optimum image quality.
We propose using a deep-learning convolutional neural network (CNN) to estimate the propagation distance "d". Previous methods required multiple numerical reconstructions at different distances and a function to evaluate the sharpness of amplitude-contrast or phase-contrast images. Our model can estimate the reconstruction distance (RD), or "d", from the recorded hologram of micro-sized objects without a focus-evaluation function. This method has two main advantages. First and foremost, it is significantly faster, since it requires neither multiple digital propagations nor a function to evaluate sharpness at each distance. Second, it can provide a specific "d" for each cell according to its position relative to other cells, which is very useful for the analysis of cell adherence on surfaces. The focused image for the training data is obtained by the focus-evaluation function, which uses the standard deviation (STD) of the amplitude image.

Off-axis DHM
A hologram is the interference between the object wave O and the reference wave R. The two waves are obtained by inserting a beam splitter in the direction of laser propagation, in an arrangement similar to a Mach-Zehnder interferometer (see Fig. 1(a)). The object beam illuminates the specimen, and a microscope objective (MO) collects and magnifies the object wavefront. The hologram plane (the CCD plane) is located between the MO and the image plane at a distance "d" from the image (see Fig. 1(b)). Spatial interference is recorded by a CCD camera and transmitted to a personal computer for numerical reconstruction [31][32][33]. The small tilt angle between O and R makes it possible to eliminate the parasitic orders and to isolate the real image from the twin image and zero-order noise (due to the off-axis geometry). A spatial filter whose size is chosen to cover only the bandwidth of the real image retains the real image. To reconstruct the signal, the filtered hologram is multiplied by the digital reference wave and propagated using the Fresnel approximation. Finally, the amplitude image and the phase image (see Fig. 2) can be obtained, respectively, by:
$$A(m,n) = \left|\Psi(m,n)\right| = \sqrt{\mathrm{Re}[\Psi(m,n)]^{2} + \mathrm{Im}[\Psi(m,n)]^{2}}, \qquad \varphi(m,n) = \arctan\!\left(\frac{\mathrm{Im}[\Psi(m,n)]}{\mathrm{Re}[\Psi(m,n)]}\right),$$
where Ψ(m,n) is the result of the Fresnel propagation, and m and n are integers that index the pixels in the reconstruction plane. Throughout this manuscript, the phase value is sometimes represented as the optical path length difference (OPD) between the reference wave and the wave that passes through the sample:
$$\mathrm{OPD}(x,y) = \frac{\lambda}{2\pi}\,\varphi(x,y) = \left[n_s(x,y) - n_m\right] t(x,y),$$
where n_s(x, y) denotes the integral refractive index of the sample at pixel (x, y) along the optical axis, n_m is the refractive index of the sample's surrounding medium, λ is the wavelength of the light source, and t(x, y) is the thickness of the sample at pixel (x, y). The distance between the hologram plane and the observation plane (the reconstruction plane) is defined as the reconstruction distance "d".
In digital holographic reconstruction, an in-focus reconstructed image occurs when the reconstruction distance is equal to the distance between the CCD and the image plane during the hologram recording. An out-of-focus image appears if the reconstruction distance is not precise, as shown in Fig. 2.
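The reconstruction steps described in this section can be sketched numerically. The following is a minimal NumPy sketch, not the authors' MATLAB implementation: it assumes the transfer-function (convolution) form of the Fresnel approximation, omits the constant phase factor exp(ikd), and starts from a synthetic complex field at the sensor instead of a filtered hologram multiplied by a digital reference wave. The function names, the grid size, the distance value, and the refractive index of the medium (1.333, assumed for water) are hypothetical.

```python
import numpy as np

def fresnel_propagate(field, wavelength, pixel_size, d):
    """Propagate a complex field over distance d (transfer-function Fresnel form).

    The constant factor exp(i*k*d) is omitted; it changes neither the
    amplitude nor the relative phase of the result.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    transfer = np.exp(-1j * np.pi * wavelength * d * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def amplitude_and_phase(psi):
    """Amplitude |psi| and phase arg(psi) of the reconstructed field."""
    return np.abs(psi), np.angle(psi)

def phase_to_thickness(phase, wavelength, n_sample, n_medium):
    """Thickness from phase: OPD = wavelength * phase / (2*pi) = (n_s - n_m) * t."""
    opd = wavelength * phase / (2.0 * np.pi)
    return opd / (n_sample - n_medium)

# --- synthetic demonstration (all object parameters are hypothetical) ---
wavelength = 666e-9    # laser wavelength quoted in the text (m)
pixel_size = 5.86e-6   # CCD pixel size quoted in the text (m)
d = 0.03               # hypothetical reconstruction distance (m)

size = 64
yy, xx = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
phase_obj = np.exp(-(xx**2 + yy**2) / (2 * 6.0**2))  # smooth bump, peak 1 rad
obj = np.exp(1j * phase_obj)                         # pure phase object

# Field at the sensor plane, then numerical refocusing by the correct distance.
at_sensor = fresnel_propagate(obj, wavelength, pixel_size, -d)
psi = fresnel_propagate(at_sensor, wavelength, pixel_size, d)

amplitude, phase = amplitude_and_phase(psi)
# 1.591 is the bead refractive index quoted in the text; 1.333 is assumed.
thickness = phase_to_thickness(phase, wavelength, n_sample=1.591, n_medium=1.333)
```

With the correct distance, the amplitude of the pure phase object is flat and the phase bump is recovered; any other value of `d` leaves diffraction fringes in both images.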

Hologram recording and sample preparation
A commercially available DH T-1001 from LynceeTec SA (Lausanne, Switzerland) was used to record the holograms. In this setup, the microscope magnification factor and field of view were set to 40×/0.75NA and 140 µm, respectively. A red laser source with a wavelength of 666 nm was used. The CCD sensor size was 11.3 mm × 7.1 mm, and the pixel size (H×V) was 5.86 µm × 5.86 µm. The CCD resolution was 1920×1200 pixels (the hologram was cropped to 1024×1024 pixels for efficient FFT computations). The transverse resolution was around 200 nm, which is in good agreement with the classical Abbe criterion (0.61λ/NA). The phase stability of the system was around ∆φ = 0.5°. The reconstruction of the quantitative phase images was conducted off-line using MATLAB 2018 and a standard PC at a rate of several images per second.
To prepare the RBC sample, a few microliters of blood were collected from a healthy lab donor. The blood was diluted at a ratio of 1:15 (v/v) in cold HEPA buffer (280 mOsm, 15 mM HEPES pH 7.4, 130 mM NaCl, 5.4 mM KCl, 10 mM glucose, 1 mM CaCl2, 0.5 mM MgCl2, and 1 mg/ml bovine serum albumin) at 0.2% hematocrit. Blood cells were sedimented by centrifuging at 200 g and room temperature for 5 min, and then the buffy coat was gently collected. The erythrocytes were diluted in 600 µl of HEPA buffer, and finally, ∼20 µl of the final erythrocyte suspension was introduced into an imaging slide consisting of two coverslips.
The R-CNN model was trained on sulfate latex beads (4% w/v, 9 µm, refractive index 1.591 at 590-nm wavelength; ThermoFisher; Catalog No: A37307). The beads were mixed with cold distilled water at 50% v/v to provide a less concentrated sample. Next, ∼20 µl of the suspended sample was introduced into the imaging slide in the same way as the RBC sample.
Fig. 2 (beginning of caption missing): [...] the bandwidths of the real image, twin image, and zero-order noise are separated owing to the off-axis geometry. (d, e) Amplitude and phase of the numerically reconstructed signal when the reconstruction distance "d" is too short, (f, g) when the reconstruction distance "d" is correct, and (h, i) when the reconstruction distance "d" is too long. The corresponding amplitude image for the in-focus phase-contrast image has the lowest contrast. (j) Cross section of the phase image of the RBC in (e), (g), and (h). The phase is converted to thickness by t( [...]

Regression-CNN for reconstruction distance estimation
The deep architecture of the processing units in a CNN allows the model to characterize image features at different scales [35]. Consequently, CNNs can make accurate predictions thanks to the supervised learning of target values provided during the training process. CNNs have been used in many applications, such as RBC segmentation from holographic images of RBCs using a fully convolutional network (FCN) [36], classification of hologram images of cells labeled with microbeads without reconstruction algorithms [37], and coronary lumen regression analysis [38]. CNNs have also been applied in several optics-related studies, such as depth estimation in in-line holograms of natural images [39], prediction of the focus plane by AlexNet and VGG16 models [40], and non-parametric autofocusing with a regression CNN model [41]. Reference [39] utilized only a portion of the intensity and spectrum of in-line holograms to show that a CNN with regression can estimate depth. That work is limited since it is a simulation with natural images (only amplitude objects: dogs, tires, etc.). Reference [40] proposed a similar approach, but the use of a focus-evaluation function is missing. Reference [40] also did not address single-cell analysis or the ability of the model to predict focus when cells are located at different depths. Reference [41] addressed only five reconstruction distances for pure phase objects such as biological samples.
Deep learning has also been applied to phase recovery and twin-image elimination in holographic images [42], to extending the depth of field in reconstructed hologram images [43], and to diffractive deep neural networks for holographic studies [44].
The linear regression layer attempts to find the best-fitting line between the feature maps and the continuous true values used for training. The feature maps are extracted from the amplitude part of filtered holograms (see Fig. 3(a)). Our experiments revealed that non-filtered holograms degrade the performance of the proposed estimation (results not shown here). The true values are obtained by the focus-evaluation function, which is explained in the following sections. The proposed R-CNN architecture consists of two parts (see Fig. 3(b)): a feature extraction part for feature learning and a linear regression part that predicts the continuous focus distance. The feature extraction part consists of five stages, each containing convolution layers, batch normalization layers to mitigate the vanishing gradient problem, an activation function, and a pooling layer. The filter size for all convolution layers is 5×5, and the rectified linear unit (ReLU) is applied to the output of every batch normalization layer because it also alleviates the vanishing gradient problem. A max-pooling layer is preferred to an average-pooling layer because of its better performance [35,45]. The depth of the feature map is doubled after each convolution layer, and the pooling layer greatly reduces the size of the input and yields translation-invariant features. The fully connected layer for the linear regression contains 150 nodes and is connected to all the units in the previous layer. Because our network has to predict continuous focus values, a linear function (the identity function, y = x) is used as the activation of the output layer. Two models with the same structure were constructed in this study: one for RBCs and one for microsphere beads. Two models were designed because the performance of a combined model was slightly lower than that of two separate models.
Because accurate prediction of the reconstruction distance is the priority, we preferred to use two separate models. The mean squared error was used as the loss between the actual and predicted values. The network was trained with the Adam optimizer, which updates all the trainable parameters to minimize this loss. Adam is invariant to rescaling of the gradient; it is therefore suitable for a non-stationary loss function and can be used for automatic learning-rate annealing [46].
The initial learning rate of 0.001 was decreased by a decay factor of 0.7 every five epochs, and the momentum was set to 0.9. The training process stopped when the validation loss did not change for ten consecutive epochs. The mini-batch size for training was 128 for the RBC model and 256 for the bead model. Data augmentation (45-degree clockwise rotation at each mini-batch, and horizontal and vertical flips) was applied to prevent over-fitting. All R-CNN simulations were done in Python. The R-CNN models were built in TensorFlow (Keras, version 2.2.4) and trained on a GPU only (NVIDIA GeForce GTX 690). The Keras model was then imported into MATLAB 2018 to replace the focus-evaluation function. The data set was divided into 80% for training and 20% for validation.
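The architecture and training settings in this section can be summarized with a few lines of plain Python. This is a bookkeeping sketch under stated assumptions, not the authors' Keras code: the input size (128×128), the depth of the first stage (16), "same" padding, one 2×2 max-pooling per stage, zero-based epoch counting, and reading "did not change" as "did not improve" are all hypothetical choices that the text does not specify.

```python
def feature_map_shapes(input_size, first_depth, n_stages=5, pool=2):
    """(height, width, depth) after each of the five conv + max-pool stages.

    Assumes 'same' padding, so the 5x5 convolution keeps the spatial size
    and only the pooling layer shrinks it; depth doubles at every stage.
    """
    shapes = []
    size, depth = input_size, first_depth
    for _ in range(n_stages):
        size //= pool
        shapes.append((size, size, depth))
        depth *= 2
    return shapes

def step_decay(epoch, initial_lr=1e-3, factor=0.7, every=5):
    """Learning rate multiplied by 0.7 every five epochs (epochs start at 0)."""
    return initial_lr * factor ** (epoch // every)

def mse(y_true, y_pred):
    """Mean squared error between desired and predicted distances."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def should_stop(val_losses, patience=10):
    """True when the best validation loss is at least `patience` epochs old."""
    if not val_losses:
        return False
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience

# Hypothetical input size and first-stage depth, for illustration only.
shapes = feature_map_shapes(input_size=128, first_depth=16)
schedule = [step_decay(epoch) for epoch in range(15)]
```

In Keras, the same schedule and stopping rule would typically be attached through the `LearningRateScheduler` and `EarlyStopping` callbacks; whether the authors did so is not stated.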

Data extraction for training regression CNN model
Several holograms of RBCs and microspheres were recorded to train the two R-CNN models. Figure 4 shows the variation of the reconstruction distance "d" with the distance between the sample and the MO, "d_s". By varying "d_s", several holograms are generated.
To generate a training set, the distance between the sample and the MO ("d_s", shown in Fig. 1(b)) was adjusted by a controllable stage with a resolution of 0.1 µm along the optical axis. In total, more than 3000 holograms each containing eight RBCs and more than 2400 holograms each containing eight microspheres were recorded for training the two R-CNN models (see Fig. 4). The holograms were filtered before being fed to the R-CNN model, as mentioned before. The filtering removes the noise and uninformative spatial patterns stored in the hologram and thus provides cleaner spatial patterns for the R-CNN model. Figure 5 shows some examples of holograms at the object scale and the corresponding reconstructed phase objects at the single-object level when the correct "d" is provided by the focus-evaluation function (time-lapse images of the hologram changes for one single object are shown in Visualization 1 and Visualization 2 for a bead and an RBC, respectively). The focused image for the training data is obtained by the focus-evaluation function, which uses the STD of the amplitude image.
The real output for the training, i.e., the desired value of the propagation distance, is provided by source code implemented in MATLAB 2018(a). The source code numerically reconstructs the amplitude and phase-contrast images, similar to the method explained in Refs. [31,32]. The best focus plane is found by performing 90 reconstructions at different focus planes for each hologram containing multiple micro-sized objects. Then, the focus-evaluation function finds the best distance plane by applying the two-dimensional standard deviation to the amplitude image (see Fig. 4(g) and Fig. 4(h)). This function evaluates the dispersion of the reconstructed image's pixels (amplitude image) by standard deviation measurements along the x and y directions.
Fig. 4. Two examples of holograms at different distances "d_s" recorded for training the R-CNN model, (a,b) for the RBC model and (c,d) for the bead model. The insets show the single hologram extracted for training. Variations of "d_s" and the corresponding optimal reconstruction distance "d" for the best reconstruction for the training set are illustrated in (e) for the RBC model and in (f) for the bead model. The optimal distance for each hologram containing multiple cells or objects is evaluated by reconstructing 90 holograms at different "d" values and evaluating the amplitude image's 2D standard deviation, as illustrated in (g) for the 2D STD output for one RBC hologram and in (h) for one bead hologram.
Fig. 5. (a) Several single RBC holograms and the corresponding phase images obtained by numerical reconstruction. The reconstruction distance "d" is found by the focus-evaluation function and is considered the desired output for the regression layer of the CNN model. (b) Several bead holograms and the corresponding phase images obtained by numerical reconstruction. Hologram images at the single-object level are fed into the CNN regression model during the training stage. The units for the "d_s" distance and the "d" distance are 0.1 µm and 1.0 µm, respectively.
Evaluating the amplitude image is a very practical approach for studying transparent or semi-transparent samples (a standard feature of biological samples), since such objects are nearly invisible in the amplitude image when they are in focus. Accordingly, the STD of the pixels is nearly zero at the in-focus plane.
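The search procedure used to label the training data can be illustrated as follows. This is a minimal NumPy sketch, assuming a transfer-function Fresnel propagator and a single standard deviation taken over all amplitude pixels; the authors' MATLAB code is not reproduced here. The synthetic phase object, the 1-mm step between the 90 candidate distances, and all function names are hypothetical.

```python
import numpy as np

def fresnel_propagate(field, wavelength, pixel_size, d):
    """Transfer-function Fresnel propagation (constant phase factor omitted)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    transfer = np.exp(-1j * np.pi * wavelength * d * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def find_focus(sensor_field, wavelength, pixel_size, candidates):
    """Reconstruct at every candidate distance and pick the one whose
    amplitude image has the smallest standard deviation (phase objects)."""
    scores = np.array([
        np.std(np.abs(fresnel_propagate(sensor_field, wavelength, pixel_size, d)))
        for d in candidates
    ])
    return candidates[int(np.argmin(scores))], scores

# --- synthetic demonstration (hypothetical object and distances) ---
wavelength = 666e-9                 # from the text (m)
pixel_size = 5.86e-6                # from the text (m)
candidates = np.arange(90) * 1e-3   # 90 candidate planes, 1 mm apart (assumed)
d_true = candidates[30]             # ground-truth distance used to build the data

size = 64
yy, xx = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
obj = np.exp(1j * np.exp(-(xx**2 + yy**2) / (2 * 6.0**2)))  # pure phase object
sensor_field = fresnel_propagate(obj, wavelength, pixel_size, -d_true)

best_d, scores = find_focus(sensor_field, wavelength, pixel_size, candidates)
```

Each call to `find_focus` costs one pair of FFTs per candidate plane, which is the repeated work that a single forward pass of the regression network is meant to replace.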

Experimental results
The performance of the trained regression CNN (R-CNN) model was examined by recording several new holograms at different "d_s" and comparing the R-CNN's estimation with the output of a focus-evaluation function. The hologram test set was never used for training, and the R-CNN model estimates the propagation distance according to the spatial pattern of the input hologram. Figure 6 shows some examples of the holograms used to evaluate the performance of the proposed model at different distances "d_s". Figure 7 presents the results of the proposed R-CNN model for estimating the reconstruction distance of microsphere beads. Since this method can estimate the reconstruction distance at the single-hologram level, a separate reconstruction can be performed for each micro-object. Accordingly, four beads were considered, as shown in Fig. 6. To validate the output of the method, the output for a single hologram was compared with that of the focus-evaluation function, and a correlation analysis was performed. Alongside the analysis, the 3D profile of the phase image of each bead and the corresponding cross section are shown. Since the beads are adherent and located at the same position along the z-axis, they are almost at the same focus distance from the camera, so the output of the R-CNN is similar for the four input holograms. The correlation between the focus-evaluation function's output and the R-CNN's output is significant (the line equation is y = x, and the offset is negligible).
The performance of the proposed method was also evaluated using red blood cells (RBCs). Unlike microsphere beads, RBCs show significant cell-to-cell variation, which is very useful for testing the proposed model's performance in various conditions. Five single RBCs (R1-R5) were chosen to evaluate the performance of the R-CNN model (see Fig. 6). R5 is located at a different focus distance from the other red blood cells. Therefore, conventional focus-evaluation functions are unable to find a focus distance at which the 3D profile of R5 is best resolved. The RBC-to-RBC single-hologram variation (see Figs. 6(d), (e) and (f)) is considerable and can show the performance of the proposed method in conditions that differ from training. To validate the output of the method, the output for a single hologram was compared with that of the focus-evaluation function, and a correlation analysis was also performed (see Fig. 8). The proposed method predicts the correct value for R5, reflecting the difference in focus between R5 and the other cells.
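The correlation analysis between the two estimators can be sketched as below. This is an illustrative NumPy example only: the arrays are synthetic placeholders in arbitrary units, not measurements from this work, and the function name and the choice of reported statistics (fitted line, Pearson coefficient, mean absolute error) are assumptions.

```python
import numpy as np

def compare_predictions(d_reference, d_predicted):
    """Fit d_predicted = slope * d_reference + intercept and report agreement."""
    d_reference = np.asarray(d_reference, dtype=float)
    d_predicted = np.asarray(d_predicted, dtype=float)
    slope, intercept = np.polyfit(d_reference, d_predicted, 1)
    pearson_r = np.corrcoef(d_reference, d_predicted)[0, 1]
    mean_abs_error = np.mean(np.abs(d_predicted - d_reference))
    return slope, intercept, pearson_r, mean_abs_error

# Synthetic placeholder values (arbitrary units), NOT data from the paper.
d_ref = np.linspace(10.0, 20.0, 11)               # focus-evaluation output
d_pred = d_ref + 0.05 * np.sin(np.arange(11))     # stand-in for R-CNN output

slope, intercept, pearson_r, mean_abs_error = compare_predictions(d_ref, d_pred)
```

A slope close to 1 with a negligible intercept corresponds to the "y = x" agreement described for Figs. 7 and 8.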
The main advantage of this method is that it can estimate the propagation distance for the hologram of each micro-sized object. This can be very helpful in studying samples where cells are located at different levels along the optical axis. Figure 9 shows that the focus-evaluation function is unable to find a distance at which the profile of R5 is well resolved. In contrast, when the input is the R5 hologram, the proposed R-CNN model can find a focus distance at which the R5 contrast is best resolved. By combining two reconstructions at two different distances (for example, those obtained for R1 and R5), we can obtain a well-focused profile of all the micro-sized objects.
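Combining two reconstructions, as done for Fig. 9(d), amounts to copying the region of one object from the image focused on it into the image focused on the others. The snippet below is a minimal sketch of that idea; the bounding box, the array contents, and the function name are hypothetical, and how the object region is located (for example by segmentation) is not specified in the text.

```python
import numpy as np

def merge_focused_region(base_image, other_image, box):
    """Return a copy of base_image in which the region box = (row0, row1,
    col0, col1) is taken from other_image (same shape, different focus)."""
    if base_image.shape != other_image.shape:
        raise ValueError("both reconstructions must have the same shape")
    row0, row1, col0, col1 = box
    merged = base_image.copy()
    merged[row0:row1, col0:col1] = other_image[row0:row1, col0:col1]
    return merged

# Placeholder arrays standing in for two phase images of the same hologram:
# one reconstructed at the distance predicted for R1, one at that for R5.
phase_focus_r1 = np.zeros((8, 8))
phase_focus_r5 = np.ones((8, 8))
r5_box = (2, 5, 3, 6)   # hypothetical bounding box of R5

combined = merge_focused_region(phase_focus_r1, phase_focus_r5, r5_box)
```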
A challenging task in passive autofocusing in DHM techniques is the maximization of the phase image's sharpness for studying transparent or semi-transparent objects. This is performed by evaluating the sharpness of amplitude images reconstructed at different planes. The best focus occurs when the object almost disappears or appears with the least contrast in the reconstructed amplitude image. This focal distance corresponds to the best-resolved structures in the quantitative phase-contrast image. The disadvantage is that multiple reconstructions are time-consuming because digital propagation requires the application of a Fourier transform (with either a backward propagation method or Fresnel propagation).
Fig. 6. (a), (b), and (c) are three holograms of microbeads recorded at unknown distances "d_s"; four beads were chosen from each sample for the bead R-CNN model test; all beads are located at the same distance along the z-axis. B1-B4 are the 3D representations of the holograms of the beads. (d), (e), and (f) are three holograms of red blood cells recorded at unknown distances "d_s". R1-R5 are the 3D representations of the holograms of the RBCs. R5 is located at a different depth with respect to R1-R4.
In this work, 90 reconstructions were performed, and then the focus-evaluation function was used to examine the 2D standard deviation (STD) of the pixels within the whole amplitude image containing multiple cells or objects. The plane that minimizes the 2D STD of the amplitude image is the best focus plane. The proposed method can estimate the focus plane of the corresponding object without any reconstruction, which can be used to determine the distance between micro-sized objects along the optical axis. This is critical for studying samples in 3D culture environments because cells can move freely along the optical axis during the culture time. The method is fast and can instantly find the focus plane without reconstruction, which can significantly reduce the reconstruction time. Table 1 shows the reconstruction time with the focus-evaluation function and with the proposed regression convolutional neural network approach. The reconstruction algorithm for both implementations (one using the CNN prediction and one using the focus-evaluation function) is the same, and neither of them uses parallel processing.
Fig. 7. (a) Comparison of the R-CNN model output and the focus-evaluation function for estimation of the optimal reconstruction distance. Holograms are recorded at different unknown "d_s", which is manually increased ("d_s" is along the optical axis and changes when the stage is moved); a few bead holograms with multiple beads are shown in Fig. 6. (b, c, d, e) Phase images reconstructed with the value provided by the R-CNN when the input to the model is B1, B2, B3, and B4, respectively. The color map is the same for all figures. (f, g, h, i) Correlation analysis between the CNN's output (the input to the model is B1, B2, B3, and B4, respectively) and the focus-evaluation function. (j) Cross section of the reconstructed phase image of B1 to B4 when the R-CNN provides "d" for the holograms of B1 to B4. The reconstruction by the focus-evaluation function is also presented for comparison.
Both algorithms run in MATLAB 2018 on a PC with an Intel Core i7 (3.60 GHz) processor and 16 GB of RAM. The main difference between the two algorithms is that one calls the trained CNN model and the other calls the focus-evaluation function. As mentioned, the CNN model is trained with Keras on a GPU only (NVIDIA GeForce GTX 690), and the trained model is then imported into MATLAB. We found that reconstruction in 90 different planes can find the best focus in studies of biological samples. Reducing the number of reconstruction planes may cause the best focus plane to be missed.
Fig. 8. (a) Comparison of the R-CNN model output and the focus-evaluation function for estimation of the optimal reconstruction distance. Holograms are recorded at different unknown "d_s", which is manually increased ("d_s" is along the optical axis and changes when the stage is moved). A few RBC holograms recorded at various "d_s" are shown in Fig. 6; R5 is located at a different distance from R1-R4. (b, c, d, e) Correlation analysis between the R-CNN's output and the focus-evaluation function when the input to the model is R1, R2, R3, and R4, respectively.
Fig. 9. Reconstructed phase images when (a) "d" is estimated with the focus-evaluation function, (b) "d" is estimated with the R-CNN model (input is R1), and (c) "d" is estimated with the R-CNN model (input is R5). (d) Combination of (b) and (c); R5 is copied from (c) and inserted into (b); the color map is the same for all images. (e) 3D profile of R1 to R5 when the focus is set according to the focus-evaluation function. In this case, R5 is at a different focus level, so its profile is not correct. (f) 3D profile of R1 to R5 when the focus is set according to the CNN model with R1 as input. R5 is at a different focus level, so its profile is not correct. (g) 3D profile of R1 to R5 when the focus is set according to the CNN model with R5 as input. The R5 profile is correctly reconstructed. (h) Profiles extracted from the combination of (c) and (d).
One main challenge in designing CNN-based prediction is that the model requires many samples to be well trained. In this work, we showed that by designing separate models, it is still possible to train a model for accurate prediction of the reconstruction distance. This is helped by the fact that RBC-to-RBC variation and bead-to-bead variation are not significant; thus, the two models can accurately predict the reconstruction distance.

Conclusions
In this paper, we propose a deep-learning convolutional neural network with a regression layer as the top layer to estimate the best focus distance in the numerical reconstruction of micro-sized objects at the single-object level. The focused images and corresponding reconstruction distances for the training data set are found by using a conventional automated focus-evaluation function. The auto-focus function uses the 2D standard deviation of the reconstructed amplitude image to find the best propagation plane. The experimental results and the comparison with the focus-evaluation function illustrate that the proposed method can precisely estimate the propagation distance from a filtered hologram. We have demonstrated experimentally that this method can significantly reduce the numerical reconstruction time needed to estimate the correct focus, whereas the automated focus-evaluation function requires digital propagation at different distances, which can be computationally inefficient. In addition, since the distance of the object is numerically estimated at the single-cell level, the method can provide reconstruction distances with respect to the location of various micro-sized objects. This method can be generalized to biological sample studies, most specifically to cancer cells, since they are round cells with almost the same visual structure. Generalization requires training the model with a large number of cells and a wide range of reconstruction distances.

Funding
National Research Foundation of Korea (NRF-2015K1A1A2029224).

Disclosures
The authors declare that there are no conflicts of interest related to this article.