Adaptive frequency filtering based on convolutional neural networks in off-axis digital holographic microscopy

: Digital holographic microscopy (DHM) as a label-free quantitative imaging tool has been widely used to investigate the morphology of living cells dynamically. In the off-axis DHM, the spatial filtering in the frequency spectrum of the hologram is vital to the quality of the reconstructed images. In this paper, we propose an adaptive spatial filtering approach based on convolutional neural networks (CNN) to automatically extracts the optimal shape of frequency components. For achieving robust and precise recognition performance, the net model is trained by using the tens of thousands of frequency spectrums with a variety of specimens and imaging conditions. The experimental results demonstrate that the trained network produce an adaptive spatial filtering window which can accurately select the frequency components of the object term and eliminate the frequency components of the interference terms, especially the coherent noise that overlaps with the object term in the spatial frequency domain. We find that the proposed approach has a fast, robust, and outstanding frequency filtering capability without any manual intervention and initial input parameters compared to previous techniques. Furthermore, the applicability of the proposed method in off-axis DHM for dynamic analysis is demonstrated by real-time monitoring the morphologic changes of living MLO-Y4 cells that are constantly subject to Fluid Shear Stress (FSS).


Introduction
Digital holographic microscopy (DHM) is a quantitative imaging technique, which allows measuring in amplitude and phase, the wavefront transmitted through a specimen and magnified by a microscope objective [1,2]. A hologram, composed by the interference of the object wave with a reference wave, is recorded with a camera and then numerically reconstruct the object wave. Thanks to its interferometric nature, DHM can provide phase images of living biological cells with a corresponding accuracy in the nanometer range along the optical axis of the microscope and without using any labels or contrast chemicals. In addition, dynamic imaging in real-time is easily achieved by off-axis DHM owing to the actual performance of digital cameras and personal computers. By taking advantage of the high spatial-temporal resolution of quantitative phase imaging, off-axis DHM is nowadays widely used in the biological field, such as determination of cellular motility [3,4], morphology [5] and biomechanics [6,7]. It is mainly due to the numerical reconstruction that offers high flexibility and provides the possibilities of quantitative analysis. This process consists of propagating a wavefront numerically from the recording plane to the imaging plane, including the hologram apodization, spatial filtering, numerical propagation, aberration compensation and phase unwrapping [8][9][10].
Through the spatial filtering, the 0 term and −1 term are eliminated in the frequency spectrum of the off-axis hologram, and then the object wavefront is isolated to reconstruct numerically by using + 1 term. So, the frequency filtering technique is an important step for off-axis DHM to enhance the contrast of the reconstructed image [11,12]. This operation includes multiplying the frequency spectrum by a so-called filter window, which acts as a band-pass spatial filter such as a circular filter [13] and rectangular filter [14]. The distribution of spatial frequencies varies significantly with the different samples. Especially, during dynamic measurement, the frequency spectrum gradually changes with the states of the living cells at different times. So it is necessary to change the filter windows correspondingly in long-term and real-time microscopic imaging. However, there is a huge amount of work to do if filtering thousands of frequency spectrums by manual. Therefore, an adaptive spatial filtering technique is highly necessary for dynamic imaging in off-axis DHM.
So far, the researchers have made a lot of efforts on automatic spatial filtering method. Rincon et al. presented an algorithm to calculate and locate the filter windows by using a distance criterion between the maximum values in the amplitude spectrum [15]. Weng et al. applied the histogram analysis of the frequency spectrum to decide the parameter threshold and design a spatial filter window [16]. Li et al. proposed an adaptive spatial filtering method to extract frequency components of + 1 term on basis of region growing and the characteristics of the spectrum separation [17]. Matrecano et al. presented a numerical procedure based on the Butterworth filter to produce the adaptive filter window on consideration of the background noise in the hologram [18]. Lee et al. proposed an adaptive filtering process based on iterative thresholding and region-based selection without fixing a single threshold level and setting any initial input parameter [19]. Hong et al. presented a weighted adaptive spatial filtering method combing the advantages of adaptive filter and smoothing filter [20]. Unfortunately, these methods have the limitation of flexibility for selecting the optimal frequency components and eliminating the coherent noise. For example, the parasitic interference fringe would appear in the hologram when these beams produced by multiple diffractions or reflections in the experimental system can reach the hologram plane with an incidence angle similar to the object wave.
Recently, deep learning has been introduced to reconstruction processes of digital holography, such as autofocusing [21], phase aberration compensations [22], etc. The technique can realize adaptation and full automation because it takes pre-labeled results as a guide to learn the intrinsic features of the data set. In this paper, we propose a novel spatial filtering approach based on convolutional neural networks (CNN) to determine an optimal filtering window for off-axis DHM. The numerical aperture of imaging system and the overlapping area of + 1 term and 0 term are considered in generating filtering windows. Therefore, the most object spectrum can be extracted and the details are preserved in the reconstructed image. Meanwhile, the proposed method can identify and remove the frequency components of coherent noise from the filtered spectrum. Those noises maybe come from multiple reflection or random diffraction. Compared with the existing filtering methods, the proposed method can produce an optimal filter window, which not only extracts the most frequency components accurately but also eliminate the coherent noise effectively. For achieving such characteristics, tens of thousands of frequency spectrums of off-axis holograms are used to train the CNN model. These training data come from the off-axis holograms with a variety of specimens and recording conditions. With the trained CNN model, the spatial filtering can operate full automatically and adaptively without any manual intervention and initial input parameter. Through the application of the proposed method to real-time measurement of the morphologic changes of living MLO-Y4 cells that constantly subject to Fluid Shear Stress (FSS), the experimental results demonstrate that the proposed method has an automatic, fast, robust, and outstanding frequency filtering capability.

Principle of frequency domain filtering
In off-axis DHM, the hologram is generated by the interference of the object wave O and the reference wave R with a certain small angle and is expressed as: where * represents the conjugate. The   Figure 1 shows an off-axis DHM setup in the vertical transmission mode suited for biological sample analyses. A 532 nm laser (Cobolt, Sweden) is utilized as the source with an output power of 100 mW. A neutral density filter (NDF) controls the input intensity of the laser. The laser beam is expanded to a plane wave through the spatial filter (SF), and then split into an object beam and a reference beam by a polarizing beam splitter (PBS). A halfwave plate (λ/2) is placed in conjunction with the PBS to adjust the intensity ratio of the two beams. The object beam illuminates the biological specimen after converged by the condenser lens (CL). A microscope objective (Olympus UPLFLN, 20x, NA = 0.50) combined with a tube lens (TL) is used to produce a magnified image. In the reference arm, another half-wave plate is used to adjust the polarization direction for ensuring a high-fringe contrast. The beam splitter (BS) placed in front of the CCD camera (1024 × 1024 pixels, 5.86 μm, PointGrey, Canada) combines the two beams with a slight angle, and the CCD camera records the holograms.
After performing the Fourier transform (FT), the frequency spectrum of the hologram contains two symmetrical areas can be written as: where FT{} denotes the Fourier transform, 1 A represent the 0 term, which forms a bright blob at the center. 2 3 + A A are the + 1 term (object term) and −1 term (conjugate term). A 2 is the desired term contain the object wave O. A proper filter H is employed to extract the desired term 2 A : Then, applying an inverse Fourier transform (FT −1 ) on the extracted frequency spectrum 2 A , a complex function is obtained and can be used to retrieve the object wave with the reconstruction algorithms [23,24]. Here the desired spectrum of the object wave at the image plane can be formulated as: where z is the distance between the recording plane and imaging plane. Subsequently, the object image can be obtained by applying the inverse Fourier transform as:

Filter window shape
Generally, most cells are smooth and homogeneous. So the frequency spectrum of its hologram shows a continuous cross bright spot, and brightness gradually decays from the center. Figure 2(a) shows the digital hologram of living osteocyte cells. Figure 2(b) is the frequency spectrum of the hologram. It can be seen that the energy of the frequency components is mainly concentrated in the low-frequency region. Actually, the frequency components in the high-frequency region include the information of details as well as the coherent noise, such as indicated with two red circles in Fig. 2 (b). To get more microstructure information on cell morphology. The filtering window H should include the most frequency components of + 1 term and remove the frequency components of coherent noise. The red outline in Fig. 2(b) is a filtering window determined manually according to the frequency spectrum distribution of the + 1 term. While the blue outline is another filtering window by a typical threshold method based on region recognition algorithm [19]. The reconstructed images in Fig. 2(c) and in Fig. 2(d) correspond to the manual filter window and the threshold filter window respectively. By Comparing the zoom-in images in Fig. 2(e) and Fig. 2(f), it is obvious that more details of cell structure are achieved by manual filter window. Furthermore, it can be found that the filter window by manual selection cannot be described with a deterministic function. Because the frequency components of + 1 term are filtered by an irregular shape window, and the coherent noise overlaps with the + 1 term in the spatial frequency domain. In fact, it is impossible to generate such a filter window with a non-continuous area for the traditional filtering methods.

Convolution Neural Network filter method
Here we utilized a CNN model to separate the + 1 term and eliminate the interference term and the coherent noise in the spectrum. The schematic diagram is described in Fig. 3. First, the raw frequency spectrum is obtained by FT, as shown in Fig. 3(b). Second, the 0 term in the frequency spectrum is eliminated and cropped half to avoid disturbance of conjugate item, as shown in Fig. 3(e). Then, the processed frequency spectrum is feed into a trained CNN model, and a binary filter window is generated, as shown in Fig. 3(e) and Fig. 3(d). Next, the filter window is used to extract the frequency component of + 1 term and eliminate the frequency components of the interference term and the coherent noise, as shown in Fig. 3(e). Finally, the object image is obtained by centering the filtered spectrum and using the angular spectrum algorithm [19], as shown in Fig. 3(f) and Fig. 3(g).

Data set
To build a data set for net model training, the holograms of living cells are recorded by an off-axis DHM. The corresponding ground truth (label images) are the binary images (0 for background, 1 for interest region) that are the optimal filtering window and generated by manual selection. Actually, the filtering windows are produced manually on the basis of the numerical aperture of the imaging system, the overlapping area of + 1 term and 0 term, and the distribution of coherent noise in the hologram spectrum. Thus, not only the most object frequency components can be included in the filter windows but also the noise frequency components are removed from the filtered spectrum. Furthermore, the training data augmentation is performed by multiplying the data sets with flipping (horizontally and vertically) and rotating (180°) or by using a fuzzy algorithm and median filtering, to enhance the robustness of the network. Finally, there are the 6684 pairs of input images and label images for network training. The eighty percent of these data are randomly selected for training, and the rest are used for validation.

Training
The CNN model used in this paper is the U-Net model [25,26], as shown in Fig. 4. It contains multiple CNN layers, max-pooling layers, unpooling layers with rectified linear unit (ReLU) activation function, and batch normalization (BN) function [27,28]. The volume of input data and output data along with the ground truth images have a size of (batchSize × imageWidth × imageHeight × channel), where batchSize is the number of images in each training session.
The input volume has a size of (1 × 1024 × 512 × 1) (1 channel indicates a grayscale image), whereas the output volume has a size of (1 × 1024 × 512 × 2) (2 channels for 2 classes obtained from the one-hot-encoding of the ground truth images [29]). The CNN model has a symmetrical architecture and consist of two parts: Down-sampling (left half of Fig. 4) and upsampling (right half of Fig. 4). ReLU activation function [27] and BN function [28] are employed after each convolutional layer to capture nonlinearity in data and speed up training.
In the down-sampling path, convolution realizes the feature extraction, which transforms the input image into a multi-dimensional feature representation [30,31]. On the other hand, the up-sampling path is a shape generator that produces object segmentation from the extracted features obtained from the convolution path.
The input images and their corresponding label images are used to train the network with the Stochastic Gradient Descent (SGD) implementation [32]. The training algorithm is performed by iterating the process of feeding the frequency spectrum in batches through the net model and using an optimizer to minimize the error between the output produced by the system and the label [33]. The loss function is measured by using the cross-entropy function with L1 regularization as a penalty factor. Learning rate is initially set as 1e-4, and the decay rate is 0.95 [34]. Other parameters are listed below: dropout rate of 0.9, batchsize of 1, the image size of 1024 × 512, depth channel of 32 at the first layer, the deepest channel is 512 and training with 15000 steps. This net model is implemented in Python by using TensorFlow framework [35] and the implementation was GPU-accelerated with NVIDIA GeForce 1080 Ti graphics card. Figure 5 shows the training loss obtained after 15000 steps, which represents the parameters are updated after each training epoch. The blue line is a precision curve corresponding to the right direct-axis which represents the process of accuracy rising during training. While the red line corresponding to the left direct-axis represents the process of loss decreasing. The results suggest that the loss value decreases quickly (i.e. learned quickly) during the first 4000 of the training and the accuracy value increase with random oscillations (i.e. transitory period) in the first 2000 steps.   Figure 6 shows exemplary channels of selected layers in the down-sampled (top row) and up-sampled (bottom row) paths (see Fig. 4), respectively, for visualization effect. The first image on the top row used as input is the raw frequency spectrum. The second, third, and fourth images on the top row are the outputs of consecutive down-sampling layers respectively. The five images on the lower row are the outputs of the up-sampled layers, and the binary mask on the right is the final output map. The down-sampled layers contain the strong features of the image such as the parabolic intensities, and edges, etc. Meanwhile, the up-sampled layers contain the features of the filter windows.

Experimental testing and results
The proposed method is tested by using the experimental data that do not be used in training compared with the traditional filtering methods. Figure 7 shows the results of spatial filtering by using the trained model. The first column of Fig. 7 is the frequency spectrum of different cells, including the endometrial carcinoma cells in Fig. 7(a), the osteoblastic MLO-Y4 cells in Fig. 7(d), and the ovarian cancer cells in Fig. 7(g). It can be seen that these frequency spectrums have different patterns due to different kinds of cell and experiment condition. In Fig. 7(a), the distance between DC terms and object term is relatively closer than that of Fig.  7 (d) and Fig. 7(g). In Fig. 7(d) and Fig. 7(g), the frequency components of parasitic noise are obvious in the high-frequency region and may be due to the multiple reflections on the surface of Petri dishes, culture medium, or other optical components. The second column of Fig. 7 present the output of the filter window from trained CNN model, as shown in Fig. 7(b), Fig. 7(e) and Fig. 7(h). It can be seen that these filter windows have the diverse irregular shapes to keep away from the interference terms and remove the parasitic noise overlapped with the object term. The third column of Fig. 7 show the reconstructed phase images with fine details of cell morphology.
To illustrate the effectiveness and adaptability, the proposed Deep-learning Filter (F DL ) is compared with the existing filtering methods. Existing filtering methods are mainly divided into four categories: First, the Histogram-analysis Filter (F HA ) by employing the histogram analysis of the spectrum to determine a threshold; Second, the Butterworth Filter (F BW ) by using the Butterworth filter with smoothing effect to produce a filter window; Third, the Region-recognition Filter (F RR ) by combining image segmentation method on basis of the characteristic of the spectrum separation; Fourth, the Manual-selection Filter (F MS )by generating a filtering window manually on consideration of the numerical aperture. All methods are used to deal with the same hologram of ovarian cancer cells, as shown in Fig.  8(a). The magnified portions of the region indicated by the blue rectangle are given in Fig.  8(b). The corresponding frequency spectrum of the hologram is shown in Fig. 8(c) with eliminating the 0 term. The magnified + 1 term indicated by the blue rectangle is displayed in Fig. 8(d). In Fig. 8(e), the Histogram-analysis Filter (F HA ) is generated with the threshold selected by employing the histogram analysis [18] and the filtered spectrum is displayed with the same attenuation in all directions. In Fig. 8(f), the Butterworth filter (F BW ) is designed with the cut-off frequency determined by an image segmentation algorithm [20] and the filtered spectrum is given with smooth edge. In Fig. 8(g), the Region-recognition Filter (F RR ) is built with shape recognition of the spectrum and iterative calculation of threshold [21] and the shape of filtered spectrum is consistent with the energy distribution of + 1 term. In Fig.  8(h), the rounded Manual-selection Filter (F MS ) is produced with a center located on the peak point within + 1 term, and a radius on consideration of the numerical aperture of imaging system [13] and the filtered spectrum is given with circle shape. In Fig. 8(i), the Deeplearning Filter (F DL ) is produced with the trained model and the filtered spectrum has an irregular and discontinuous shape corresponding to the numerical aperture of the imaging system and coherent noise distribution. The reconstructed phase images are shown in Fig.  8(j)-(n) corresponding to the filtered spectrum in Fig. 8(e)-(i), respectively. The zoom-in images indicated by the white rectangle in these phase images are displayed in Fig. 8(o)-(s). The comparisons indicate that the boundary between the nucleus and cytoplasmic is clearly observed in Fig. 8(r) and Fig. 8(s) and that is a blur in Fig. 8(o), Fig. 8(p), and Fig. 8(q). This is because the F MS and F DL extract more frequency components than the F HA , F BW , and F RR . Furthermore, the standard deviation (SD) of the same area indicated by a yellow rectangle in Fig. 8(j) is employed to estimate the phase noise level of these filtering methods. The SD is 0.1535, 0.1259, 0.1212, 0.2002, and 0.1535 for the F HA , F BW , F RR , F MC, and F DL , respectively. It can be seen that the F RR is at the lowest noise level and the F MC is at the highest noise level. The noise level of F DL is 30% more than that of F RR in this case where the window area of F DL is several times that of F RR . Meanwhile, the noise level of F DL is 53% lower than that of F MC with similar window area. It is clear that the phase distribution of the background Fig. 8(s) (corresponding to F DL ) is smoother than that in Fig. 8(r) (corresponding F MC ) thanks to the coherent noise eliminated by the F DL . The experiment results indicate that the proposed method can generate an optimal filtering window that not only can extract the most frequency components accurately and eliminate the coherent noise in the spectrum effectively.

Dynamic cell morphology analysis under Fluid Shear Stress
In order to verify the applicability of the proposed methods in dynamic analysis, the experiment of monitoring the morphologic changes of living MLO-Y4 cells subjected to Fluid Shear Stress (FSS) is demonstrated by setting up a fluid shear module (shown in Fig. 9) in the DHM system (shown in Fig. 1). In this module, the peristaltic pump can drive the fluid at a stable flow rate and provide continuous and stable FSS on the cells. In general, bone cells act as load-bearing cells of the body. The bone tissue is sparsely porous, and this feature makes the bone cells subject to FSS of the tissue fluid. Understanding the cellular responses to mechanical stimulation can help identify signaling pathways and cellular interactions of bone homeostasis, which can help to develop new therapies for bone disorders.
In this measurement, the holograms are recorded for 50 minutes with 1-second interval. The cells keep still for first 10 min without FSS. Then, the stable FSS is applied to cells for 30 minutes and the direction of FSS is from the top right to bottom left, as indicated with the arrow in Fig. 10(a). Finally, the cells keep still again for 10 minutes without FSS, in which the cells adopt a self-control strategy. Figure 10 shows the change process of cell morphology under the FSS at the 10th minute and the 40th minute, respectively. Comparing Fig. 10(a) with Fig. 10(b), it can be found that the profile of cell edge changes significantly after 30 minutes of FSS. In detail, there is a depression and shrinkage of the cell edge and the cell volume becomes smaller meanwhile.   For quantitatively and intuitively evaluating the cellular morphologic variation under FSS, the cell-1 indicated in Fig. 10 is selected to make a profile for showing the horizontal displacement. Figure 11(a), Fig. 11(b) and Fig. 11(c) present the phase images of cells at the beginning and at the end of FSS. The corresponding profiles on the red line (indicated in Fig.  11(a), Fig. 11(b) and Fig. 11(c)), are illustrated in Fig. 11(d), Fig. 11(e) and Fig. 11(f) where the red curve represents the cell morphology at the initial time and the blue curve represents the results at the end time. Figure 11 (a) and Fig. 11(d) shows the cell morphology in the first 10 minutes without FSS and there is no difference between the two curves. Figure 11(b) and Fig. 11(e) shows the cell morphology after loading FSS for 30 minutes and there is about 20 pixels displacement between the two curves. Figure 11(c) and Fig. 11(f) show the cell morphology after removing FSS for 10 minutes and two curves almost coincide.
Base on the experimental results, it is demonstrated that the high-quality reconstructions can be obtained in dynamic off-axis DHM with adaptive spatial filtering by using the trained CNN net. On the other hand, it can be known that the fluid shear stress has an obvious effect on the bone cells, which leads to the change of bone cell's morphology and activity of the bone cells.

Conclusion
The spatial filtering of the frequency spectrum is an important step in the digital reconstruction of off-axis DHM. The quality of reconstructions is effectively enhanced by eliminating the 0 term, the −1 term and the coherent noise in the frequency spectrum. As to dynamic off-axis DHM, an adaptive spatial filtering becomes especially necessary, because the filter windows need to be automatically adjusted according to the varying frequency spectrums. In this paper, we propose an adaptive spatial filtering approach based on CNN network to extract the desired frequency components and eliminate the coherent noise without manual operation or pre-knowledge. For achieving robust and precise frequency filtering capability, the CNN model is trained by using the tens of thousands of frequency spectrums from a variety of specimens and imaging conditions. The test results indicate that the proposed method has the fast, robust, and outstanding performance compared to other automatic filtering methods, such as optimal threshold and regular circular window. Furthermore, the proposed method based on the trained network is applied to dynamically observe the morphologic changes of living MLO-Y4 cells that are subject to FSS by off-axis DHM. The experimental results demonstrated that the high-quality phase images could be obtained with an adaptive spatial filtering window generated by the trained CNN network. The effect of FSS on the bone cells has been investigated by quantitatively analyzing the change of cells morphology. The proposed method can effectively improve the full automation of off-axis DHM and provide potential flexibility for dynamic imaging analysis.

Disclosures
The authors declare that there are no conflicts of interest related to this article.