Full-color optically-sectioned imaging by wide-field microscopy via deep-learning

Wide-field microscopy (WFM) is broadly used in experimental studies of biological specimens. However, the mixing of out-of-focus signals with the in-focus plane reduces the signal-to-noise ratio (SNR) and axial resolution of the image. Therefore, structured illumination microscopy (SIM) with white light illumination has been used to obtain full-color 3D images, which can capture high-SNR optically-sectioned images with improved axial resolution and natural specimen colors. Nevertheless, this full-color SIM (FC-SIM) has a data-acquisition burden for 3D-image reconstruction with a shortened depth-of-field, especially for thick samples such as insects and for large-scale 3D imaging using stitching techniques. In this paper, we propose a deep-learning-based method for full-color WFM, i.e., FC-WFM-Deep, which can reconstruct high-quality full-color 3D images with an extended optical sectioning capability directly from the FC-WFM z-stack data. Case studies of different specimens with a specific imaging system are used to illustrate this method. Consequently, the image quality achievable with this FC-WFM-Deep method is comparable to the FC-SIM method in terms of 3D information and spatial resolution, while the reconstruction data size is 21-fold smaller and the in-focus depth is doubled. This technique significantly reduces the 3D data-acquisition requirements without losing detail and improves the 3D imaging speed by extracting the optical sectioning in the depth-of-field. This cost-effective and convenient method offers a promising tool to observe high-precision color 3D spatial distributions of biological samples.

In this work, we propose a full-color optically-sectioned imaging method for wide-field microscopy via deep learning, named the FC-WFM-Deep method. The final reconstruction result, which we term the optical section within the depth of field (OS in DOF), can be acquired directly from the WF data, thereby extensively reducing the data acquisition load of full-color 3D reconstruction while maintaining satisfactory imaging quality, which makes the method highly practical for imaging applications. The effectiveness of the proposed method is verified with a testing dataset and with experiments on 3D imaging of insects.
The rest of the paper is organized as follows. Section 2 describes the proposed deep-learning-based FC-WFM method, including the background suppression with limited in-focus detection and the acquisition of the OS in DOF via deep learning. Section 3 verifies the effectiveness of the proposed method through experiments with different insects and provides the evaluation results and the corresponding discussion. Section 4 concludes the paper.

Criterion of the OS in DOF
As shown in Fig. 1(a), the original FC-WFM and FC-SIM can be implemented using a collimated, high-power white light LED (SILIS-3C, Thorlabs Inc.) as the illumination source. The LED light enters a total internal reflection (TIR) prism before being reflected onto the DMD (V7000, ViALUX GmbH, Germany). The modulated light then passes through the optical projection system, which includes an achromatic collimating lens, a beam-splitter and an objective lens (20x objective, 0.45 numerical aperture, Nikon Inc., Japan), and projects a sinusoidal fringe pattern onto the specimens [14]. The specimens are mounted on an x-y-z motorized translation stage (Attocube GmbH, Germany). A color CMOS camera (80 fps, 2048×2048 pixels, IDS GmbH, Germany) equipped with a tube lens is used to capture the 2D wide-field images. Additionally, timing synchronization to control the DMD, image collection and stage movement is performed using customized software developed in C++. For each axial plane in z-scanning, three fringe-illuminated raw images with adjacent phase shifts are captured. The volume data for the different axial layers are obtained by axial movement of the specimen to different z positions, acquiring the 3D light intensity distribution of the specimen. Once imaging of an entire sample with a series of fields-of-view has been completed, a large-scale image can also be reconstructed using the image stitching technique. As only four pixels per period of the binary pattern are used in the fringe projection by the DMD, the commonly-used three-step phase-shifting at intervals of 2π/3 cannot be performed. Therefore, an alternative three-step phase-shifting at intervals of π/2 is adopted according to the same principle [7]. Consequently, for each slice, three sub-images are acquired, denoted S_0, S_π/2 and S_π, in order.
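As a brief illustration of the π/2 stepping described above: with four DMD pixels per fringe period, shifting the binary pattern by one pixel advances the phase by a quarter period, i.e. π/2. A minimal numpy sketch (the exact on/off duty cycle is an assumption from the description, not a system specification):

```python
import numpy as np

PERIOD = 4  # pixels per fringe period on the DMD (from the text)

def binary_fringe(width, shift_px):
    """1D binary fringe: 2 pixels on, 2 pixels off per 4-pixel period.
    A 1-pixel shift corresponds to a pi/2 phase step."""
    x = np.arange(width)
    return (((x + shift_px) % PERIOD) < PERIOD // 2).astype(float)

# Three phase-shifted patterns: shifts of 0, 1 and 2 pixels (0, pi/2, pi).
s0   = binary_fringe(16, 0)
s90  = binary_fringe(16, 1)
s180 = binary_fringe(16, 2)

# Sanity check: a 2-pixel shift is half a period, so s180 complements s0.
assert np.array_equal(s0 + s180, np.ones(16))
```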
After acquisition, three raw images with individual phase shifts are recorded by the color CMOS camera. In contrast with most electronic display products, which form images by combining RGB (red, green and blue) light with varying intensities, the data are processed in three separate HSV (hue, saturation and value) channels and recombined into a 3D image with full natural color [13]. HSV is based on the principle of color recognition in human vision in terms of hue, lightness and chroma [20], and the imaged data are transformed from RGB color space into HSV color space to obtain the three H, S and V components [9]. For each component, due to the defocusing of the high-spatial-frequency illumination away from the focal plane, the merged imaging data can be simply decomposed into two parts, i.e., the in-focus component S_in and the out-of-focus component S_out. Therefore, the required in-focus information S_in,p can be extracted from the captured image triple according to the following process [7]:

S_in,p = [(S_0,p − S_π/2,p)^2 + (S_π/2,p − S_π,p)^2]^(1/2),

where p = H, S and V denotes the three different components. The sectioned images for the three HSV channels are then recombined and transformed back into RGB space in order to display the images on devices.
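Assuming the standard square-law demodulation for π/2 phase steps (consistent with the process referenced from [7]), the per-channel extraction can be sketched as follows; the RGB↔HSV conversion itself is omitted for brevity:

```python
import numpy as np

def section(s0, s90, s180):
    """In-focus component from three pi/2 phase-shifted frames.
    In the paper this is applied per HSV channel; channels omitted here."""
    return np.sqrt((s0 - s90) ** 2 + (s90 - s180) ** 2)

# Synthetic check: frames B + A*cos(phi + theta) with an out-of-focus
# background B. The demodulation returns sqrt(2)*A for any phi and B,
# i.e. the constant background cancels and only the modulated part stays.
phi = np.linspace(0.0, 2 * np.pi, 64)
A, B = 0.3, 0.5
frames = [B + A * np.cos(phi + t) for t in (0.0, np.pi / 2, np.pi)]
assert np.allclose(section(*frames), np.sqrt(2) * A)
```

The key property, visible in the assertion, is that the out-of-focus term B drops out, which is exactly the background suppression the text describes.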
Although FC-SIM has a relatively faster imaging speed than the LSCM, its 3D reconstruction requires a huge number of OS images due to the limited in-focus information extracted by the calculation, which is characterized by the OS capacity [7]. Specifically, the intensity distribution I(z,v) permitted in the field of a microscope objective, taking the focal plane as the starting surface along the defocusing direction on both sides, can be expressed as [10]:

I(z,v) = g(v) |2J_1(u·v(1 − v/2)) / (u·v(1 − v/2))|,

where u = 8z(π/λ)sin^2(α/2), α is the half-aperture angle of the objective lens, z is the axial distance from the focal plane, λ is the wavelength and v is the normalized spatial frequency of the illumination pattern. J_1(·) represents the first-order Bessel function and g(v) = 1 − 1.38v + 0.0304v^2 + 0.344v^3. The OS capacity in SIM-OS can then be estimated by the full width at half maximum (FWHM) of the I(z,v) curve, defined as the OS strength [7], for a fixed spatial frequency v. In addition, it has also been proven that at a given NA, the optimal spatial frequency v of the illumination pattern is half of the cut-off frequency, which yields the best OS effect. For the system described above, the specific I(z,v) value was calculated and is shown in Fig. 1(b), where the theoretical average OS strength is approximately 1.83 µm in our system (black arrows in Fig. 1(b)). The FC-WFM image can also be obtained simultaneously from the same triple, since the two frames in antiphase cancel the fringe modulation:

S_WF,p = (S_0,p + S_π,p)/2.

This shows that the FC-SIM image and the FC-WFM image of the sample can be obtained from the same raw data, which is very convenient for the theoretical analysis and comparison. As mentioned previously, the WF image includes both in-focus and defocused information. In order to provide a quantitative demonstration, the intensity curve of the FC-WFM can be estimated from the experimental data [21]. Ten specific points in a vertical slice were first selected; their intensities collected at different z-scanning positions were then averaged and finally fitted to a Gaussian curve, as shown in Fig. 1(b).
Additionally, the DOF of FC-WFM can be determined by [22]:

DOF = nλ/NA^2 + n·e/(M·NA),

where n is the refractive index of the medium, NA is the objective numerical aperture, e is the pixel pitch of the camera and M is the lateral magnification of the objective. The DOF is therefore approximately 3.45 µm for FC-WFM, which matches the FWHM shown by the blue dashed curve (approximately 3.59 µm, red arrows in Fig. 1(b)). In other words, FC-SIM can significantly remove the out-of-focus background, resulting in an in-focus depth that is theoretically reduced from 3.45 µm to 1.86 µm. Therefore, FC-WFM-Deep, a new WFM method based on a deep-learning network, is proposed, which makes full use of the most effective intensity contained in a significant depth range of around 3.5 µm in order to extract the OS in DOF and reduce the data acquisition requirements of imaging.
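As a quick numerical check of the DOF expression, the following sketch uses the NA and magnification stated for the system; the wavelength and camera pixel pitch below are assumed illustrative values, not taken from the text:

```python
# DOF = n*lambda/NA^2 + n*e/(M*NA), the expression from [22] above.
n = 1.0               # air immersion (assumed)
wavelength = 550e-9   # mid-visible wavelength in m (assumed)
NA = 0.45             # objective numerical aperture (from the text)
M = 20                # lateral magnification (from the text)
e = 5.5e-6            # camera pixel pitch in m (assumed)

dof = n * wavelength / NA**2 + n * e / (M * NA)
print(f"DOF ~ {dof * 1e6:.2f} um")  # on the order of the ~3.45 um quoted above
```

With these assumed values the diffraction term contributes about 2.7 µm and the geometric (pixel) term about 0.6 µm, landing in the same range as the figure quoted in the text.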
In order to provide a further explanation of the concepts for the proposed method, a schematic representation of the data acquisition is shown in Fig. 2, where the optical imaging mode of FC-WFM, FC-SIM and the proposed FC-WFM-Deep are presented. The common approach used to image a whole specimen is to take multiple images corresponding to different object planes along the z-scanning direction. For the FC-WFM, outside of the relatively large in-focus depth shown on its intensity curve in Fig. 2(a), each imaging slice in the z-stack is a complex combination of in-focus and out-of-focus components. In contrast, FC-SIM can select a focused area in order to reconstruct an image projection that is sharp everywhere, although the in-focus depth becomes much shorter than for FC-WFM, which means a larger number of slices in the z-stack would be required to reconstruct the 3D result. As shown in Fig. 2(c), after the network has been trained in the FC-WFM-Deep method, it can be used to reconstruct the OS in DOF directly from the WFM slice data, which contains the information from a series of focal stacks of FC-SIM images. This reconstructed slice will have a comparable rejection effect on out-of-focus components and a data-analysis capability similar to FC-SIM imaging, whereas the in-focus depth is equivalent to that of FC-WFM, which implies the data acquisition requirements would be greatly reduced both by the absence of phase-shifting and the restored in-focus depth.

Deep-learning-based FC-WFM
Although our FC-WFM-Deep method can predict the OS in DOF from a single WF image, focal stacks at different depths must be collected for training and validation. These focal stacks, processed by the phase-shifting algorithm [7], were collected over a distance of 3.50 µm distributed symmetrically around the true focal plane. The spacing between adjacent focal planes was 0.50 µm, finer than the OS strength, as selected for highly precise 3D reconstruction in our previous studies [7,13]. In fact, in order to retain the maximum number of visible details, several methods for reconstructing composite imaging information have been investigated and proven efficient [23]. Of these, the most popular approach is multi-focus fusion of color microscopy images based on multi-scale decompositions [24]. The basic concept of this method is to perform a multi-scale transformation on each source image and then integrate the decomposition coefficients selected by a particular criterion to produce a composite representation. The composite OS is finally reconstructed by performing an inverse multi-scale transform. A discrete cosine transform (DCT)-based fusion technique for multi-focus composite imaging has been evaluated previously [24], demonstrated great performance in terms of quality and complexity reduction, and is used here. Specifically, the following integration rule is used, which selects, at each block position, the slice whose DCT coefficients have the largest variance:

j*(m̃, ñ) = arg max_j σ_j^2(O_j(m̃, ñ)), j = 1, 2, ..., K,

where j denotes the j-th slice in the stack, O_j(m̃, ñ) denotes the DCT coefficient distribution of the block centered at pixel (m̃, ñ) in slice j and σ_j^2 is the corresponding variance. Consequently, the block with the highest activity level is selected as the appropriate block for the fused image.
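A minimal numpy sketch of this block-variance selection rule follows; the 8×8 block size and the non-overlapping block layout are assumptions for illustration, as the text does not specify the block geometry:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix, so C @ block @ C.T is the 2D DCT."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)
    return C

def fuse_stack(stack, block=8):
    """Multi-focus fusion sketch: for each block position, keep the
    slice whose DCT coefficients have the largest variance."""
    K, H, W = stack.shape
    C = dct_matrix(block)
    fused = np.empty((H, W))
    for i in range(0, H, block):
        for j in range(0, W, block):
            blocks = stack[:, i:i + block, j:j + block]
            # 2D DCT of the block in every slice at once.
            coeffs = np.einsum('ab,kbc,dc->kad', C, blocks, C)
            var = coeffs.reshape(K, -1).var(axis=1)
            fused[i:i + block, j:j + block] = blocks[np.argmax(var)]
    return fused
```

In a real pipeline the selected blocks would also go through the consistency verification and inverse transform described below; here the block content is copied directly for simplicity.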
Finally, for consistency verification, the DCT representation of the fused image is produced using the blocks with the larger variances from slices at different focal planes z. Additionally, to avoid saturation and false colors, a vector-to-scalar conversion of the multichannel data was also performed [24]. The operating principle of our method is shown in Fig. 3. Briefly, a CNN is designed to formulate an OS in DOF extension model, and this model is trained using pairs consisting of a WF image and a corresponding fused reference, both of size 2048×2048 pixels. Since using the whole representation as an input to the CNN model would massively increase both the size of the training data and the computational cost, the training data are split into center-to-center sub-pairs. In fact, many other studies have recently demonstrated that a single image containing enough ensembles can successfully train a CNN model [17]. As the information at the central point is affected by the surrounding areas, the side length of each sub-pair is determined based on the point spread function (PSF) [17]. Taking the imaging of a tiger beetle as an example, the intensity values of the images were first normalized to the range [0, 1] for preprocessing. The size of the sub-window can then be chosen as 5∼15 times the PSF distribution area, based on the imaging approach with a spatially rotating PSF [25], and the training and reference images can be divided into pairs with dimensions of 180×180 pixels and a step length of 60 pixels. For more precise results, data from five images with different fields-of-view (FOVs) of the tiger beetle were jointly used after practical trials. This allows acquisition of several thousand sub-image pairs, which provide enough training data for the neural network. All the data were partitioned into 80% for training, 10% for validation and 10% for testing, as depicted in Table 1.
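The tiling arithmetic above can be checked with a short sketch (window and step values from the text; `n_patches` is a hypothetical helper, not from the paper):

```python
# Sketch of the training-pair tiling: a 2048x2048 image split into
# 180x180 sub-windows with a 60-pixel step (values from the text).
def n_patches(size=2048, win=180, step=60):
    """Number of sub-windows along one axis."""
    return (size - win) // step + 1

per_axis = n_patches()       # 32 windows along each axis
per_image = per_axis ** 2    # 1024 sub-image pairs per field of view
print(per_image, "sub-pairs per image;", 5 * per_image, "from five FOVs")
```

Five FOVs then yield on the order of five thousand sub-image pairs, consistent with the "several thousand" figure in the text.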
Note that all the testing images were used only for the evaluation of our deep-learning-based method and were not included in the training dataset. The goal of FC-WFM-Deep is then to learn a model f that predicts values Ŷ = f(X), where Ŷ denotes an estimate of the target fused representation Y and X represents the WF data. Rather than directly outputting the predicted images, a residual network is trained, which is easier to optimize and more accurate, as has been shown in many previous applications [26]. The residual network converges much faster, while showing superior performance to the standard network when the input and output images are highly correlated [27]. Thus, the residual image R = Y − X is trained and predicted in FC-WFM-Deep. For FC-WFM-Deep with depth L, there are three types of layers [28]: the convolutional (Conv) layer, the rectified linear unit (ReLU) layer and the batch normalization (BN) layer, all shown in Fig. 3(a). Note that the convolution operations use zero padding so that the output size is the same as the input size. Finally, given T training sub-image pairs, the averaged mean squared error between the required residual images and the estimated ones is adopted as the loss function to learn the trainable parameters Θ in FC-WFM-Deep:

ℓ(Θ) = (1/2T) Σ_{i=1}^{T} ||f(X_i; Θ) − (Y_i − X_i)||^2.

For implementation, adaptive moment estimation (ADAM) was employed as the optimizer [17,28]. The momentum and weight decay parameters were set to 0.9 and 0.0001, respectively. In order to train the network more quickly, the learning rate was initially set to 0.1 and decreased by a factor of 10 every 30 epochs, and training was stopped after 90 epochs. The training took roughly 8 hours on a Titan V GPU. It has been seen previously that as the depth of the network increases, the performance improves rapidly [27].
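A minimal numpy illustration of this residual loss (the 1/(2T) normalization is an assumption consistent with common residual-learning formulations; `residual_loss` is a hypothetical helper, not the paper's code):

```python
import numpy as np

def residual_loss(pred_residuals, X, Y):
    """Averaged MSE between predicted residuals and R = Y - X."""
    T = len(X)
    R = Y - X
    return 0.5 / T * np.sum((pred_residuals - R) ** 2)

# A perfect residual predictor (one that outputs exactly Y - X)
# gives zero loss, which is what the training objective drives toward.
X = np.random.rand(4, 16, 16)   # WF sub-images
Y = np.random.rand(4, 16, 16)   # fused references
assert residual_loss(Y - X, X, Y) == 0.0
```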
Therefore, after many trials, a depth of 20 layers with a batch size of 128 was chosen as the best model to minimize the loss function with an acceptable data size.
In summary, our FC-WFM-Deep method has two main features: (1) a residual learning formulation is adopted to learn f(X), and (2) BN and multi-scale learning rates are incorporated to improve training speed while providing accurate predictive performance [28]. Finally, as shown in Fig. 3(b), to obtain the OS in DOF, the WF image is input into the well-trained model, which then predicts a corresponding high-quality image with a large in-focus composite range.

Evaluation with the test dataset
A testing dataset containing images of a tiger beetle's eye, which is a bowl-like object, was produced for performance evaluation of all of the imaging methods being tested. To ensure a fair comparison, the spacing between focal planes was set to 0.5 µm for both FC-WFM and FC-SIM in the dataset, as in the previous section. A typical set of results for this dataset is shown in Fig. 4. Specifically, Fig. 4(a) and (b) show the imaging results of the tiger beetle's eye, captured at different imaging depths and illuminated by a white light LED. Compared with FC-WFM, the OS-based FC-SIM can efficiently suppress the background; however, the remaining in-focus component covers only a relatively small area compared to the WF data. In contrast, the imaging result of FC-WFM-Deep provides more information within the OS in DOF and a satisfactory imaging quality that is comparable to that of FC-SIM. Furthermore, in terms of computational time, FC-WFM-Deep achieves a very appealing efficiency, e.g. it can process an image of 2048×2048 pixels in about 0.11 s. Moreover, to intuitively and quantitatively evaluate the acquisition of depth information in the FC-WFM-Deep method, the 3D reconstruction of the compound eye and the corresponding re-slicing images in the x-z direction are shown in Fig. 5. Specifically, two slices in the x-z direction were randomly selected from the 3D imaging data, which show the details without obvious differences. Furthermore, two profiles across the x-z slices were also randomly selected and compared in Fig. 5(d), (e), (i) and (j). Even though the intensities in the imaging results of FC-SIM are not absolutely identical to those of the FC-WFM-Deep method, the slight distinction is acceptable in practice. All these results indicate that the depth information acquired by FC-WFM-Deep is sufficient for 3D reconstruction without loss of detail.
To demonstrate the data-analysis capability of FC-WFM-Deep, precise measurement data (e.g., length, curvature) for ultrastructural features (e.g., punctae, scales, setae) of a typical group in the test dataset were calculated. For instance, the maximum intensity projection (MIP) color images of a compound eye are shown and compared for the different methods in Fig. 6. Due to the deterioration caused by the background, FC-WFM visually shows relatively poor imaging quality. For an easy and fair comparison, the reconstruction MIP of FC-WFM used the same number of slices as the FC-SIM results, which are shown in Fig. 6(d). It is obvious that FC-SIM can recover more details of the compound eye than FC-WFM, mainly because of the in-focus component extraction. Finally, the MIP result of FC-WFM-Deep with OS in DOF is practically identical to that of FC-SIM. In order to quantitatively confirm this, the reconstruction results were evaluated using the peak-signal-to-noise-ratio (PSNR) [29] and structural similarity (SSIM) [30] indexes.
Specifically, the PSNR and SSIM between the proposed FC-WFM-Deep method and the original SIM reconstruction of the compound eye were found to reach 37.25 dB and 0.95, respectively (the larger the value of both PSNR and SSIM, the smaller the distortion [29]); the results are summarized in Table 2. However, the original SIM method involved 161 slices with three-step phase-shifting, i.e., 483 processed images, compared to only 23 WF images to reconstruct the MIP image with FC-WFM-Deep. This is a 21-fold reduction in data size, demonstrating that our proposed FC-WFM-Deep method can achieve an imaging quality comparable to FC-SIM directly from the WF data but with lower data acquisition requirements. Additionally, as a small distortion in the depth information of FC-WFM-Deep has been observed and validated in Fig. 5, regions of interest (ROIs) were randomly selected, as shown by the red dashed boxes in Fig. 6(a), (d) and (g), to quantify the spatial resolution; the height maps were estimated from the measured intensity data in three dimensions using a height map extraction algorithm [9], as shown in Fig. 6(b), (e) and (h), respectively. Since the out-of-focus information severely influences the reconstruction result, the structure of the ommatidia can only be roughly distinguished in the 3D height map of WFM. In contrast, the details become more distinct in the FC-SIM and FC-WFM-Deep maps, which has also been validated by our previous study [9]. The results correspond to the profiles along the white dashed lines for the different methods; the WFM profile in Fig. 6(c) can merely confirm the number of ommatidia. Although the profile from the FC-WFM-Deep reconstruction shown in Fig. 6(i) is slightly smoother than that of the original FC-SIM shown in Fig. 6(f), due to the Gaussian filtering effects in the extraction algorithm, it still indicates that the radius of curvature of a single ommatidium is about 13 µm, while that of the whole compound eye is approximately 1.75 mm, exactly matching the result of the original FC-SIM. Visualization 1 presents the "3D color images" of the compound eye after 3D reconstruction.
In order to provide a more detailed demonstration of the reconstruction differences between FC-SIM and FC-WFM-Deep, Fig. 7(a) and (b) show the estimated errors of the profile along the white dashed line in Fig. 6 and of the entire 3D height map, respectively. It can be seen that the approximations produced by the proposed method have minimal errors relative to the FC-SIM results, and the average divergences of both the selected line and the total plane approach zero. For further quantitative assessment of these estimation errors, the mean squared error (MSE) [31] was chosen as the error measure. The MSEs for the compound eye imaging are calculated and summarized in Table 2. The average MSE of the profile is approximately 2.1×10^−3, while that of the entire height map is approximately 1.9×10^−3, which illustrates that the results from the proposed FC-WFM-Deep method closely approximate the FC-SIM results at high resolution.
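For reference, the PSNR and MSE figures used above can be computed as follows (a minimal numpy sketch for images normalized to [0, 1]; the paper's SSIM implementation [30] is windowed and more involved, so it is omitted here):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images."""
    return np.mean((a - b) ** 2)

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, peak]."""
    m = mse(a, b)
    return np.inf if m == 0 else 10 * np.log10(peak ** 2 / m)

# Illustrative values only, not the paper's data: additive noise with an
# RMS around 1.4% of the dynamic range lands near the 37 dB regime
# reported in Table 2.
ref = np.random.rand(64, 64)
rec = np.clip(ref + 0.0138 * np.random.randn(64, 64), 0, 1)
print(f"PSNR = {psnr(rec, ref):.1f} dB")  # roughly in the high-30s dB
```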

Experiments with a real imaging example
To verify the applicability of our method to different specimens, we imaged two further samples, a shining leaf chafer (Mimela sp.) and a leaf beetle (Clitenella fulminans, Faldermann, 1835), and performed reconstructions using the different imaging methods. Considering the different imaging features of individual specimens, the networks were trained separately for each type of sample. The spacing between focal planes in FC-SIM was again 0.5 µm. For the detailed comparison, we focused only on a small region near the center of an elytron. Figure 8 compares the MIP color images obtained with the different imaging methods; the FC-WFM results are unsatisfactory, while the imaging quality is evidently greatly enhanced by FC-SIM. Meanwhile, the FC-WFM-Deep result clearly shows that the elytra in dorsal view are composed of a number of microstructures (punctae) with slight differences in size, shape and color. The presence of such microstructures exerts an indirect influence on their color, producing the overall color that we observe visually. These imaging results illustrate that the FC-WFM-Deep method retains its color imaging capability with far less data, i.e., 21 times less than the original FC-SIM, without loss of detail.
Additionally, the corresponding 3D height maps in a randomly-selected region of interest (shown by the white dashed box in Fig. 8) were also estimated and are shown in Fig. 9. As mentioned previously, since the 3D height map of the WFM reconstruction suffers from distortion resulting from a great number of out-of-focus components, only the height maps of FC-SIM and FC-WFM-Deep are compared here, so that the conclusion is more meaningful and easier to interpret. Consequently, the 3D height map estimated from the FC-WFM-Deep result is very similar to that from FC-SIM for the imaging of both types of beetles. In terms of the image quality assessment between FC-SIM and FC-WFM-Deep, FC-WFM-Deep achieves a PSNR of approximately 40.66 dB for the leaf chafer and approximately 35.80 dB for the leaf beetle. In addition, the SSIMs between the FC-WFM-Deep and FC-SIM images reached 0.96 and 0.92 for the leaf chafer and leaf beetle, respectively, as can also be seen in Table 2. Although the FC-WFM-Deep calculation used only one-thirtieth the number of slices of the original SIM, the bulge and cavity details shown in the height maps validate the effectiveness of the proposed method. Visualization 2 presents the "3D color images" of the leaf chafer after 3D reconstruction.
Similarly, the estimation errors between FC-SIM and our proposed FC-WFM-Deep method are also shown in Fig. 9, which provide a more intuitive view. Except for a few singularities, there are minimal differences between the height maps approximated from the results of these two methods. To quantify this conclusion, the MSEs are also calculated and compared in Table 2.
The average MSE for the height map reconstruction in these two imaging experiments is only approximately 1.9×10^−3, demonstrating that our proposed method has practical research value.
All these experimental results demonstrate that FC-WFM-Deep is not a purely academic WFM imaging algorithm but can handle large-scale imaging data with natural color. A CNN model with an appropriate training dataset plays a key role in many imaging applications. However, FC-WFM-Deep does have a major limitation: it is currently restricted to the same specimen types and a specific FC-SIM system. In other words, when the imaging conditions change, e.g. the specimen or the underlying system is adjusted, the FC-WFM-Deep model requires retraining for the new imaging parameters. Obtaining a general network for various conditions is theoretically possible and would be extremely valuable. Nevertheless, training such a model requires a huge number of samples with different characteristics and significantly increases the training complexity and cost. In future research, FC-WFM-Deep will be extended to handle complex targets and adapted to flexible systems.

Conclusion
In summary, we have presented a residual-learning framework for high-quality FC-WFM-based imaging reconstruction. The proposed FC-WFM-Deep architecture fully exploits the unique high resolution and full-color capability of FC-SIM; it can be trained using a single FC-WF frame and then generalized to an FC-WFM experiment. The results show that high-quality, full-color images with OS in DOF can be acquired directly from the WF image. These images have an imaging quality comparable to FC-SIM in terms of 3D reconstruction and data-analysis capability, spatial resolution and dimensions. The advantage of our method is that the data required to reconstruct each 3D color image can be 21-fold less than for FC-SIM, for specific specimens in a given system. Overall, this technique significantly improves the imaging throughput of an imaging system by extracting the OS in DOF and reducing data acquisition without loss of detail. The technique has the potential for wide application in WFM to gather large-scale spatial and temporal information in a storage- and computation-efficient manner.