Automated red blood cells extraction from holographic images using fully convolutional neural networks

In this paper, we present two models for automatically extracting red blood cells (RBCs) from RBCs holographic images based on a deep learning fully convolutional neural network (FCN) algorithm. The first model, called FCN-1, uses only the FCN algorithm to carry out RBCs prediction, whereas the second model, called FCN-2, combines the FCN approach with the marker-controlled watershed transform segmentation scheme to achieve RBCs extraction. Both models achieve good segmentation accuracy. In addition, the second model performs much better in terms of cell separation than traditional segmentation methods. In the proposed methods, the RBCs phase images are first numerically reconstructed from RBCs holograms recorded with off-axis digital holographic microscopy. Then, some RBCs phase images are manually segmented and used as training data to fine-tune the FCN. Finally, each pixel in new input RBCs phase images is classified as either foreground or background using the trained FCN models. The RBCs prediction result from the first model is the final segmentation result, whereas the result from the second model is used as the internal markers of the marker-controlled watershed transform algorithm for further segmentation. Experimental results show that the given schemes can automatically extract RBCs from RBCs phase images and that much better RBCs separation results are obtained when the FCN technique is combined with the marker-controlled watershed segmentation algorithm.

In this study, the holograms of RBCs were recorded using DHM and the RBCs phase images were reconstructed from these holograms using a numerical reconstruction algorithm [21][22]. The RBCs phase images obtained from DHM can provide cell thickness and 3D morphology information that is helpful in RBC quantitative analysis and beneficial to medical diagnosis. In order to conduct further RBCs analysis, determination of specific RBCs in RBCs phase images is essential. Therefore, analyzing RBC-based properties from extracted RBCs would be much more accurate and beneficial to patients. For instance, the number of RBCs is related to a patient's health and can be used to investigate hypotheses about pathological processes in clinical pathology, while the cell concentration is very important in molecular biology for adjusting the amount of chemicals applied in experiments. Moreover, it is much easier to identify any abnormality and analyze RBC-related diseases from segmented RBCs images.
However, the task is tedious and time-consuming if the RBCs are segmented and counted manually. Consequently, many automated algorithms have been proposed for RBCs segmentation. Three main kinds of cell segmentation approaches have been presented: region-based, edge-based, and energy-based [23-32]. The RBCs segmentation methods presented in [23-24] are region-based algorithms, whereas those presented in [25-26] and [27-28] are edge-based and energy-based, respectively. However, most of the segmentation methods applied to RBCs images are based on 2D imaging systems, with only a few being based on RBCs phase images obtained via DHM imaging. In addition, most of these techniques cannot segment the RBCs images when multiple RBCs are connected. In our previous work, we combined the marker-controlled watershed algorithm with morphological operations and segmented RBCs phase images obtained using the DHM technique with good results [26]. Nevertheless, the approach proposed in [26] cannot properly segment heavily overlapped RBCs or multiple touching RBCs either. Therefore, developing a more robust algorithm for RBC phase image segmentation is essential for further RBC analysis.
Deep learning is a promising technique that is able to achieve results superior to those obtainable using traditional methods. Consequently, it is extensively studied in the computer vision community [33][34][35][36][37][38][39][40]. Krizhevsky et al. [35] used convolutional neural networks for image classification to very good effect. Mikolov et al. [36] and Liu et al. [37] obtained good performance from recurrent neural networks in text classification and translation. Long et al. [38] proposed a fully convolutional neural network (FCN) for semantic segmentation and obtained impressive results. FCNs have the advantage of end-to-end training and produce pixel-wise prediction. Moreover, the size of the image input to an FCN algorithm can be arbitrary, which differs from other image segmentation deep learning algorithms, such as convolutional neural networks [35]. Some other kinds of FCN algorithms such as U-net [33] and SegNet [34] have also been proposed for semantic segmentation and applied to biological images. In this study, we apply the FCN technique to RBCs phase images for RBCs segmentation. We develop two RBCs segmentation schemes. In the first scheme, FCN-1, the RBCs phase images and their manually segmented RBC masks, which serve as the true labels, are used to train the FCN model. The trained model is then applied to classify RBCs phase image pixels as either foreground (RBCs) or background for RBCs segmentation. In the second scheme, named FCN-2, we combine the FCN model with the marker-controlled watershed transform algorithm to segment the RBCs. In FCN-2, we use the fully convolutional neural network only to predict the inner part of each red blood cell and then regard the predicted results as the internal markers of the marker-controlled watershed algorithm so as to further segment the RBCs. In the second scheme, the training label image is not the mask of all the segmented cells; it is the eroded version of that mask, which represents the inner area of each RBC.
Consequently, we first use a 3D imaging technique called off-axis DHM to record these RBCs and then apply the numerical reconstruction algorithm to reconstruct RBCs phase images from their holograms. Next, two kinds of training images are prepared from RBCs phase images and the FCNs are trained for the two different schemes. One of the FCNs is used to predict all of the cells, whereas the other is used only to predict the inside part of each RBC, and its predicted results are further combined with the marker-controlled watershed method to segment the RBCs. We then compare the segmentation results from the two methods with those obtained using other methods in terms of segmentation accuracy and cell separation ability. Our experimental results indicate that our methods achieve good segmentation results overall, with the FCN-2 model giving the best performance in terms of separation of overlapped RBCs. The remainder of this paper is organized as follows. Section 2 describes the principle underlying off-axis DHM. Section 3 discusses FCNs. Section 4 outlines the RBCs segmentation procedure. Section 5 presents and discusses the experimental results obtained. Section 6 presents concluding remarks.

Off-axis digital holographic microscopy
Off-axis DHM is a three-dimensional imaging technique that has been researched for application in the area of cell biology, including 3D cell visualization, classification, recognition, and tracking [6-17], [41][42][43][44]. Off-axis DHM, which is also a noninvasive interferometric microscopy technique, provides a quantitative measure of the optical path length. Figure 1 shows the schematic of an off-axis DHM system used to capture the hologram of an imaging target sample. As shown in Fig. 1, off-axis DHM is a modified Mach-Zehnder configuration in which a laser diode source is used in off-axis geometry [45]. Usually, a low intensity laser is used as the light source for target sample illumination in the DHM imaging system (a λ = 682 nm laser diode source is utilized in this experiment). In off-axis DHM, the laser beam from the laser diode source is split into an object wave and a reference wave. Then, the object wave passing through the imaging target sample is diffracted and further magnified by a 40×/0.75 numerical aperture microscope objective. Subsequently, a hologram consisting of the interference pattern between the reference beam and the diffracted and magnified object beam in the off-axis geometry is recorded via a charge-coupled device (CCD) camera. The quantitative phase images are then numerically reconstructed from the recorded hologram using a specific numerical algorithm, as described in [21,22]. Thanks to current computing power, the phase images can be reconstructed from the hologram at a speed of 100 images per second, which achieves real-time processing.
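Because off-axis DHM measures optical path length, a reconstructed phase value can be converted into an optical path difference with the standard relation OPD = λφ / (2π). The following minimal sketch illustrates that conversion for the 682 nm source mentioned above. The function names are hypothetical, and the thickness step assumes a uniform refractive-index difference between cell and medium, a quantity the text does not specify.

```python
import math

WAVELENGTH_NM = 682.0  # laser diode wavelength reported in the text


def phase_to_opd_nm(phase_rad, wavelength_nm=WAVELENGTH_NM):
    """Convert an unwrapped phase value (radians) to optical path difference (nm)."""
    return phase_rad * wavelength_nm / (2.0 * math.pi)


def opd_to_thickness_nm(opd_nm, delta_n):
    """Estimate thickness from OPD.

    delta_n is the refractive-index difference between the cell and the
    surrounding medium. It is an assumed, user-supplied value here.
    """
    if delta_n <= 0:
        raise ValueError("delta_n must be positive")
    return opd_nm / delta_n


# One full phase cycle corresponds to one wavelength of optical path difference.
opd = phase_to_opd_nm(2.0 * math.pi)
```

For example, calling `opd_to_thickness_nm(opd, delta_n)` with a user-chosen `delta_n` turns this OPD into a thickness estimate; the appropriate value depends on the sample and the medium.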

Fully convolutional neural networks
Fully convolutional neural networks (FCNs), an extension of convolutional neural networks [35], have become the mainstream algorithm in the field of semantic segmentation since the strong performance reported by Long et al. [38]. FCNs have the advantage of training and inferring on images of arbitrary size and making pixel-wise predictions for semantic segmentation. They have been attracting increasing attention and have been successfully applied to biomedical images, such as cardiac segmentation in MRI and liver and lesion segmentation in CT, with good results [46,47]. Unlike convolutional neural networks, FCNs contain no fully connected layers [38]. Figure 2 (Row A) shows the general network architecture of an FCN. The network is constructed from some basic layers: convolution (conv), pooling (pool), activation, and deconvolution (deConv) [35,38]. The convolution layer, which applies a convolution between an image or feature map and a kernel, performs feature extraction. Pooling in the FCN mainly refers to max pooling, which shrinks the feature maps in the spatial dimensions; max pooling has the advantage of a faster convergence rate because it selects superior invariant features that can improve generalization performance. The activation layer in the FCN algorithm mainly refers to rectified linear units (ReLU) [38], defined as f(x) = max(0, x), where x is the input value to a neuron. Because an FCN is an end-to-end and pixel-to-pixel training/prediction technique, the FCN output must be the same size as the ground truth image, i.e., the same size as the input image. Consequently, the deconvolutional (deConv) layer is used to map the feature maps back to the size of the input image. The deconvolutional operation is achieved by upsampling the previous coarse output maps and then applying a convolution.
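The way these basic layers change the size of a feature map can be illustrated with a small, dependency-free sketch. It is not the network used in this paper: the function names are hypothetical, the feature map is a plain nested list, and nearest-neighbour upsampling stands in for the deconvolutional layer (which would additionally apply a learned convolution).

```python
def relu(fmap):
    """Elementwise rectified linear unit, f(x) = max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2; halves each spatial dimension."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling (an assumed stand-in for deconvolution)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


feature_map = [
    [-1, 2, 0, 1],
    [3, -4, 5, -6],
    [0, 0, -1, -2],
    [7, -8, -9, 1],
]
pooled = max_pool_2x2(relu(feature_map))   # 4 x 4 -> 2 x 2
restored = upsample_nearest(pooled, 2)     # 2 x 2 -> 4 x 4, same size as the input
```

The pooled map is coarser than the input, and the upsampling step restores the original spatial size, which is what allows a pixel-wise comparison with the ground truth label image.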
Therefore, the FCN can consume an image of arbitrary size and output a dense prediction map of the same size. The local connectivity of the convolutional, pooling, ReLU, and deconvolutional layers also gives the FCN translation invariance [38]. A loss layer is included in the FCN training phase so that the network parameters are learned by minimizing the cost value [38]. Some other layers such as batch normalization, dropout, and softmax are also widely used in FCNs [33,34,38]. Specifically, each layer of data in the FCN is a three-dimensional array of size h × w × d, where h and w are the spatial dimensions and d is the feature dimension. The basic units in an FCN (convolution, pooling, and activation functions) operate only on local input regions and depend only on relative spatial coordinates. Writing x_ij for the data vector at location (i, j) in a particular layer and y_ij for the output of this layer (the input of the next layer), y_ij is given by the following expression [38]:

$$y_{ij} = f_{ks}\left(\{x_{si+\delta i,\, sj+\delta j}\}_{0 \le \delta i,\, \delta j \le k}\right) \tag{1}$$

where k is the kernel size, s is the stride, and f_ks is a function determined by the layer type: a matrix multiplication for a convolutional layer, a spatial max for a max pooling layer, an elementwise nonlinear function such as ReLU for an activation layer, an interpolation followed by a matrix multiplication for a deconvolutional layer, and so on for other types of layers. The functional form in Eq. (1) is maintained under composition, with kernel size and stride obeying the following transformation rule [38]:

$$f_{ks} \circ g_{k's'} = (f \circ g)_{k' + (k-1)s',\; ss'} \tag{2}$$

where ∘ represents function composition. The parameters of an FCN model exist only in the kernels used in the convolutional and deconvolutional layers. Thus, the total number of parameters required for an FCN is much smaller than that for a fully connected deep neural network when the same number of hidden units is utilized. Further, the number of parameters is even smaller than that in convolutional neural networks.
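The transformation rule from [38] says that a layer with kernel size k' and stride s', followed by a layer with kernel size k and stride s, behaves like a single layer with kernel size k' + (k − 1)s' and stride ss'. The sketch below applies that rule to a stack of layers to obtain the effective kernel size and stride. The function names and the example stack (two 3 × 3 convolutions followed by 2 × 2 max pooling) are illustrative choices, not values taken from this paper.

```python
from functools import reduce


def compose(first, second):
    """Effective (kernel, stride) of layer `first` followed by layer `second`.

    `first` is (k', s') and `second` is (k, s); the result is
    (k' + (k - 1) * s', s * s').
    """
    k_first, s_first = first
    k_second, s_second = second
    return (k_first + (k_second - 1) * s_first, s_first * s_second)


def effective_kernel_and_stride(layers):
    """Fold `compose` over a list of (kernel, stride) pairs, input side first."""
    return reduce(compose, layers)


# Illustrative stack: conv 3x3 (stride 1), conv 3x3 (stride 1), max pool 2x2 (stride 2).
stack = [(3, 1), (3, 1), (2, 2)]
kernel, stride = effective_kernel_and_stride(stack)
```

Elementwise layers such as ReLU correspond to the pair (1, 1) and therefore leave both values unchanged.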
The relatively small number of parameters required by an FCN is beneficial in network training. In an FCN, the feed-forward pass through the network provides a dense prediction map. The loss function, defined as a sum over the spatial dimensions of the final layer and combined with information from the ground truth label image, is minimized by the backpropagation algorithm in order to train the network [48]. That is, the forward direction in an FCN is for inference, whereas the backward direction is for learning.
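A loss defined as a sum over the spatial dimensions of the final layer can be sketched as follows for the two-class (background/RBC) case. The use of softmax cross-entropy is an assumption made for illustration, since the text does not name the specific loss; the function names are hypothetical.

```python
import math


def softmax2(bg_score, fg_score):
    """Numerically stable softmax over the two class scores of one pixel."""
    m = max(bg_score, fg_score)
    e_bg, e_fg = math.exp(bg_score - m), math.exp(fg_score - m)
    total = e_bg + e_fg
    return e_bg / total, e_fg / total


def pixelwise_loss(scores, labels):
    """Sum of per-pixel cross-entropy terms over the whole prediction map.

    scores[i][j] is a (background, foreground) score pair and
    labels[i][j] is 0 (background) or 1 (RBC).
    """
    loss = 0.0
    for score_row, label_row in zip(scores, labels):
        for (bg, fg), label in zip(score_row, label_row):
            probs = softmax2(bg, fg)
            loss -= math.log(probs[label])
    return loss


# An uninformative prediction: equal scores for both classes at every pixel.
scores = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
labels = [[0, 1], [1, 0]]
loss = pixelwise_loss(scores, labels)  # four pixels, each contributing ln 2
```

Because the total is a plain sum of per-pixel terms, its gradient is the sum of the per-pixel gradients, which is what lets backpropagation train on the whole image at once.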
Following a series of successful applications of FCNs to semantic segmentation, many new algorithms based on the FCN technique and tailored to specific scenarios have been proposed. They are widely studied in the image segmentation, classification, and tracking fields [38,49,50]. Long et al. [38] proposed two other FCN architectures with different upsampling scales to compensate for the shortcoming of the main FCN architecture, which requires a total of 32× upsampling. The other two FCN architectures (FCN-16s and FCN-8s in [38]) fuse the pooling information at different layers and reportedly give significantly better semantic segmentation results than the original. The network architectures of FCN-16s and FCN-8s are also shown in Fig. 2 (Row B is FCN-16s and Row C is FCN-8s). For example, in FCN-8s, the coarse output from the FCN model is first 4× upsampled and the pool4 image is 2× upsampled. Then, these upsampled images are fused with the image at the pool3 layer and the fused images are finally 8× upsampled to obtain the prediction image with the same size as the input image.
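The upsampling factors quoted for FCN-8s can be checked with simple size arithmetic. The sketch below assumes that each of the five pooling stages halves the spatial size exactly and that the input size is a multiple of 32; it ignores the padding and cropping details of the original implementation, and the function name is hypothetical.

```python
def fcn8s_sizes(input_size):
    """Spatial sizes along the FCN-8s fusion path for a square input."""
    if input_size % 32 != 0:
        raise ValueError("this sketch assumes an input size divisible by 32")
    pool3 = input_size // 8     # after three 2x poolings
    pool4 = input_size // 16    # after four 2x poolings
    coarse = input_size // 32   # after five 2x poolings
    candidates = (coarse * 4, pool4 * 2, pool3)
    # All three maps must share one size before they can be fused.
    if len(set(candidates)) != 1:
        raise AssertionError("fusion inputs have mismatched sizes")
    fused = candidates[0]
    return {"pool3": pool3, "pool4": pool4, "coarse": coarse,
            "fused": fused, "output": fused * 8}


sizes = fcn8s_sizes(384)
```

With a 384 × 384 input, the three maps meet at 48 × 48 and the final 8× upsampling returns a 384 × 384 prediction.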

RBCs segmentation
In this section, the RBCs phase image segmentation procedure is presented. The RBC hologram is first recorded using off-axis DHM and the corresponding RBCs phase image is numerically reconstructed using the numerical algorithm in [21][22]. Training data sets were prepared in order to use the FCNs for RBCs phase image segmentation. We designed two kinds of training data sets for RBCs segmentation using the FCN model. In the first scheme (FCN-1), we manually segmented the RBCs in the RBCs phase image and used the mask of the segmented RBCs phase image as the ground truth label image, in which ones denote the RBCs target and zeros the background. One of the RBCs phase images obtained by off-axis DHM is shown along with the corresponding prepared ground truth label images in Fig. 3. The FCN was trained by minimizing the error defined between the ground truth label image and the prediction image resulting from the FCN inference process. Then, the trained FCN was used to predict the class (0: background, 1: RBCs target) of each pixel in the RBCs phase image. In this approach, the segmented results are viewed as the final RBCs segmentation results because the training data set expresses the entire segmented RBCs. In the second scheme (FCN-2), the ground truth label image denotes only the center part of each RBC in the RBCs phase image. These ground truth label images were obtained by conducting morphological erosion [51] on the ground truth label image from the first scheme (FCN-1) with a structuring element of size seven. One of the ground truth label images used with the FCN-2 model is given in Fig. 3. Consequently, the FCN-2 scheme was trained and used to predict the center part of each RBC. Because this method cannot segment RBCs directly, we combined the FCN model with the marker-controlled watershed transform method for RBCs phase image segmentation.
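The preparation of FCN-2 labels by morphological erosion can be sketched as follows. The text gives the structuring element size (seven) but not its shape, so a 7 × 7 square is assumed here; pixels outside the image are treated as background. The function name is hypothetical and the implementation is a plain, unoptimized illustration.

```python
def erode(mask, size=7):
    """Binary erosion of a 0/1 mask with a size x size square structuring element.

    A pixel stays 1 only if every pixel under the square centred on it is 1.
    The square shape and the background border handling are assumptions.
    """
    radius = size // 2
    h, w = len(mask), len(mask[0])
    offsets = range(-radius, radius + 1)

    def covered(i, j):
        return all(
            0 <= i + di < h and 0 <= j + dj < w and mask[i + di][j + dj] == 1
            for di in offsets for dj in offsets
        )

    return [[1 if covered(i, j) else 0 for j in range(w)] for i in range(h)]


# Toy FCN-1 style label: a 9 x 9 block of ones inside an 11 x 11 image.
full_mask = [[1 if 1 <= i <= 9 and 1 <= j <= 9 else 0 for j in range(11)]
             for i in range(11)]
inner_mask = erode(full_mask, size=7)  # FCN-2 style label: only the centre survives
```

Eroding by a radius of three pixels shrinks the 9 × 9 block to its central 3 × 3 region, so two cells that touch in the full mask become separate blobs in the eroded label as long as the bridge between them is narrower than the structuring element.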
The center part of each RBC predicted by the FCN is well suited to serve as the internal markers of the marker-controlled watershed transform algorithm. Thus, the RBCs phase images were finally segmented using the marker-controlled watershed segmentation algorithm. Flowcharts for the two schemes are presented in Fig. 4. The original FCN model in [38], which applies max pooling five times, is not very robust for small object segmentation [40] because of its large upsampling scale. In this study, we applied max pooling only four times. The proposed FCN structure is given in Fig. 5. As can be seen in the figure, there is no max pooling operation at the second layer and the image size in the pool2 layer is the same as that of the previous layer. Further, the image in the pool5 layer is 4× upsampled and fused with the 2× upsampled image at the pool4 layer and the image at the pool3 layer. The final layer is 4× upsampled from the fused image. The relatively small upsampling scale in the last layer helps to obtain fine segmentation results. For FCN training, the pre-trained VGG-16 Caffe model [52] was used to initialize the parameters in the two schemes. Here, the parameters of layers that also exist in the VGG-16 network are initialized with the corresponding weight values from the pre-trained VGG-16 Caffe model [52], while the other parameters are randomly initialized [33][34][35]. Training a deep learning model from a pre-trained model is a good strategy for helping the network converge, whereas training a network from scratch usually requires more training images and more time [38].
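The role of internal markers can be illustrated with a simplified, priority-flood version of marker-controlled watershed. This is a sketch of the general technique, not the implementation used in this study: it labels every pixel without drawing explicit watershed lines, uses 4-connectivity, and leaves the choice of relief image to the caller (the relief used by the authors is not stated here). The function name is hypothetical.

```python
import heapq


def marker_watershed(relief, markers):
    """Grow labelled markers over `relief`, flooding low values first.

    relief  : 2D list of numbers (e.g. a gradient or negated phase image).
    markers : 2D list of ints, 0 for unlabelled pixels, >0 for marker labels.
    Returns a 2D list in which every reachable pixel carries a marker label.
    """
    h, w = len(relief), len(relief[0])
    labels = [row[:] for row in markers]
    heap, order = [], 0  # `order` breaks ties in first-come, first-served fashion
    for i in range(h):
        for j in range(w):
            if labels[i][j] != 0:
                heapq.heappush(heap, (relief[i][j], order, i, j))
                order += 1
    while heap:
        _, _, i, j = heapq.heappop(heap)
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            y, x = i + di, j + dj
            if 0 <= y < h and 0 <= x < w and labels[y][x] == 0:
                labels[y][x] = labels[i][j]  # the neighbour joins this basin
                heapq.heappush(heap, (relief[y][x], order, y, x))
                order += 1
    return labels


# Two basins separated by a ridge (value 5), with one marker in each basin.
relief = [[0, 1, 2, 5, 2, 1, 0]]
markers = [[1, 0, 0, 0, 0, 0, 2]]
segmented = marker_watershed(relief, markers)
```

In this toy profile each marker claims its own basin and the two labels meet at the ridge. In a real pipeline the markers would be the connected components of the FCN-2 prediction, each given its own label, together with a background marker so that the cells do not flood the whole image.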

Experimental results
All the RBCs analyzed in our experiment were taken from healthy laboratory personnel in the Laboratoire Suisse d'Analyse du Dopage, CHUV, and their holograms were recorded with off-axis DHM. The RBCs phase images were reconstructed from these RBCs off-axis holograms using a computational numerical algorithm. One of the reconstructed RBCs phase images is given in Figs. 3(a) and 3(b). We manually segmented 50 RBCs phase images for the training and testing data sets. The size of each RBCs phase image was 700 × 700. To enlarge the training and testing data sets, we randomly cropped five images of size 384 × 384 from each 700 × 700 RBCs phase image. The ratio of the training data set to the testing data set was set to 7:3. The corresponding ground truth label images for the Fig. 3(b) RBCs phase image in the FCN-1 and FCN-2 models are also shown in Fig. 3. For the FCN training,
In this study, the under-separating, over-separating, and encroachment error metrics were adopted to quantitatively measure the RBCs separation ability of these RBCs phase image segmentation methods. Under-separating is defined as the number of non-separated RBCs among the connected or overlapped RBCs, and over-separating signifies the number of RBC divisions within a single non-touching RBC. The encroachment error refers to the number of incorrect RBC separations. The measured values for under-separating, over-separating, and encroachment error for 33 RBCs phase images with 150 overlapped RBC regions and approximately 1000 RBCs are given in Table 2. RBCs separation evaluation curves for the four methods are also shown in Fig. 8. It is clear that the methods proposed in this paper have better separation ability than those by Yi et al. [26] and Yang et al. [55]. Moreover, the FCN-2 method produced the best result in terms of RBC separation ability. This means that combining FCN with the marker-controlled watershed transform algorithm can further improve the segmentation performance.
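The data set sizes implied by the cropping step can be worked out with a short sketch. It assumes that the 7:3 ratio is applied to the cropped images (the text does not say whether the split was made before or after cropping) and that the five crops per image are drawn independently; the function names are hypothetical.

```python
import random


def random_crop_offsets(image_size=700, crop_size=384, n_crops=5, rng=None):
    """Top-left corners of random square crops that fit inside the image."""
    rng = rng or random.Random()
    max_offset = image_size - crop_size  # 316 for 700 -> 384
    return [(rng.randint(0, max_offset), rng.randint(0, max_offset))
            for _ in range(n_crops)]


def split_counts(n_items, train_ratio=7, test_ratio=3):
    """Number of training and testing items for a train:test ratio."""
    n_train = n_items * train_ratio // (train_ratio + test_ratio)
    return n_train, n_items - n_train


n_images = 50                                         # manually segmented phase images
offsets = random_crop_offsets(rng=random.Random(0))   # five crops for one image
n_crops_total = n_images * len(offsets)               # 50 x 5
n_train, n_test = split_counts(n_crops_total)
```

Under these assumptions, the 50 phase images yield 250 crops, of which 175 would be used for training and 75 for testing.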
[55] were 4.67 seconds and 7.83 seconds, respectively. Thus, it is to be noted that our methods achieve superior segmentation accuracy and RBCs separation performance but sacrifice efficiency in terms of computing time. However, as computing power will continue to increase into the foreseeable future, this is not a major problem.

Conclusions
In this study, two models based on FCNs were developed and used for automated RBCs extraction in RBCs phase images numerically reconstructed from digital holograms obtained using off-axis DHM. In the first model, only fully convolutional networks are utilized for the semantic segmentation of RBCs phase images, whereas the second model combines fully convolutional networks with the marker-controlled watershed transform algorithm for RBCs segmentation. The parameters of the FCNs were initialized using a VGG 16-layer net and fine-tuned with manually labeled RBCs phase images in the two models separately. Experimental results show that the two proposed approaches can automatically segment the red blood cells in RBCs phase images. However, connected and overlapped RBCs in RBCs phase images are better handled by the second proposed model. Comparison results reveal that our methods achieve better performance than two other proposed algorithms in terms of RBCs segmentation accuracy and RBCs separation ability for overlapped RBCs. All the individual methods in this paper already exist, whereas combining FCNs with the marker-controlled watershed transform approach to separate connected RBCs is an entirely new idea. To the best of our knowledge, this is also the first work to apply a deep learning algorithm to digital holographic RBCs images. The proposed methods are useful for quantitatively analyzing red blood cell morphology and other features that enable diagnosis of RBC-related diseases, and can be used in a variety of cell identification approaches [56].

Conflicts of Interests
The authors declare that there are no conflicts of interest related to this article.