Deep Learning for an Automated Image-Based Stem Cell Classification

Hematopoiesis is the process by which hematopoietic stem cells produce mature blood cells in the bone marrow through cell proliferation and differentiation. When hematopoietic cells are cultured on a petri dish, they form different colony-forming units (CFUs), and the aim is to identify the type of CFU produced by the stem cell. Several software packages have been developed to classify CFUs automatically, yet automated identification or classification of CFU types remains the main challenge. Most current software shares common drawbacks, such as high operating cost and complex machinery. The purpose of this study is to investigate several selected pre-trained convolutional neural network (CNN) models to overcome these constraints in automated CFU classification. Prior to classification, images are acquired from mouse stem cells and categorized into three types: CFU-erythroid (E), CFU-granulocyte/macrophage (GM) and CFU-Pre-B. These images are then pre-processed before being fed into the pre-trained CNN models. The models adopt a deep learning approach to extract informative features from the CFU images. Classification results show that the models integrated with the pre-processing module classify the CFUs with high accuracy and shorter computational time, with the best model reaching 96.33% accuracy in 61 minutes and 37 seconds of training. Hence, the findings of this work could serve as a baseline reference for further research.


INTRODUCTION
Hematopoietic stem cells (HSCs) are stem cells that produce mature blood cells. The process of producing these cells is called hematopoiesis, and it occurs in the bone marrow. Research on HSCs is important because the colonies they form carry stem feature information that can be used to assess the therapeutic potential of HSCs and their stem value. One benefit of HSCs is their ability to renew themselves, which makes them very useful in the treatment of various hematopoietic diseases. The therapeutic potential of HSCs has been used to regenerate hematopoietic systems through bone marrow transplant procedures (Mayle et al. 2013).
HSCs cultured in a suitable semi-solid matrix form colony-forming units (CFUs). These CFUs proliferate and differentiate to form discrete cell groups or colonies that contain recognizable progeny, such as CFU-E (colony-forming unit-erythroid), CFU-GEMM (colony-forming unit-granulocyte/erythrocyte/monocyte/megakaryocyte), CFU-GM (colony-forming unit-granulocyte/macrophage), BFU-E (burst-forming unit-erythroid) and CFU-Pre-B (colony-forming unit-Pre-B). It is important to identify these colonies in order to recognize their potential. However, the conventional methods used to classify colony-forming units are less accurate and require special expertise (Stemcell Technologies 2012).
Thus, with advances in the field of Artificial Intelligence (AI), machine learning techniques may be adopted for automated enumeration and classification of these CFUs. Machine learning is a set of algorithms that can learn and recognize patterns or objects from the data provided, and can therefore make accurate predictions for new data. It is an important tool for overcoming challenges in computer vision, such as object recognition and medical imaging. In recent decades, machine learning has been used in many applications around the world and has successfully solved many AI problems (Lee 2010).
Deep learning is an extension of machine learning that uses complex architectures consisting of multiple layers for hierarchical feature extraction (Schmidhuber 2015). A deep neural network is a network that can be extended by adding new layers consisting of multiple units, and the parameters of each layer can be trained (Bengio & Lecun 2007).
One approach to counting and classifying colonies manually is to use a grid. Fewer than 30 colonies on the Petri dish indicate that the culture is insufficient and may therefore lead to an inaccurate assessment of the progenitor content of the sample, while more than 150 colonies on the Petri dish reduce the observer's ability to recognize individual colonies. Focused concentration is needed to identify all the colonies, so the manual technique takes a long time (Pereira et al. 2007).
In biological research, automated stem cell segmentation, counting and classification are important because of the large sets of images involved. An algorithm has to be developed to segment the colonies from unwanted backgrounds automatically so that the CFU boundaries are obtained. Informative features should also be extracted for counting and classification purposes. Several software packages capable of classifying colony-forming units (CFUs) have been successfully implemented (Khan et al. 2018).
This research proposes to adapt deep learning to classify CFUs automatically. Through the development of this automated classification system, the identification of CFUs can be done more effectively and efficiently.

RELATED WORK
A few MATLAB-based software packages, such as CHiTA and NICE, have been developed, but they could not identify the colonies correctly due to their limitations (Bewes et al. 2008). CellProfiler, on the other hand, requires the user to modify the parameter values, numbers and step sequence from time to time, which is very troublesome (Carpenter et al. 2006). The most recent software used for CFU research is STEMvision, which has replaced the need to count colonies manually under a microscope. However, this software is limited to counting colonies and cannot classify the colony-forming units (Halpenny et al. 2015). Another software package is OpenCFU (Geissmann 2013), but it is likely to miscalculate the number of colonies, by approximately 3:1 between the inner and outer cells (Khan et al. 2018).
Based on the literature, there is limited research on stem cell classification using deep learning approaches. In 2021, a predictive identification of neural stem cells was proposed by developing a deep neural network based on the inception model. The proposed architecture achieved 92.3% accuracy in classifying three classes of neural stem cells (Zhu et al. 2021). In the following year, an image classification of five CFU classes was proposed through region of interest localization using pre-trained CNN networks and gradient class activation mapping (gradCAM) (Zamani et al. 2022). That work deployed four networks and evaluated them in terms of sensitivity. DarkNet19 achieved the highest sensitivity (87.5%), outperforming AlexNet and GoogleNet, two networks that are also used in the present work. This result is rather low for a deep neural network, which was attributed to the limited dataset. Therefore, the present work uses a larger dataset than the previous work to obtain better CFU classification performance.

DEEP NEURAL NETWORK
Recently, deep neural networks (DNNs) have performed excellently in image classification tasks and are among the most powerful deep learning architectures. DNNs differ markedly from traditional approaches in terms of accuracy and computation time. In particular, their deep architectures are able to learn more complex models than shallow ones (Szegedy et al. 2013). The depth of a deep neural network is increased by a generic procedure of adding and training one or more layers until it can make good predictions on a new set of data (Bengio 2009). A convolutional neural network (CNN) is a class of deep neural network and has become an efficient tool for solving pattern recognition problems. A CNN architecture typically consists of convolutional layers combined with pooling layers to extract essential features from the image, while the fully connected layer is used as a classifier. CNNs also pose some challenges, such as the need for a very large labelled dataset for training and classification, which is not always available (Ozsert Yigit & Ozyildirim 2017).
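The feature-extraction stage described above can be illustrated with a minimal sketch of one convolution followed by one max-pooling step. The toy image, the hand-picked kernel and the function names are assumptions made for illustration only; in a real CNN the kernel weights are learned, and the operation is applied over many channels and filters.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    This is the cross-correlation form commonly used in CNN layers.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# Toy 5x5 image: dark columns on the left, bright columns on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical kernel that responds to a left-to-right increase in intensity.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d_valid(image, kernel)  # 4x4 feature map, peaks at the edge
pooled = max_pool_2x2(features)         # 2x2 summary of the feature map
print(features)
print(pooled)
```

The pooled map keeps the strong edge response while halving each spatial dimension, which is the role pooling plays between convolutional layers.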

METHODOLOGY
This research proposes an approach to classify the type of CFU automatically based on deep neural networks. The first step is to acquire the data used in this study, i.e., the CFU images. Next, the images are pre-processed using digital image processing techniques in order to investigate the effects of the pre-processing steps on CFU classification with deep neural network models. Three pre-trained DNN models are used to assess and compare classification performance on both the original and the pre-processed CFU images. The flow chart in Figure 1 gives an overview of the workflow of this research.

PRE-PROCESSING
Image pre-processing is important for improving and enhancing image quality. Additionally, data augmentation during pre-processing increases the number of images, which helps to overcome overfitting in the training process. Data augmentation generates additional synthetic images by applying slight modifications to the existing dataset, such as reflection, rotation, and translation.
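As an illustration of the three modifications named above, the following sketch applies a reflection, a 90-degree rotation and a one-pixel translation to a tiny image stored as nested lists. The function names, the rotation angle and the shift amount are assumptions; the source does not state which parameters were used.

```python
def reflect_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def translate(image, dx, dy, fill=0):
    """Shift the image dx columns right and dy rows down, padding with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                shifted[nr][nc] = image[r][c]
    return shifted


def augment(image):
    """Return the original image plus three synthetic variants."""
    return [
        image,
        reflect_horizontal(image),
        rotate_90_clockwise(image),
        translate(image, dx=1, dy=0),  # hypothetical one-pixel shift
    ]


image = [[1, 2],
         [3, 4]]
variants = augment(image)
for variant in variants:
    print(variant)
```

Each source image yields four training samples here, so the dataset grows without any new acquisition.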
Image cropping is performed to eliminate unwanted objects or backgrounds from the images. In this research, image cropping is used to highlight or isolate the colonies from the background. The cropped image should include the colony under consideration with a very thin padding of background so that the deep neural network can detect the edge of the desired object. The cropped images are standardized to a fixed size in pixels. However, the cropped images are too large and do not fit the input size of the deep neural network. Therefore, each cropped image is rescaled to the size required by the architecture of the deep neural network in use, to ensure that the training phase runs smoothly.
Image enhancement is the process of enhancing images so that their prominent features are easier to identify. In this research, image contrast is enhanced using the contrast-limited adaptive histogram equalization (CLAHE) technique. CLAHE is an improved form of adaptive histogram equalization (AHE), which in turn refines the pixel transformation of general histogram equalization by taking the neighborhood of each pixel into account (Kuran et al. 2022; Zheng et al. 2019). However, the pixel transformation in AHE can be over-amplified in regions where the neighboring pixels are nearly homogeneous. CLAHE therefore sets a limit on the contrast amplification to reduce image noise. CLAHE operates on local areas of the image, called tiles, rather than on the whole image.
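The contrast-limiting idea can be sketched for a single tile as follows. This is a simplified illustration, not the procedure used in the study: the number of gray levels, the clip limit and the toy tile are assumptions, the excess is redistributed in one pass only, and a full CLAHE implementation additionally interpolates the mappings of neighboring tiles.

```python
def clipped_equalization_map(tile, levels=8, clip_limit=4):
    """Build a gray-level mapping for one tile from a clipped histogram."""
    # 1. Histogram of the tile.
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1

    # 2. Clip each bin at the limit and collect the excess counts.
    excess = sum(max(count - clip_limit, 0) for count in hist)
    clipped = [min(count, clip_limit) for count in hist]

    # 3. Redistribute the excess evenly over all bins (single pass).
    share, remainder = divmod(excess, levels)
    clipped = [count + share + (1 if i < remainder else 0)
               for i, count in enumerate(clipped)]

    # 4. Use the cumulative histogram as the intensity mapping.
    total = sum(clipped)
    mapping, running = [], 0
    for count in clipped:
        running += count
        mapping.append(round(running * (levels - 1) / total))
    return mapping


# Toy 4x4 tile dominated by gray level 2 (a nearly homogeneous region).
tile = [
    [2, 2, 2, 2],
    [2, 2, 2, 2],
    [2, 2, 3, 3],
    [3, 3, 4, 5],
]
mapping = clipped_equalization_map(tile)
enhanced = [[mapping[value] for value in row] for row in tile]
print(mapping)
print(enhanced)
```

Without clipping, the dominant gray level would claim most of the output range. Clipping caps its share, which is how the amplification of noise in homogeneous regions is limited.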

CLASSIFICATION USING CNN MODELS
CNNs are trained on a wide variety of images and learn the essential features of these images. In this research, a pre-trained CNN is used as a feature extractor to exploit the power of CNNs without spending time and effort on training from scratch. Accordingly, the classification layer is modified to fit the three classes of CFUs. Selecting suitable parameters is very important when determining the training options for the network to ensure high accuracy.
Three different models are used in this research: AlexNet, GoogleNet and ResNet-18. AlexNet is selected as the earliest of these CNN models, followed by two different CNN architectures: GoogleNet, which adopts the inception model, and ResNet-18, which adds residual convolutional layers. The performance of these three models in identifying CFU types is evaluated based on their precision and accuracy. The AlexNet architecture, shown in Figure 3, consists of five convolutional layers and three fully connected layers. AlexNet uses the dropout technique to reduce overfitting when the dataset is small. Overfitting occurs when the learned features fit the training data too closely (Mohamad Zamri et al. 2021).
GoogleNet achieved a top-5 error rate of 6.67%, which is approximately the same level as human performance (Szegedy et al. 2014). The GoogleNet architecture consists of 22 layers, but its number of parameters is almost 12 times smaller than that of AlexNet. Meanwhile, ResNet-18 has shortcut connections that run parallel to its convolutional layers and carry important information from the previous layer to the next layer. These shortcut connections allow for faster training. The ResNet-18 architecture is shown in Figure 4.
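The shortcut connection can be sketched as a block that adds its input back to the output of its layers. In the sketch below, `toy_transform` is a hypothetical stand-in for the learned convolutional layers, and the values are illustrative only.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(v, 0.0) for v in values]


def residual_block(x, transform):
    """Return relu(transform(x) + x): the shortcut adds the input back unchanged."""
    fx = transform(x)
    return relu([f + xi for f, xi in zip(fx, x)])


def toy_transform(x):
    """Hypothetical stand-in for the convolutional layers inside the block."""
    return [0.5 * v - 1.0 for v in x]


x = [2.0, -4.0, 6.0]
y = residual_block(x, toy_transform)
print(y)
```

Because the input is carried forward directly, the layers inside the block only need to learn a correction to it, which is one common explanation for why such networks train more easily.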
Table 1 lists the differences in the number of layers and the number of parameters between the three CNN architectures used in this research. The three networks are trained on, and used to classify, both the original and the pre-processed CFU images with three pre-determined hyperparameters: a learning rate of 0.0001, a batch size of 15 and a training cycle of 10 epochs.
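The three hyperparameters can be collected in a small configuration object, sketched below. The class name, the helper methods and the example image count of 300 are assumptions for illustration; the actual number of training images is given in Table 2. The sketch also assumes the final partial batch is kept, whereas some frameworks discard it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingOptions:
    """Hyperparameters stated in the text."""
    learning_rate: float = 0.0001
    batch_size: int = 15
    epochs: int = 10

    def iterations_per_epoch(self, num_training_images):
        """Number of mini-batches needed to pass over the training set once."""
        return math.ceil(num_training_images / self.batch_size)

    def total_iterations(self, num_training_images):
        """Number of weight updates over the whole training run."""
        return self.iterations_per_epoch(num_training_images) * self.epochs


options = TrainingOptions()
example_image_count = 300  # hypothetical, not taken from the paper
print(options.iterations_per_epoch(example_image_count))
print(options.total_iterations(example_image_count))
```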

MODEL EVALUATION
The evaluation procedure involves determining the precision and accuracy of the three CNN models using (1) and (2), where TP is a true positive, FP is a false positive, TN is a true negative, and FN is a false negative.

\[ \text{Precision} = \frac{TP}{TP + FP} \qquad (1) \]

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2) \]

Table 2 shows the total number of images for each type of CFU used as training data, validation data and classification data.
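The precision and accuracy measures can be computed from the four counts as in the sketch below. The one-vs-rest helper and the 3x3 confusion matrix are hypothetical illustrations of how the counts could be obtained for one of three classes; the numbers are not results from this study.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp)


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def one_vs_rest_counts(confusion, k):
    """Return (TP, FP, TN, FN) for class k.

    Rows of `confusion` are true classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fp = sum(row[k] for row in confusion) - tp
    fn = sum(confusion[k]) - tp
    tn = total - tp - fp - fn
    return tp, fp, tn, fn


# Hypothetical confusion matrix for three classes (illustrative values only).
confusion = [
    [30, 2, 1],
    [3, 28, 2],
    [0, 1, 33],
]
tp, fp, tn, fn = one_vs_rest_counts(confusion, 0)
class_precision = precision(tp, fp)
class_accuracy = accuracy(tp, tn, fp, fn)
print(tp, fp, tn, fn)
print(round(class_precision, 4), round(class_accuracy, 4))
```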

PRE-PROCESSING
The images are cropped so that the colony can be isolated from its background.

TRAINING AND CLASSIFICATION USING CNN
To train a CNN, the size of the input image must match the input layer of the network. Thus, the input images are rescaled to the input size required by the AlexNet model and to the input size required by the GoogleNet and ResNet-18 models. Both the original and the pre-processed images are used in this research in order to compare precision, accuracy and training time.
The time taken by each CNN model to complete the training process is recorded. Training is performed for 10 epochs on both the original and the pre-processed images. With the original images as input, AlexNet took about 165 minutes and 36 seconds to complete the 10 epochs. GoogleNet took about 178 minutes and 17 seconds, and ResNet-18 was slightly faster than GoogleNet, by 1 minute and 15 seconds. Training takes much longer for GoogleNet because it is a deeper network than AlexNet, which has the fewest layers. A sample of the training progress for ResNet-18 using the original images is depicted in Figure 6.
Next, the pre-processed images are fed into the CNN models for training and classification. AlexNet took about 61 minutes and 37 seconds to complete the training process, while GoogleNet and ResNet-18 took about 96 minutes and 30 seconds and 79 minutes and 40 seconds, respectively. As with the original images, these two networks took much longer to train than AlexNet because they are deeper. Figure 7 shows the training progress details for the different maximum numbers of epochs using AlexNet.
The results show that the time taken to complete the training process for all CNN models increases linearly with the number of epochs. The graphs in Figure 6 and Figure 7 show that the training error almost converges to zero and that the classification accuracy improves as the number of epochs increases. This suggests that classification accuracy may be improved further by increasing the number of epochs. However, given the limitations of the computational hardware, the results achieved with a smaller number of epochs are still acceptable.
Table 3 compares the training time for stem cell classification of AlexNet, GoogleNet and ResNet-18 using the original and the pre-processed images with 10 epochs of training. AlexNet requires the least training time, followed by ResNet-18 and GoogleNet. This is due to the CNN architecture, as AlexNet has fewer layers than GoogleNet and ResNet-18. When the pre-processed images are used as input, the training time of all three models is reduced substantially, by roughly 46% to 63% of the time needed with the original images, according to the times reported above. This is attributed to the enhanced images, which amplify the important features of the CFUs and thus reduce the time the models need to extract features for classification.
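The relative reduction in training time can be worked out from the times reported in the text. The sketch below uses those times, with the ResNet-18 time on original images derived from the statement that it was 1 minute and 15 seconds shorter than GoogleNet; the function and variable names are assumptions.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes-and-seconds duration to seconds."""
    return minutes * 60 + seconds


def percent_reduction(original, preprocessed):
    """Training time saved by pre-processing, as a percentage of the original time."""
    return 100.0 * (original - preprocessed) / original


googlenet_original = to_seconds(178, 17)

# (time with original images, time with pre-processed images), in seconds
training_times = {
    "AlexNet": (to_seconds(165, 36), to_seconds(61, 37)),
    "GoogleNet": (googlenet_original, to_seconds(96, 30)),
    # Reported as 1 minute 15 seconds shorter than GoogleNet on original images.
    "ResNet-18": (googlenet_original - to_seconds(1, 15), to_seconds(79, 40)),
}

reductions = {
    model: percent_reduction(original, preprocessed)
    for model, (original, preprocessed) in training_times.items()
}
for model, value in reductions.items():
    print(f"{model}: {value:.1f}% shorter")
```

With these inputs the reduction is about 62.8% for AlexNet, 45.9% for GoogleNet and 55.0% for ResNet-18.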
In the performance comparison in Table 4, ResNet-18 shows the highest precision, followed by AlexNet, with 94.5% and 91.74%, respectively. In contrast, AlexNet achieves the highest accuracy, about 2.75% higher than ResNet-18, followed by GoogleNet. Although GoogleNet performs slightly worse and takes the longest to train, its accuracy on both the original and the pre-processed images remains comparable. In addition, applying the pre-processing steps before classification also significantly improves precision.
Furthermore, AlexNet achieved both the shortest training time and the highest classification accuracy, at 61 minutes and 37 seconds and 96.33%, respectively. AlexNet thus outperformed the other networks in terms of training time and accuracy while maintaining good precision. This is attributed to its compact architecture with the fewest layers, which suits the small dataset better than denser networks such as GoogleNet and ResNet-18.
From a quantitative point of view, the remaining error is due to misclassification. For example, CFU-GM might be misclassified as CFU-GEMM because of its morphological feature of containing granulocyte cells. Similarly, CFU-E might be misclassified as CFU-GEMM, which also gives rise to erythrocyte cells. Despite these shared qualitative features, the CNN models successfully classified the three CFU classes.

CONCLUSION
In this research, a method to classify CFU images is proposed. Deep neural networks have been used successfully to classify hematopoietic colony-forming cells according to their CFU type, with high precision and accuracy. With the help of deep neural networks, the efficiency of classification tasks in the medical field can be enhanced, which in turn helps medical experts to identify the potential value of rapidly formed CFUs. The precision, accuracy and computational time of the deep neural network can be further improved with pre-processing and additional data, as demonstrated by AlexNet. Thus, image pre-processing is crucial in this research for improving input quality before the images are fed into the DNN models for better classification performance.
In future work, more types of CFU images can be added for classification. The number of images can be increased as well, especially for CFU-E and CFU-Pre-B. These additional images can help the deep neural network learn more features of each CFU type and thus improve the performance of CFU image classification. Furthermore, the optimal selection of hyperparameters for the pre-trained CNN networks can also be examined to improve CFU classification performance.

FIGURE 2. A sample of the original image for each type of CFU used in the research

FIGURE 3. AlexNet architecture (Krizhevsky et al. 2012)

FIGURE 5. A sample of a cropped a) original CFU image and b) enhanced CFU image

FIGURE 6. Training progress for ResNet-18 with 10 epochs using the original images

TABLE 1. Comparison of different CNN architectures

TABLE 2. Number of images for training, validation and classification

TABLE 3. Comparison of training time between original and pre-processed CFU images for 10 epochs