ReRNet: A Deep Learning Network for Classifying Blood Cells

Aims Blood cell classification helps detect various diseases, but current classification models do not always achieve satisfactory results. A network that automatically classifies blood cells can provide doctors with data that serve as one of the criteria for diagnosing the type and severity of a patient's disease. When doctors classify blood cells manually, the diagnosis is time-consuming and tedious, mistakes can occur when they are tired, and different doctors may reach different conclusions about the same patient. Methods We propose a ResNet50-based ensemble of randomized neural networks (ReRNet) for blood cell classification. ResNet50 is used as the backbone model for feature extraction. The extracted features are fed to 3 randomized neural networks (RNNs): a Schmidt neural network (SNN), an extreme learning machine (ELM), and a deep random vector functional link network (dRVFL). The output of the ReRNet is the ensemble of these 3 RNNs based on the majority voting mechanism. 5 × 5-fold cross-validation is applied to validate the proposed network. Results The average-accuracy, average-sensitivity, average-precision, and average-F1-score are 99.97%, 99.96%, 99.98%, and 99.97%, respectively. Conclusions The ReRNet is compared with 4 state-of-the-art methods and achieves the best classification performance. Based on these results, the ReRNet is an effective method for blood cell classification.


Introduction
The cells that exist in the blood are called blood cells. They flow through the whole body with the blood. In mammals, blood cells are roughly divided into platelets, red blood cells, and white blood cells. The main function of red blood cells is to transport oxygen through the body. White blood cells are mainly responsible for protecting the body: when foreign germs enter the body, white blood cells are responsible for destroying them. Platelets play a vital part in the hemostasis of the body.
Blood cell classification helps detect various diseases. 1 At present, the classification of blood cells is performed by doctors. Manual diagnosis of blood cells takes a lot of time, and the process is tedious. Doctors can make mistakes when they are influenced by factors such as fatigue. Moreover, different doctors may reach different conclusions about the same patient.
A network that automatically classifies blood cells can provide doctors with data that serve as one of the criteria for diagnosing the type and severity of a patient's disease. More and more researchers have therefore proposed computer models for the classification of blood cells. 2 Huang et al. 3 proposed a new network (MGCNN) to classify blood cells. This network was composed of a convolutional neural network (CNN) and modulated Gabor wavelets. They calculated the dot product of CNN kernels with multi-scale, multi-orientation Gabor operators and then combined these features to classify the blood cells. Parab and Mehendale 4 combined image processing technology and a CNN for classifying red blood cells into 9 categories; this method achieved 98.5% accuracy. Lamberti 5 introduced an SVM model for the classification of red blood cells, which yielded around 99% accuracy. Liang et al. 6 presented a combined network (CNN-RNN) to classify blood cells. A custom loss function was selected to speed up the network convergence. When Xception was selected as the backbone, the network achieved its best 4-class accuracy of 90.79%. Banik et al. 7 designed a fused CNN framework for white blood cell classification, with 5 convolution layers, 3 pooling layers, and one fully connected layer. Experiments showed that this framework achieved good performance and trained faster than the CNN-RNN model. Yang et al. 8 used Faster R-CNN as the main method to detect cells in microscopic images. Extensive experiments showed that this method could save time and achieve good performance. Imran Razzak and Naz 9 presented 2 methods for blood smear segmentation and classification. One method was based on a fully convolutional network; the other combined a CNN and an extreme learning machine for classifying the blood smear. The 2 methods achieved segmentation accuracies of 98.12% and 98.16%.
Their classification accuracies were 94.71% and 98.68%. Choi et al. 10 applied a dual-stage CNN method for the automatic white blood cell count. The public data set included 2174 images divided into 10 classes. The final result achieved 97.13% precision, 97.1% F1-score, 97.06% recall, and 97.06% accuracy. Sahlol et al. 11 proposed a new method for classifying white blood cells in which VGG was selected as the backbone and used as the feature extractor; the extracted features were then passed to an enhanced Salp Swarm Algorithm. Shahzad et al. 12 introduced a novel framework (4B-AdditionNet-based CNN) to categorize white blood cells. Contrast-limited adaptive histogram equalization was used for preprocessing, CNN networks such as ResNet50 and EfficientNetB0 were used as feature extractors, and SVM and quadratic discriminant analysis were used as classifiers. This method achieved 98.44% accuracy on the blood cell images data set. Loh et al. 13,18 offered a novel CNN-based system to segment white blood cells in smear images. Three kinds of features were selected and fed to the CNN models for segmentation: color features, LDP-based texture features, and geometrical features. With 450 images for the experiments, this system achieved 99.11% accuracy. Harahap et al. 19 used 2 different CNN models, LeNet-5 and DRNet, to classify red blood cells; in experiments, LeNet-5 and DRNet achieved 95% and 97.3% accuracy, respectively. Kousalya et al. 20 used different CNN models to classify blood cells; GoogleNet with LReLU and ReLU achieved 91.72% and 93.43% accuracy, respectively. Chien et al. 21 combined a CNN and Faster R-CNN for the identification and detection of white blood cells, achieving an accuracy of 90%. Lee et al. 22 offered a new architecture for detecting and counting blood cells in which VGG16 was selected as the backbone.
A convolutional block attention module, a region proposal network, and region-of-interest pooling were used to improve the classification performance. This architecture achieved 76.1% precision and 95% recall. For the classification and detection of one type of blood cell, Abas and Abdulazeez 23 proposed 2 different systems, named CADM1 and CADM2, based on CNN and YOLOv2. In experiments, CADM2 achieved 92.4% accuracy and 94% precision. Kumari et al. 24 designed a novel CNN model to classify white blood cells, which included 2 convolutional layers, 2 max-pooling layers, 2 ReLU layers, 2 fully connected layers, and a softmax layer. The accuracy of this CNN model was greater than 90%. Girdhar et al. 25 introduced a new CNN-based network to classify white blood cells; it was tested on a Kaggle dataset and achieved 98.55% accuracy. Ekiz et al. 26 used 2 methods to classify white blood cells: one based on a CNN alone and one combining a CNN and an SVM. By comparison, the combination of the SVM and the CNN yielded better accuracy, 85.96%. Ramadevi et al. 27 used a CNN model (ConVNet) for blood cell classification; the framework was tested on 13k images and achieved above 80% accuracy. Ghosh and Kundu 28 proposed a network combining a CNN and a recurrent neural network, with a long short-term memory network selected to improve the accuracy. The CNN-RNN model achieved 94.13% accuracy for 2-way classification and 87.29% accuracy for 4-way classification. Xiao et al. 29 introduced a new system for white blood cell classification with EfficientNet as the main backbone, which yielded 90% accuracy at high speed.
From the above analysis, many recent methods for classifying blood cells are based on CNNs. However, CNN models have large numbers of parameters and layers, so training takes a lot of time. Moreover, CNN models may not achieve good classification performance on small data sets.
To address these problems, we offer a novel method (ReRNet) for blood cell classification. In the ReRNet, ResNet50 is selected as the backbone, and 3 RNNs, namely SNN, dRVFL, and ELM, are used for classification. The main contributions of this paper are listed as follows:
• We propose a novel method (ReRNet) for blood cell classification.
• The ResNet50 is validated as the backbone by comparing it with other CNN models.
• The proposed method is compared with other state-of-the-art methods and achieves the best performance.
• Three RNNs are used to improve the classification performance.
• The outputs of the 3 RNNs are ensembled to improve the robustness.
The rest of this paper is as follows: section "Materials" talks about the materials; section "Methodology" is mainly about the methodology used in this paper; the experiments are shown in section "Experiment Results"; the conclusion is presented in section "Conclusion."

Materials
The public data set is available on the Kaggle website (https://www.kaggle.com/datasets/paultimothymooney/blood-cells). This public data set has 4 types of blood cells: neutrophils, monocytes, lymphocytes, and eosinophils, with 12,500 blood cell images in total. However, some diseases and their complications have no significant effect on neutrophil dynamics. 30 Therefore, 3 cell types are tested: eosinophil, lymphocyte, and monocyte. The numbers of images of eosinophils, lymphocytes, and monocytes are 3120, 3103, and 3098, respectively. 31 The details of the data set used in this paper are given in Table 1. Eosinophils account for 0.5% to 3% of the total number of leukocytes. Lymphocytes account for 20% to 30% of the total number of leukocytes and are round or oval in shape. Monocytes account for 3% to 8% of the total number of white blood cells; they are the largest white blood cells, with a diameter of 14 to 20 μm. Examples of eosinophils, lymphocytes, and monocytes are given in Figure 1.

Methodology
Computer-aided diagnosis systems based on artificial intelligence and computer vision technology have been widely used in the medical field for tasks such as medical image detection, segmentation, and classification. An important step in medical image analysis is extracting useful features. With the continuous progress of deep learning technology, CNN models have become one of the main feature extraction methods in computer-aided diagnosis systems. The function of the convolution layer is to extract features, while the pooling layer reduces the dimension of the feature map to save computation and time. In the most recent decade, many excellent CNN models have been proposed, such as ResNet, 32 AlexNet, 33 VGG, 34 MobileNetv2, 35 DenseNet, 36 UNet, 37 and so on. The hyperparameters used in this paper are summarized in Table 2.

Selection and Modification of Backbone Network
Generally speaking, the more layers a network has, the richer the features it can extract at different levels. However, experiments show that as a network deepens, the optimization effect can worsen and the accuracy can drop. For example, consider a 56-layer network and a 20-layer network. In principle, the performance of the 56-layer network should be greater than or equal to that of the 20-layer network. In practice, however, the error of the 56-layer network may be greater than that of the 20-layer network, that is, the 20-layer network can achieve better classification performance. He et al. 32 proposed ResNet to solve this degradation problem. For a stacked-layer structure (several layers stacked), the input is represented as X, the learned feature is denoted as D(X), and the residual E(X) is obtained through residual learning. The formula of residual learning is:

E(X) = D(X) − X

Based on the above formula, the learned feature can be written as:

D(X) = E(X) + X

When the residual is 0, the stacked layers only perform identity mapping, and the network performance will not be degraded. Therefore, the residual greatly alleviates the degradation problem and enables the stacked layers to learn new features on top of the input features to achieve better performance. The framework of ResNet is given in Figure 2.
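As an illustration, the residual computation above can be sketched in a few lines of NumPy. This is a toy fully connected residual block, not the actual convolutional bottleneck blocks of ResNet50; the weights and sizes are purely illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """D(X) = E(X) + X: the stacked layers learn the residual E(X),
    and the identity shortcut adds the input back before the activation."""
    e = w2 @ relu(w1 @ x)   # residual mapping E(X) (a toy 2-layer stack)
    return relu(e + x)      # learned feature D(X) = E(X) + X

rng = np.random.default_rng(0)
x = rng.normal(size=4)
w1, w2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
y = residual_block(x, w1, w2)
# when the residual weights are all zero, the block reduces to an identity
# mapping (for non-negative inputs), exactly as the degradation argument says
y_id = residual_block(relu(x), np.zeros((4, 4)), np.zeros((4, 4)))
```

The zero-weight case demonstrates the key property: a stacked layer can always fall back to identity mapping, so adding depth cannot make things worse.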
In this paper, transfer learning is used to improve classification performance. Some modifications must be made to the ResNet because of the difference between the ImageNet dataset and the dataset used in this paper, as demonstrated in Figure 3. The original "Softmax activation" and "Classification" layers are replaced by "FC128," "ReLU," "BN," "FC3," "Softmax activation," and "Classification" layers. The "FC3" is added because there are only 3 categories of images in our dataset, and an "FC128" is inserted into the model to mitigate the difference in dimensions between "FC1000" and "FC3." The modified ResNet is fine-tuned on our dataset, and then the last 5 layers are replaced by 3 RNNs to achieve better classification performance. Therefore, the ResNet only serves as the feature extractor in the proposed model, and the "FC128" is the feature layer.
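A minimal NumPy sketch of the replacement head's forward pass (FC128 → ReLU → BN → FC3 → softmax) may make the modification concrete. The weights here are randomly initialized stand-ins; in the actual method they are learned during fine-tuning, and the batch-norm statistics come from training:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def modified_head(features, w128, b128, gamma, beta, mean, var, w3, b3, eps=1e-5):
    """Forward pass of the replacement head: FC128 -> ReLU -> BN -> FC3 -> softmax.
    `features` stands for the 1000-dim output of the original FC1000 layer."""
    h = np.maximum(w128 @ features + b128, 0.0)          # FC128 + ReLU (the feature layer)
    h = gamma * (h - mean) / np.sqrt(var + eps) + beta   # batch norm, inference form
    return softmax(w3 @ h + b3)                          # FC3 + softmax over 3 classes

rng = np.random.default_rng(0)
feat = rng.normal(size=1000)   # stand-in backbone features
probs = modified_head(
    feat,
    w128=rng.normal(size=(128, 1000)) * 0.01, b128=np.zeros(128),
    gamma=np.ones(128), beta=np.zeros(128), mean=np.zeros(128), var=np.ones(128),
    w3=rng.normal(size=(3, 128)) * 0.1, b3=np.zeros(3),
)
```

The 128-dimensional activation after "FC128" and "ReLU" is the feature vector that is later handed to the 3 RNNs.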

Proposed Strategy of Ensemble of RNNs
CNN models have achieved amazing results in recent years; for example, they can obtain high accuracy on ImageNet. However, the performance of CNN models is often unsatisfactory on small data sets. Therefore, we select 3 RNNs in this paper: the Schmidt neural network (SNN), the extreme learning machine (ELM), and the deep random vector functional link network (dRVFL). The framework of the SNN is shown in Figure 4.
The calculation steps of the SNN are given as follows. The data set is denoted as (x_i, h_i), where x_i is the input, h_i is the ground-truth label, and n and m are the dimensions of the input and output. In the first calculation step, the input weights b_j and the hidden biases c_j are generated randomly and kept fixed. The second calculation step computes the hidden-layer output matrix M:

M_ij = g(b_j · x_i + c_j), j = 1, . . . , v

where g(·) is the activation function, b_j is the weight vector that connects the input data with the j-th hidden node, c_j is the bias of the j-th hidden node, and v represents the number of hidden nodes. After these 2 calculation steps, the output weight d is calculated as:

d = M⁺(h − e)

where M⁺ denotes the Moore–Penrose pseudo-inverse of M, e is the bias between the hidden layer and the output layer, and h = (h_1, . . . , h_N)ᵀ is the ground-truth label matrix. The symbols are summarized in Table 2:

Table 2. Symbol Definitions.
(x_i, h_i) — the given data set
n — the input dimension
m — the output dimension
b_j — the weights vector of the j-th hidden node
c_j — the bias of the j-th hidden node
d — the final output weight
e — the output bias of the SNN
h = (h_1, . . . , h_N)ᵀ — the ground-truth label matrix of the data set
l — the number of hidden layers in dRVFL
g(·) — the sigmoid function
v — the number of hidden nodes
M — the output matrix of the hidden layer
T — the input of the output layer for dRVFL
p — the predictions of the three RNNs

Another RNN used in this paper is the ELM, as given in Figure 5.
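Returning to the SNN, its closed-form training can be sketched in NumPy. This is an illustration under the standard SNN formulation described above, with toy sizes (the paper uses v = 400; a smaller v keeps the example quick), and the output bias e is absorbed by appending a constant column to the hidden-layer matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_snn(X, H, v=400):
    """SNN training: the hidden weights b_j and biases c_j are drawn at random
    and never updated; only the output weight d and output bias e are solved,
    in closed form, via the Moore-Penrose pseudo-inverse."""
    B = rng.normal(size=(X.shape[1], v))       # random input weights (fixed)
    c = rng.normal(size=v)                     # random hidden biases (fixed)
    M = sigmoid(X @ B + c)                     # hidden-layer output matrix M
    M1 = np.hstack([M, np.ones((len(M), 1))])  # constant column absorbs the bias e
    de = np.linalg.pinv(M1) @ H                # least-squares solution [d; e]
    return B, c, de[:-1], de[-1]

def predict_snn(X, B, c, d, e):
    return sigmoid(X @ B + c) @ d + e

X = rng.normal(size=(8, 4))   # 8 toy samples, 4 features each
H = rng.normal(size=(8, 3))   # stand-in 3-dimensional targets
B, c, d, e = train_snn(X, H, v=40)
```

Because v exceeds the number of samples here, the least-squares fit is (almost surely) exact, which reflects the interpolation capacity of randomized networks discussed later.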
From the figures of these 2 RNNs, we can see that the only difference between the SNN and the ELM is that the SNN has an output bias. Therefore, for the ELM, the first 2 calculation steps are the same as for the SNN, and the output weight d is given as:

d = M⁺h

As presented in Figure 6, the dRVFL has l hidden layers, whereas the SNN and the ELM have only one hidden layer in their structures, so the calculation steps of the dRVFL are slightly different. For the first hidden layer, the calculation is defined as:

M⁽¹⁾ = g(XW⁽¹⁾ + c⁽¹⁾)

For the other hidden layers, the calculation is:

M⁽ᵏ⁾ = g(M⁽ᵏ⁻¹⁾W⁽ᵏ⁾ + c⁽ᵏ⁾), k = 2, . . . , l

The input of the output layer concatenates the outputs of all hidden layers with the original input:

T = [M⁽¹⁾, M⁽²⁾, . . . , M⁽ˡ⁾, X]

The final output weight of the dRVFL is given as:

d = T⁺h

The performance of an ensemble of neural networks is usually better than that of an individual network because the ensemble is more robust. The parameters in an RNN are random and remain unchanged during training, so bad parameters will lead to poor results. Based on this situation, 3 RNNs are ensembled in this paper, and the error rate can be reduced by majority voting:

U(a_im) = majority{p_α(a_im), p_β(a_im), p_γ(a_im)}

where, given the image a_im, the output function is U(a_im), the predictions of the 3 RNNs are p_α, p_β, and p_γ, respectively, and the one-hot vector [1 0 0]ᵀ denotes the eosinophil class. This paper uses 3 RNNs (SNN, ELM, and dRVFL) for classification, so there are 3 results. Under the majority voting mechanism, when the majority of the results (more than half) are consistent, that result is output; when the 3 results are all inconsistent, the classification result is set to eosinophil, that is, U(a_im) = [1 0 0]ᵀ.
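The dRVFL training and the majority voting rule can be sketched as follows. This is a minimal NumPy illustration; the layer count l and width v here are toy values, not the paper's settings, and class predictions are represented as integer indices rather than one-hot vectors:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_drvfl(X, H, l=3, v=50):
    """dRVFL sketch: l stacked random (fixed) hidden layers; the output layer
    sees T = [M(1), ..., M(l), X] (direct link to the raw input), and only the
    output weight d = T+ h is solved via the pseudo-inverse."""
    outs, M = [], X
    for _ in range(l):
        W = rng.normal(size=(M.shape[1], v))  # random, fixed layer weights
        c = rng.normal(size=v)                # random, fixed layer biases
        M = sigmoid(M @ W + c)                # M(k) = g(M(k-1) W(k) + c(k))
        outs.append(M)
    T = np.hstack(outs + [X])                 # concatenate all layers + input
    return np.linalg.pinv(T) @ H              # d = T+ h

def majority_vote(preds):
    """Majority voting over per-model class indices; when all models
    disagree, fall back to class 0 (eosinophil in the paper's encoding)."""
    preds = np.stack(preds)
    out = []
    for col in preds.T:
        vals, counts = np.unique(col, return_counts=True)
        out.append(int(vals[np.argmax(counts)]) if counts.max() > len(col) // 2 else 0)
    return np.array(out)

X = rng.normal(size=(10, 5))
H = np.eye(3)[rng.integers(0, 3, size=10)]   # one-hot toy labels
d = train_drvfl(X, H)                        # shape: (3*50 + 5, 3)
```

The tie-breaking branch in `majority_vote` implements the paper's rule that a three-way disagreement is classified as eosinophil.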

Proposed ReRNet
We propose a "ResNet50-based ensemble of RNNs (ReRNet)" for blood cell classification. ResNet50 is used as the backbone model for feature extraction. The extracted features are fed to 3 RNNs: SNN, ELM, and dRVFL. The output of the ReRNet is the ensemble of these 3 RNNs based on the majority voting mechanism. 5 × 5-fold cross-validation is applied to validate the proposed network. The structure of ReRNet is given in Figure 7, and its pseudocode is given in Table 3. Initially, a ResNet50 pre-trained on the ImageNet dataset is selected as the backbone model of the ReRNet. The pre-trained ResNet50 is then modified and fine-tuned on the data set used in this paper, and the last 5 layers are replaced by the 3 RNNs (SNN, dRVFL, and ELM), so the ResNet50 can be regarded as the feature extractor in our model. The 3 RNNs are then trained on features from the backbone model. Finally, the output of the proposed model is obtained by the majority voting-based ensemble of the outputs from the 3 RNNs.

Evaluation
Four multi-classification indexes are used to evaluate the proposed method in this paper: average-accuracy, average-sensitivity, average-precision, and average-F1-score. Three classes are classified in this paper, which is a little different from binary classification; the 3-class classification can be decomposed into 3 binary classifications, where one class is defined as positive and the other 2 classes are defined as negative. For example, when the class index ∂ is 2, the definitions of true positive TP(2), true negative TN(2), false negative FN(2), and false positive FP(2) are presented in Figure 8. The per-class indexes are defined as:

accuracy(∂) = (TP(∂) + TN(∂)) / (TP(∂) + TN(∂) + FP(∂) + FN(∂))
sensitivity(∂) = TP(∂) / (TP(∂) + FN(∂))
precision(∂) = TP(∂) / (TP(∂) + FP(∂))
F1-score(∂) = 2 × precision(∂) × sensitivity(∂) / (precision(∂) + sensitivity(∂))

The 4 multi-classification indexes are the averages of the per-class indexes over the 3 classes, for example:

average-accuracy = (1/3) Σ_{∂=1}^{3} accuracy(∂)

and similarly for average-sensitivity, average-precision, and average-F1-score. A confidence interval is an interval estimate of an index: it shows the degree to which the true value of the index falls around the measured result with a certain probability, which gives the credibility of the measured value. Besides, pairwise comparisons are used to determine which of 2 models is better on each index.
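The one-vs-rest decomposition and macro-averaging described above can be computed with a short NumPy routine (class labels are integer indices; the 6-sample arrays are illustrative):

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes=3):
    """One-vs-rest metrics per class, then macro-averaged: this mirrors the
    average-accuracy / -sensitivity / -precision / -F1-score indexes."""
    acc, sen, pre, f1 = [], [], [], []
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))   # class k as positive
        tn = np.sum((y_pred != k) & (y_true != k))   # the other 2 as negative
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        a = (tp + tn) / (tp + tn + fp + fn)
        s = tp / (tp + fn) if tp + fn else 0.0
        p = tp / (tp + fp) if tp + fp else 0.0
        acc.append(a); sen.append(s); pre.append(p)
        f1.append(2 * p * s / (p + s) if p + s else 0.0)
    return tuple(float(np.mean(m)) for m in (acc, sen, pre, f1))

y_true = np.array([0, 0, 1, 1, 2, 2])
perfect = macro_metrics(y_true, y_true)               # all four indexes are 1.0
one_err = macro_metrics(y_true, np.array([0, 1, 1, 1, 2, 2]))
```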

The Settings
The hyper-parameter settings in our ReRNet are demonstrated in Table 4. The mini-batch size is only 10. The max-epoch is set as 2 to avoid overfitting. The learning rate is 1×10⁻⁴, which is a conventional setting. The only pre-defined hyperparameter in the 3 RNNs is the number of hidden nodes (v), which is set as 400. The random mapping from the lower-dimensional to the higher-dimensional space is beneficial for classification.
Table 3. The Pseudocode of the Proposed ReRNet.
Step 1: Import the dataset and divide it into training and testing sets.
Step 2: Load a ResNet50 pre-trained on the ImageNet dataset as the backbone.
Step 3: Preprocessing: resize the samples in the training and testing sets to the input size of ResNet50.
Step 4: Modification of the backbone network.
Step 4.1: Remove the softmax and classification layers.
Step 5: Replace the last five layers with three RNNs.
Step 6: Extract features as the output of the FC128 layer.
Step 7: Train the three RNNs on the extracted features and the labels.
Step 7.1: The input is the extracted features.
Step 7.2: The target is the labels of the processed training set.
Step 8: Add the majority voting layer.
Step 8.1: Ensemble the predictions of the three RNNs.
Step 8.2: Apply majority voting to the ensemble of predictions from the three RNNs.
Step 8.3: The whole network is named ReRNet.
Step 9: Test the trained ReRNet on the processed testing set.
Step 10: Report the classification performance of the trained ReRNet.
Scholars often use more max-epochs, such as 30 or 50, to improve classification performance, but more max-epochs lead to more training time. Our proposed method obtains its results with only 2 max-epochs of training, which is much more time-efficient. An RNN with v hidden nodes, randomly chosen input weights, and randomly chosen hidden-layer biases can exactly learn v distinct observations. Contrary to the popular view, and to most practical implementations in which all the parameters of a feedforward network are tuned, the input weights and first-hidden-layer biases need not be adjusted in applications. Therefore, the RNNs can greatly reduce training time.

The Performance of ReRNet
The proposed method (ReRNet) is evaluated by 5 × 5-fold cross-validation: the 5-fold cross-validation is carried out over 5 runs to avoid contingency. The classification performance of these 5 runs is presented in Table 5, and the per-class average-accuracy, average-sensitivity, average-precision, and average-F1-score of each run are given in Table 6. All the runs achieve consistently high performance.
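The 5 × 5-fold protocol can be sketched as an index generator in NumPy (a generic illustration, not the paper's exact split; each run reshuffles the data before splitting it into 5 disjoint folds):

```python
import numpy as np

def five_by_five_cv(n_samples, n_runs=5, n_folds=5, seed=0):
    """5 x 5-fold cross-validation indices: each run reshuffles the data and
    splits it into 5 disjoint folds, so every sample is tested once per run."""
    rng = np.random.default_rng(seed)
    for run in range(n_runs):
        idx = rng.permutation(n_samples)
        folds = np.array_split(idx, n_folds)
        for fold in range(n_folds):
            test_idx = folds[fold]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != fold])
            yield run, fold, train_idx, test_idx

splits = list(five_by_five_cv(100))   # 5 runs x 5 folds = 25 train/test splits
```

Averaging an index over the 25 splits yields the reported run-averaged performance.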

Effects of Different Backbones
In this paper, we test 5 different pre-trained CNN models as the backbone: AlexNet, MobileNet, ResNet18, ResNet50, and VGG. The results of these 5 pre-trained CNN models are shown in Table 7. From the comparison, we find that the ReRNet with ResNet50 achieves the best results, so we conclude that the proposed method with ResNet50 as the backbone is an effective tool for blood cell classification.

Effects of Ensemble of RNNs
We compare the proposed method (ReRNet) with the 3 individual RNNs to verify the superiority of the RNN ensemble. The statistics of the 3 RNNs over 5 experiments on blood cells are presented in Table 8. The results of the proposed method are better than those of the other 3 methods. An individual RNN is an unstable network because its parameters are randomly fixed during training; ensembling the 3 individual RNNs therefore improves the classification performance.
We also conduct pairwise comparisons based on the results in Table 8. The pairwise comparison on the 4 indexes is presented in Figure 9. The proposed model's average-accuracy, average-sensitivity, and average-F1-score are better than those of ResNet50-SNN and ResNet50-ELM and at the same level as those of ResNet50-dRVFL. The average-precision of ResNet50-ELM and ResNet50-dRVFL is close to that of the proposed model. In conclusion, ReRNet achieves good performance, even though its margin over ResNet50-dRVFL is small.

Comparison With the Backbone
The proposed method (ReRNet) is compared with transferred ResNet50 to verify the superiority of the proposed method.
The results of the comparison are given in Table 9. The result of our ReRNet is obtained by averaging 5 runs. Based on Table 9, we can conclude that all the results obtained by the proposed method are better than those of the transferred ResNet50.

Explainability of the Proposed ReRNet
A CNN behaves like a black box in applications, so it is important to explain it. In this paper, we select gradient-weighted class activation mapping (Grad-CAM) to explain the proposed method (ReRNet) and to figure out how it makes predictions: Grad-CAM lets us visualize the attention of the ReRNet. The Grad-CAM results are presented in Figure 10. The red regions are where the proposed method pays the most attention, while the blue regions receive less attention. It can be concluded that the ReRNet focuses on the blood cells themselves when making predictions.
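The core Grad-CAM computation, given the activations of a chosen convolutional layer and the gradients of the class score with respect to them, is a channel-weighted sum followed by a ReLU. This NumPy sketch shows only that final step (in practice the activations and gradients come from hooks into the trained network; the toy arrays below are illustrative):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap: `activations` and `gradients` are (K, H, W) arrays from
    a chosen conv layer (gradients of the class score w.r.t. that layer)."""
    weights = gradients.mean(axis=(1, 2))        # alpha_k: global-average-pooled grads
    cam = np.tensordot(weights, activations, 1)  # weighted sum over the K channels
    cam = np.maximum(cam, 0.0)                   # ReLU keeps positive evidence only
    if cam.max() > 0:
        cam /= cam.max()                         # normalize to [0, 1] for overlay
    return cam

# toy example: channel 0 fires everywhere, channel 1 is silent
acts = np.stack([np.ones((3, 3)), np.zeros((3, 3))])
grads = np.ones((2, 3, 3))
heat = grad_cam(acts, grads)
```

Upsampling `heat` to the input resolution and overlaying it on the image produces the red/blue attention maps shown in Figure 10.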

Comparison With Other State-of-the-art Methods
This paper compares our proposed method (ReRNet) with other state-of-the-art methods for blood cell classification. Four state-of-the-art methods are selected for comparison: the SVM polynomial model, 5 CNN-RNN, 6 Fused CNN, 7 and 4B-AdditionNet. 12 The results are presented in Table 10. Our ReRNet achieves the best results, proving that our method classifies blood cells effectively.

The Generality of the Proposed Model
To guard against overfitting and to verify the generality of the proposed model, we test it on another public blood cell data set, the malaria cell dataset, which is available on the Kaggle website (https://www.kaggle.com/datasets/iarunava/cell-images-for-detecting-malaria). We compare our model with other state-of-the-art methods on this public data set; the comparison results are presented in Table 11. The results show that our model achieves the best performance compared with the other state-of-the-art methods.
In conclusion, our model can also yield good results on other public data sets.

Conclusion
In this paper, we propose a novel method (ReRNet) for classifying blood cells. ResNet50 is selected as the backbone of this novel method, and 3 RNNs (SNN, dRVFL, and ELM) are used for classification. The results of the proposed method are generated by the ensemble of the results of the 3 RNNs via majority voting. The average-accuracy, average-sensitivity, average-precision, and average-F1-score are 99.97%, 99.96%, 99.98%, and 99.97%, respectively, which proves that the proposed method is effective for blood cell classification.
In future research, we will apply this method to other data sets. Moreover, the segmentation of blood cells is also significant, so we will pay more attention to it, and more methods will be tested for blood cell classification, such as UNet, transformers, etc.

Data Availability Statement
The datasets are public and can be downloaded here: https://www. kaggle.com/datasets/paultimothymooney/blood-cells (accessed on 30 June 2022).