A Novel Data Augmentation Convolutional Neural Network for Detecting Malaria Parasite in Blood Smear Images

ABSTRACT Malaria fever is a potentially fatal disease caused by the Plasmodium parasite. Identifying Plasmodium parasites in blood smear images can help diagnose malaria fever rapidly and precisely. According to the World Health Organization (WHO), there were 241 million malaria cases and 627 000 deaths worldwide in 2020; 95% of malaria cases and 96% of malaria deaths occurred in Africa, where children under five years of age accounted for an estimated 80% of all malaria deaths. To address this problem, this paper proposes a novel deep learning model, called a data augmentation convolutional neural network (DACNN), trained by reinforcement learning. The performance of the proposed DACNN model is compared with that of CNN and directed acyclic graph convolutional neural network (DAGCNN) models. Results show that DACNN outperforms previous studies in image processing and classification. It achieved 94.79% classification accuracy on a class-balanced dataset of malaria blood sample images obtained from Kaggle. The proposed model can serve as an effective tool for detecting malaria parasites in blood smear images.


Introduction
Malaria is an endemic infection that has affected over 228 million people worldwide; in 2018, it caused 405,000 deaths in 106 countries and territories (World Health Organization 2020). Malaria is a major health problem in Nigeria, which accounts for more malaria cases and deaths than any other country in the world. About 97% of Nigeria's population is at risk of malaria because of the country's location; only about 3% of the population lives in malaria-free zones (Okeke 2012). There are more than 100 million malaria cases in Nigeria alone, with more than 300,000 deaths per year, which is even higher … input data set are used to build the training model. In the subsequent stage, the test data set is randomly presented to the classifier, and the trained model produces the blood sample image classification results.
Conventional methods of malaria parasite detection and classification are time-consuming and often imprecise (Tangpukdee et al. 2009). Deep learning models such as AlexNet (Krizhevsky, Sutskever, and Hinton 2012), GoogLeNet (Szegedy et al. 2015), VGGNet (Simonyan and Zisserman 2014), ResNet (He et al. 2016), and DenseNet (Huang et al. 2018) have therefore been applied to these problems, and such techniques are effective at detecting malaria parasites in blood samples. Deep learning algorithms are widely used in clinical studies because they are fast and effective computational methods (Litjens et al. 2017; Miotto et al. 2018). Compared with experimental methods, they can also reduce the health-care costs of infection prevention. Applications of computational methods and deep learning models in health care include diagnostic planning, the development of treatment protocols, drug development, and patient follow-up and care (Ahuja 2019).
Recently, reinforcement learning has been adopted to improve the training of deep neural networks (Alom et al. 2019; Hernandez-Leal, Kartal, and Taylor 2019; Wang et al. 2020). It has also been used for the classification of biomedical images (Mahmud et al. 2018). For example, reinforcement learning was used to segment transrectal ultrasound images to assess the location and volume of the prostate (Sahba, Tizhoosh, and Salama 2008). Zhang et al. (2019) proposed a reinforcement sampling strategy to address the problem of imbalanced data in a breast tumor image dataset. Tian et al. (2020) formulate image segmentation as a Markov decision process and train an agent with a deep reinforcement learning (DRL) algorithm to segment regions of interest in medical images.
The main novelty and contribution of this work is the fusion of a directed acyclic graph and data augmentation with a convolutional neural network (CNN) to improve the detection of malaria parasites in blood smear images. Reinforcement learning is used to train the model to obtain better malaria parasite detection and classification results.
The contributions of the paper are as follows: (i) a time-saving and effective deep learning method for detecting and diagnosing malaria is developed; (ii) a novel deep learning model, the data augmentation convolutional neural network (DACNN), which combines a directed acyclic graph and data augmentation with a CNN to improve accuracy, is proposed; (iii) reinforcement learning is applied to train DACNN, achieving a detection and classification accuracy of about 94.79%; and (iv) DACNN is used instead of a traditional classifier such as Random Forest, taking advantage of its ability to operate directly on blood smear images.
The rest of this paper is organized as follows. Section 2 discusses related work on malaria detection. Section 3 explains the proposed methodology. Section 4 presents and discusses the results, and Section 5 concludes the paper.

Related Works
In this section, we briefly discuss recent research on malaria detection. Machine learning and deep learning algorithms have gained wide acceptance among researchers for detecting malaria in blood smear images, owing to their efficacy in solving the associated detection and classification problems. During the past decade, a wide range of deep learning models has been used in clinical studies and health care. Bibin, Nair, and Punitha (2017) proposed a Deep Belief Network (DBN) to detect malaria parasites in blood images. The network consists of stacked restricted Boltzmann machines trained with contrastive divergence and classifies blood image samples as parasite or non-parasite. The performance of the system is satisfactory. However, there is still room for improvement: the image dataset used in the experiments is small, so it cannot be ascertained whether the technique can handle a large dataset effectively. Rajaraman, Jaeger, and Antani (2019) investigated the performance of CNNs in detecting malaria parasites in blood samples. Their aim was to design an ensemble CNN model that is more robust and accurate than other state-of-the-art models. The system classifies blood samples as parasitized or normal, and the experimental results indicated that the ensemble of VGG-19 and SqueezeNet outperformed the other ensemble models in the study. Qanbar et al. (2019) applied a Residual Attention Network (RAN) as an analysis and decision-support system for classifying blood samples as infected or non-infected. The results showed that the RAN model predicts well in processing and classifying blood sample images compared with other types of algorithms.
RAN achieved an accuracy of 95.79%, compared with 83.30% for a support vector machine (SVM). Chaya and Usha (2019) proposed three techniques: the Cuckoo Search-Based Ensemble Classifier (CSEC), Scale to Estimate Premature Malaria Parasites Scope (SEMP), and Hybrid Classification of Malaria Blood Smear Images. The experimental results indicated that CSEC is more accurate than the hybrid classifier. The strength of their approach is that it employs a metaheuristic optimization algorithm instead of machine learning; its shortcoming is that metaheuristics do not guarantee a globally optimal solution for some classes of problems (Torres-Jimenez and Pavon 2014). Kumari, Singh, and Kumar (2019) combined feature selection with Logistic Regression, Naive Bayes, KNN, Decision Tree, Random Forest, Support Vector Machine (SVM), and Artificial Neural Network (ANN) models to predict liver disease from a UCI dataset. Their simulation results show that choosing the right feature extraction method for each model is very important for good results. Combining feature selection with the machine learning models raised accuracy to up to 92%, an improvement over the classifiers alone. The downside of their work is that accuracy is the only performance metric used to evaluate the proposed model. Moreover, the only liver disease dataset used was the one obtained from UCI, and it is too small to prove the efficacy of the technique. Negi, Kumar, and Chauhan (2021) proposed a deep CNN model for identifying and recognizing plant diseases, which attained an accuracy of 96.02%.
However, the authors did not use many of the standard performance measures to evaluate the proposed system. Oyewola et al. (2021) presented a deep residual convolutional neural network (DRNN) for detecting Cassava Mosaic Disease in cassava leaf images. The method compensates for the imbalanced cassava disease image dataset and increases the number of images available for training and testing by using different block processing. Furthermore, gamma correction and decorrelation stretching were used to improve color separation in images with high band-to-band correlation. The simulation results show that using a balanced dataset of images improves classification accuracy. The DRNN model outperforms the simple convolutional neural network (PCNN), producing a balanced accuracy of 94-99%, a considerable margin of 9.25%, on the Kaggle cassava disease dataset of 5,656 images. One limitation of this work is that deep learning techniques are prone to overfitting the training dataset, which hinders generalization. Moreover, gamma correction may not be the ideal image enhancement technique under poor imaging conditions. Alok, Krishan, and Chauhan (2021) proposed a deep learning technique for detecting malaria. They used a malaria dataset of 27,587 images, divided into a training set (23,448 images) and a validation set (4,139 images). The method achieved 95.70% accuracy in detecting and classifying malaria cells, with precision, recall, and F1-score of 0.96. Negi and Kumar (2021) proposed a deep learning method for detecting and classifying citrus diseases to support crop productivity. Their dataset includes 759 images of healthy and diseased citrus fruits and leaves.
The method recognized and classified the diseases satisfactorily, especially in the first stage, where a precision of 97.65%, a recall of 91.21%, and an F1-score of 94.32% were attained. In the second stage, a training accuracy of 65.94% and a validation accuracy of 62.50% were recorded. One limitation of the work is that the combined average classification accuracy of the two stages is still relatively low. Moreover, the dataset used is small and imbalanced. Table 1 presents a summary of the related works considered in this paper.

Convolutional Neural Networks (CNN)
Figure 1 shows a random selection of training images from both the uninfected and infected malaria blood samples. A CNN consists mainly of three types of layers: convolution, pooling, and fully connected layers. The convolution and pooling layers extract features, while the fully connected layers map the extracted features to the final (classification or regression) output (as shown in Figure 2). The convolutional layer serves as the feature extractor and learns feature representations of the input images. Its neurons are organized into feature maps, and each neuron in a feature map is connected to a local region (its receptive field) of the preceding layer through a set of trainable weights, called a filter bank (LeCun, Bengio, and Hinton 2015). The feature map Y_k can be computed as follows:
$$ Y_k = \gamma\left(W_k * x + b\right) \qquad (1) $$
where x denotes the input image, W_k is the convolutional filter associated with the kth feature map, * denotes the convolution operation, b is the bias, and γ is the nonlinear activation function.
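A minimal NumPy sketch of the feature-map computation Y_k = γ(W_k * x + b) for one single-channel image and one filter follows. It is illustrative only (the experiments in this paper were run in MATLAB): it assumes stride 1 and no padding, implements the convolution as cross-correlation, as most deep learning libraries do, and assumes ReLU for γ. The function names and toy values are hypothetical.

```python
import numpy as np


def conv2d_valid(x, w):
    """'Valid' 2-D cross-correlation (stride 1, no padding) of image x with filter w."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum over the receptive field of output neuron (i, j)
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


def feature_map(x, w_k, b, activation=lambda z: np.maximum(z, 0.0)):
    """Y_k = gamma(W_k * x + b); gamma defaults to ReLU (an assumption)."""
    return activation(conv2d_valid(x, w_k) + b)


x = np.arange(16.0).reshape(4, 4)   # toy 4x4 "image"
w_k = np.array([[1.0, 0.0],
                [0.0, -1.0]])       # toy 2x2 filter
print(feature_map(x, w_k, b=6.0))   # 3x3 map of ones: every patch gives -5, plus the bias 6
```

Each output value depends only on a 2 × 2 patch of the input, which is the local receptive field described above.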
Table 1 (excerpt). Mehedi Masud et al. (2020). Technique: CNN. Strength: CNN has high accuracy in the prediction of malaria blood smear images. Limitations: performance of the proposed model is relatively low; the dataset is small; the authors did not benchmark their work against other works.
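A batch normalization layer normalizes each mini-batch of data from the previous layer and then scales and shifts it, which improves network stability (Bjorck et al. 2018). For a mini-batch $\{x_1, \ldots, x_m\}$, batch normalization is given by:
$$ \mu = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu\right)^2, \qquad \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad y_i = \alpha\,\hat{x}_i + \beta $$
where μ is the mean, σ² is the variance, x̂_i is the normalized data, ε is a small constant for numerical stability, and α and β are parameters learned to scale and shift the normalized data.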
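The batch normalization computation described above (mini-batch mean and variance, normalization, then a learned scale α and shift β) can be checked with a short NumPy sketch. It assumes training mode, where statistics come from the current mini-batch (the running averages used at inference are not shown), and ε = 1e-5; the function name is hypothetical.

```python
import numpy as np


def batch_norm(x, alpha, beta, eps=1e-5):
    """Training-mode batch normalization of a mini-batch x with shape (m, features)."""
    mu = x.mean(axis=0)                    # mini-batch mean
    var = x.var(axis=0)                    # mini-batch variance (divides by m)
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized data
    return alpha * x_hat + beta            # learned scale (alpha) and shift (beta)


batch = np.array([[1.0, 2.0],
                  [3.0, 6.0],
                  [5.0, 10.0]])
out = batch_norm(batch, alpha=np.ones(2), beta=np.zeros(2))
print(out.mean(axis=0), out.var(axis=0))  # means ~0, variances ~1
```

With α = 1 and β = 0 each feature comes out with zero mean and (almost) unit variance; the learned α and β let the network undo this normalization where that helps.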
A rectified linear unit (ReLU) (Nair and Hinton 2010) represents the behavior of real neurons more closely. Nair and Hinton motivated it as an approximation to the sum of many copies of a sigmoid unit that share the same weights but have progressively offset biases. The ReLU is given as follows:
$$ \mathrm{ReLU}(Y_k) = \max\left(0, Y_k\right) $$
where Y_k is given in Equation (1). Pooling layers are another basic building block of a CNN. A pooling layer progressively reduces the spatial dimensions of the representation to lower the number of parameters and the amount of computation in the network, which further reduces complexity and makes the network more robust. The most common forms are max pooling and average pooling. Pooling layers have no learnable parameters; as in convolution, the filter size, stride, and padding are hyperparameters (Suárez-Paniagua and Segura-Bedmar 2018). A max pooling layer extracts patches from the input feature maps, outputs the maximum value of each patch, and discards all other values. Dropout layers are used to prevent overfitting (Srivastava et al. 2014).
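The ReLU, max pooling, and dropout operations described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the paper's configuration: a 2 × 2 pooling window with stride 2 and the "inverted" form of dropout (rescaling surviving activations during training) are assumed, and the function names are hypothetical.

```python
import numpy as np


def relu(y):
    """ReLU(Y_k) = max(0, Y_k), applied element-wise."""
    return np.maximum(0.0, y)


def max_pool2d(fmap, size=2, stride=2):
    """Max pooling on a 2-D feature map; note there are no learnable parameters."""
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=fmap.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = fmap[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = patch.max()  # keep the maximum, discard the other values
    return out


def dropout(a, p, rng):
    """Inverted dropout (training time only): zero each unit with probability p
    and rescale the surviving units by 1 / (1 - p)."""
    mask = rng.random(a.shape) >= p
    return a * mask / (1.0 - p)


fmap = np.array([[1.0, -2.0, 3.0, 0.0],
                 [4.0, 5.0, -6.0, 1.0],
                 [-1.0, 0.0, 2.0, 2.0],
                 [3.0, -4.0, 1.0, 7.0]])
print(max_pool2d(relu(fmap)))  # 2x2 map with values [[5, 3], [3, 7]]
```

Pooling halves each spatial dimension here (4 × 4 to 2 × 2), which is the reduction in parameters and computation the text refers to.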
The final output stage of a CNN consists of one or more fully connected layers, also called dense layers, in which every input is connected to every output (Teuwen and Moriakov 2020). After features are extracted by the convolution layers and downsampled by the pooling layers, the fully connected layers map them to the classes of the classification task. The final fully connected layer usually has as many output nodes as there are classes. Training the output layer is posed as an unconstrained optimization problem, where f(Ŵ) denotes the output layer and ρ is the probability of the observed data.
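The exact output-layer objective is not recoverable from the text. A common choice for a classification output layer, assumed here rather than taken from the paper, is a softmax over the output nodes trained by minimizing the negative log-likelihood (cross-entropy), which is equivalent to maximizing the probability of the observed labels. A NumPy sketch with hypothetical scores and labels:

```python
import numpy as np


def softmax(z):
    """Convert raw output-node scores (one per class) into class probabilities."""
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def negative_log_likelihood(probs, labels):
    """Mean cross-entropy: minimizing it maximizes the probability of the observed labels."""
    n = probs.shape[0]
    return float(-np.mean(np.log(probs[np.arange(n), labels])))


scores = np.array([[2.0, 0.5],   # two output nodes, e.g. Infected / Uninfected
                   [0.1, 1.5]])
labels = np.array([0, 1])        # hypothetical ground-truth classes
probs = softmax(scores)
print(probs.sum(axis=1), negative_log_likelihood(probs, labels))
```

The number of output nodes (two here) matches the number of classes, as stated above.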

Directed Acyclic Graph Convolutional Neural Network (DAGCNN)
A directed acyclic graph (DAG) is a directed graph that contains no cycles: every directed path runs from a beginning node to an end node and never returns to a node it has already visited (Li, Li, and He 2019). In this paper, we incorporate the DAG structure into the CNN (DAGCNN) because DAG-structured networks such as residual networks perform strongly on both image classification and object detection (He et al. 2016). Figure 3 shows the DAGCNN structure with all its layers; the directed paths start at the image input layer, pass through the convolution layers, and end at the classification layer.
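What makes a layer graph a DAG can be illustrated with Kahn's topological sort: if every layer can be placed after all of its inputs, the graph has no cycles and the layers can be evaluated in that order. The Python sketch below is a generic illustration, not the DAGCNN used in this paper; the layer names and the skip connection are hypothetical.

```python
from collections import deque


def topological_order(edges):
    """Kahn's algorithm on a layer graph given as {layer: [successor layers]}.
    Returns an order in which every layer comes after all of its inputs,
    or raises ValueError if the graph has a cycle (i.e. is not a DAG)."""
    nodes = set(edges) | {v for succ in edges.values() for v in succ}
    indegree = {n: 0 for n in nodes}
    for succ in edges.values():
        for v in succ:
            indegree[v] += 1
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        layer = ready.popleft()
        order.append(layer)
        for v in edges.get(layer, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("the layer graph contains a cycle, so it is not a DAG")
    return order


# Hypothetical layer graph with a skip connection: conv1 feeds both conv2 and add.
layers = {
    "input": ["conv1"],
    "conv1": ["conv2", "add"],
    "conv2": ["add"],
    "add": ["fc"],
    "fc": ["classification"],
}
print(topological_order(layers))
# ['input', 'conv1', 'conv2', 'add', 'fc', 'classification']
```

The "add" layer receives inputs from two layers, which a series architecture cannot express, yet the graph remains acyclic.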

Data Augmentation Convolutional Neural Network (DACNN)
This paper applies data augmentation to the malaria blood sample data from infected and uninfected blood samples. To the best of our knowledge, this method has not previously been used to detect malaria in blood samples. Data augmentation was applied to both the training and test sets of malaria blood sample images. Data can be augmented with basic image transformations such as rotation, translation, horizontal and vertical scaling, random shear, and random reflection; the augmentations used in this study are random rotation, random translation, and horizontal and vertical scaling. The images are rotated to the left or right by a random angle between −20° and 20°, with the range set by the rotation-degree parameter. Rotations between −20° and 20° help to improve the accuracy on both the infected and uninfected blood sample images used in this paper (Shorten and Khoshgoftaar 2019). Shifting images horizontally and vertically is useful for preventing positional bias in the malaria blood sample images; in this study, the images are translated by between −3 and +3 pixels. The translated images are filled with a constant value, which preserves their spatial dimensions. Horizontal and vertical scaling was also applied to the malaria blood sample dataset, with each image scaled randomly between 1 and 1.
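The augmentation settings described above (random rotation within ±20°, random translation within ±3 pixels with a constant fill, and random reflection) can be approximated with the NumPy sketch below. It is not MATLAB's imageDataAugmenter: nearest-neighbour interpolation, a fill value of 0, and a 50% reflection probability are assumptions, and scaling is omitted because a range of 1 to 1 leaves the image unchanged. All function names are hypothetical.

```python
import numpy as np


def rotate_nearest(img, angle_deg, fill=0):
    """Rotate a (H, W) or (H, W, C) image about its centre (nearest-neighbour,
    output size unchanged); pixels with no source pixel are set to `fill`."""
    h, w = img.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    # inverse mapping: find the source pixel for every output pixel
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    xi, yi = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.full_like(img, fill)
    out[valid] = img[yi[valid], xi[valid]]
    return out


def translate(img, dx, dy, fill=0):
    """Shift right by dx and down by dy pixels (|dx| < W, |dy| < H); vacated
    pixels get the constant `fill`, so the spatial size is preserved."""
    h, w = img.shape[:2]
    out = np.full_like(img, fill)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out


def augment(img, rng, max_rotation=20.0, max_shift=3, reflect=True):
    """Random rotation in [-20, 20] degrees, random shift in [-3, 3] pixels,
    and (optionally) a random left-right reflection."""
    angle = rng.uniform(-max_rotation, max_rotation)
    dx, dy = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    out = translate(rotate_nearest(img, angle), dx, dy)
    if reflect and rng.random() < 0.5:
        out = out[:, ::-1]
    return out


rng = np.random.default_rng(0)
cell = rng.random((32, 32))      # stand-in for a 32x32 grayscale cell image
print(augment(cell, rng).shape)  # (32, 32)
```

Because each call draws new random parameters, applying `augment` every epoch shows the network a slightly different version of each blood smear image.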

Reinforcement Learning
The general scheme of reinforcement learning adopted in this paper is shown in Figure 4. We consider an agent interacting with an external environment in discrete time t = 1, 2, ..., T. In state S(t), the agent performs action a(t), receives reinforcement r(t), and moves to state S(t + 1). The agent's goal is to maximize the total reward U(t) that it can receive in the future. U(t) is estimated using the forgetting coefficient:
$$ U(t) = \sum_{k=0}^{\infty} \gamma^{k}\, r(t+k) $$
where U(t) is the estimate of the total reward expected after time t, and γ is the forgetting coefficient (0 < γ < 1), which reflects that the further the agent "looks" into the future, the less confident it is in its assessment of the reward.
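The total discounted reward U(t) = Σ_k γ^k r(t + k) can be computed for every step of a finite episode with one backward pass, using U(t) = r(t) + γU(t + 1). A minimal Python sketch, assuming the episode ends at T so that the sum is truncated there; the function name and reward values are hypothetical:

```python
def discounted_return(rewards, gamma):
    """U(t) = sum_k gamma**k * r(t + k) for every t of a finite episode r(1..T)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("the forgetting coefficient must satisfy 0 < gamma < 1")
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):  # backward pass: U(t) = r(t) + gamma * U(t+1)
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

A smaller γ makes later rewards count for less, matching the agent's reduced confidence about the distant future.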
We consider an agent that aims to maximize its performance function C(t); the agent seeks to increase C(t) by changing the value of u(t).
The agent's control system is an adaptive critic consisting of two neural networks, the Model and the Critic (see Figure 4). The adaptive critic aims to maximize U(t). The agent's state S(t) is assumed to depend only on two quantities, ΔX(t) and u(t + 1). The Model is a two-layer neural network with input vector x^M, hidden-layer output vector y^M, and neuron weights w^M_ij and v^M_j. The Critic is intended to assess the quality of situations V(S), i.e., to estimate the utility function U(t) for an agent in state S(t). The Critic is also a two-layer neural network, with input vector x^C, hidden-layer output vector y^C, and neuron weights w^C_ij and v^C_j. At each time t, the following operations are performed:
1) The Critic estimates the value V(t) = V(S(t)) of the current state.
2) The ε-greedy rule is applied: the action corresponding to the maximum predicted value V^pr(u(t + 1)) is chosen with probability 1 − ε, whereas an alternative action is chosen with probability ε (0 < ε ≪ 1).
3) The weights of the Model network are adjusted by backpropagation to minimize its prediction error.
4) The error is computed:
$$ \delta(t) = r(t) + \gamma\, V\big(S(t+1)\big) - V\big(S(t)\big) $$
The value of δ(t) characterizes the error in the estimate V(t) = V(S(t)) of the total reward that can be obtained from state S(t); it is calculated from the current reward r(t) and the estimate V(S(t + 1)) of the total reward from the next state.
5) The weights of the Critic network are adjusted to minimize δ(t) by stochastic gradient descent:
$$ \Delta w^{C} = \alpha_C\, \delta(t)\, \frac{\partial V(t)}{\partial w^{C}} $$
where α_C > 0 is the learning rate of the Critic.
Figure 4: Schematic of reinforcement learning.
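The ε-greedy action choice, the error δ(t) = r(t) + γV(S(t + 1)) − V(S(t)), and the Critic's gradient update described above can be sketched in NumPy. To keep the example short, the Critic is simplified to a linear function V(S) = w_c · φ(S) of a hypothetical state feature vector φ, rather than the two-layer network used in the paper, so that the update w_c ← w_c + α_C δ(t) ∂V/∂w_c becomes w_c ← w_c + α_C δ(t) φ(S(t)). The Model network and the environment are not shown; all names and numbers are illustrative assumptions.

```python
import numpy as np


def epsilon_greedy(predicted_values, epsilon, rng):
    """Choose the action with the highest predicted value with probability 1 - epsilon;
    otherwise choose one of the alternative actions uniformly at random."""
    values = np.asarray(predicted_values, dtype=float)
    best = int(np.argmax(values))
    if len(values) > 1 and rng.random() < epsilon:
        alternatives = [a for a in range(len(values)) if a != best]
        return int(rng.choice(alternatives))
    return best


def td_error(r, v_now, v_next, gamma):
    """delta(t) = r(t) + gamma * V(S(t+1)) - V(S(t))."""
    return r + gamma * v_next - v_now


def critic_step(w_c, phi_now, phi_next, r, gamma, alpha_c):
    """One TD(0) update of a linear critic V(S) = w_c . phi(S).
    For a linear critic, dV(S(t))/dw_c = phi(S(t))."""
    delta = td_error(r, w_c @ phi_now, w_c @ phi_next, gamma)
    return w_c + alpha_c * delta * phi_now, delta


rng = np.random.default_rng(0)
action = epsilon_greedy([0.2, 0.7, 0.1], epsilon=0.1, rng=rng)  # usually action 1
w_c, delta = critic_step(np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                         r=1.0, gamma=0.9, alpha_c=0.5)
print(action, w_c, delta)
```

A small ε keeps the agent mostly greedy while still occasionally exploring alternative actions.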

Performance Measurements
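The prediction performance of the proposed system was evaluated using the following metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Scaled Error (MASE), accuracy (A_c), specificity (S_p), sensitivity (S_e), kappa (K), and detection rate (D_r). MAE is the average absolute difference between the predicted and actual values in the test set:
$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| $$
where y_i and ŷ_i are the actual and predicted values of the ith of n test samples.
RMSE is the square root of the mean squared prediction error in a test (it equals the standard deviation of the prediction errors when their mean is zero):
$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} $$
MASE measures the accuracy of predictions by scaling the MAE by the mean absolute error of a naive benchmark forecast, so values below 1 indicate predictions that are more accurate than the benchmark. In its standard non-seasonal form:
$$ \mathrm{MASE} = \frac{\mathrm{MAE}}{\frac{1}{n-1}\sum_{i=2}^{n} \left| y_i - y_{i-1} \right|} $$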
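The three error measures can be computed with NumPy as below. MASE is shown in its standard non-seasonal form (MAE divided by the MAE of the naive forecast that predicts each value by the previous one); whether the paper used this exact scaling is an assumption. The function names and toy values are hypothetical.

```python
import numpy as np


def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))


def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))


def mase(y, y_hat):
    """MAE divided by the MAE of the naive forecast y[i-1] (non-seasonal form)."""
    y = np.asarray(y, float)
    naive_mae = np.mean(np.abs(np.diff(y)))
    return mae(y, y_hat) / naive_mae


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 3.0, 2.0, 4.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mase(y_true, y_pred))
```

RMSE is never smaller than MAE, because squaring weights large errors more heavily.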
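The classification performance measures are defined as follows:
$$ A_c = \frac{T_P + T_N}{T_P + T_N + F_P + F_N}, \qquad S_e = \frac{T_P}{T_P + F_N}, \qquad S_p = \frac{T_N}{T_N + F_P}, \qquad D_r = \frac{T_P}{T_P + T_N + F_P + F_N}, \qquad K = \frac{P_o - P_e}{1 - P_e} $$
$$ P_e = \frac{(T_P + F_P)(T_P + F_N) + (F_N + T_N)(F_P + T_N)}{(T_P + T_N + F_P + F_N)^2} $$
where A_c is the accuracy, S_e is the sensitivity, S_p is the specificity, D_r is the detection rate, K is the kappa, T_P, T_N, F_P, and F_N are the numbers of true positives, true negatives, false positives, and false negatives, P_o is the observed accuracy (equal to A_c), and P_e is the expected accuracy obtained from the confusion matrix.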
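The classification measures follow directly from the four confusion-matrix counts. The Python sketch below uses the standard definitions, with detection rate taken as T_P divided by the total number of test images, and also computes NPV and prevalence (reported in the results) from the same counts; the function name and the counts in the usage line are hypothetical.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Classification measures from the four confusion-matrix counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n                                        # A_c (= observed agreement P_o)
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2  # expected agreement P_e
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),          # S_e
        "specificity": tn / (tn + fp),          # S_p
        "npv": tn / (tn + fn),                  # negative predictive value
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,               # D_r
        "kappa": (accuracy - p_e) / (1 - p_e),  # K
    }


# Hypothetical counts for a 100-image test set
print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))
```

Because D_r divides by all test images, it cannot exceed the prevalence of the positive class, which is why detection rates near 50% are the ceiling on a balanced dataset.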

Datasets
The malaria cell images used in this paper were obtained from the Kaggle database (Kaggle 2018). A balanced dataset was used for our experiments: it comprises 27,558 microscope images of blood cells, with equal numbers of infected (13,779) and uninfected (13,779) cell images. Samples of malaria-infected and uninfected images are shown in Figure 5. The dataset is divided into a training set (80%) and a test set (20%), so the training set consists of 22,046 images and the test set of 5,512 images.
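The paper does not state how the 80/20 split was drawn. Assuming a split stratified by class, holding out 20% of each class reproduces the reported set sizes exactly (2 × 2,756 = 5,512 test images and 2 × 11,023 = 22,046 training images). A NumPy sketch, with a hypothetical function name and label coding:

```python
import numpy as np


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))  # shuffle this class
        n_test = int(round(test_fraction * len(idx)))
        test_parts.append(idx[:n_test])
        train_parts.append(idx[n_test:])
    return np.concatenate(train_parts), np.concatenate(test_parts)


# Hypothetical coding: 0 = uninfected, 1 = infected; class sizes as reported for the dataset
labels = np.array([0] * 13779 + [1] * 13779)
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 22046 5512
```

Stratifying keeps both sets balanced, so the test accuracy is not skewed toward either class.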

Results and Discussion
Deep learning techniques such as convolutional neural networks (CNNs) are commonly used for image classification. They are designed to work with images as inputs, but they can also handle text, signals, and other continuous responses. CNNs are inspired by the anatomical structure of the visual cortex, which contains arrangements of simple and complex cells. These cells were found to respond to stimuli in sub-regions of the visual field, called receptive fields. Following these findings, the neurons in a convolutional layer connect to sub-regions of the preceding layer rather than being fully connected, as in other types of neural networks, and do not respond to the image outside these sub-regions. A CNN is made up of layers such as convolutional layers, batch normalization layers, max-pooling layers, softmax layers, and fully connected layers. The neurons in each layer of a CNN are arranged in three dimensions, transforming a three-dimensional input into a three-dimensional output. The first layer (the input layer) holds the infected and uninfected malaria images as 3-D inputs whose dimensions (height, width, and number of color channels) were set to 32, 32, and 1, respectively. The neurons in the first convolutional layer connect to regions of these malaria images and transform them into a 3-D output. The hidden units (neurons) in each layer learn nonlinear combinations of the original inputs, a process known as feature extraction. These learned features, also called activations, become the inputs to the following layer, and the features learned at the end of the network are the inputs to the classifier function. The network was trained with the stochastic gradient descent with momentum (SGDM) optimizer.
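The SGDM optimizer keeps a velocity term that accumulates past gradients and moves the parameters along it. A minimal NumPy sketch of one update, run on a hypothetical one-parameter quadratic loss; the function name, the learning rate 0.1, and the momentum 0.9 are illustrative choices, not the paper's training settings:

```python
import numpy as np


def sgdm_step(theta, velocity, grad, lr=0.01, momentum=0.9):
    """One SGDM update: velocity <- momentum * velocity - lr * grad; theta <- theta + velocity."""
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity


# Toy problem (hypothetical): minimize E(theta) = theta**2, whose gradient is 2 * theta.
theta, velocity = np.array(1.0), np.array(0.0)
for _ in range(200):
    theta, velocity = sgdm_step(theta, velocity, grad=2.0 * theta, lr=0.1, momentum=0.9)
print(float(theta))  # close to the minimum at 0
```

In a CNN, `theta` would be every weight and bias and `grad` the mini-batch gradient from backpropagation; the momentum term smooths the noisy mini-batch gradients.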
The mini-batch size is specified with the 'MiniBatchSize' name-value argument of trainingOptions, and 'MaxEpochs' is set to 50 for fine-tuning and transfer learning. The second method considered in this paper is the directed acyclic graph convolutional neural network (DAGCNN). A DAGCNN has layers organized as a directed acyclic graph, in which a layer can receive inputs from several layers and send outputs to several layers; this makes it more complex than a CNN with a series architecture. In image processing, such structures carry pixel localization information from the early layers to the final layers. The third approach is the data augmentation convolutional neural network (DACNN) trained with reinforcement learning. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images, and it improves CNN performance by generating new and diverse training examples. We build an imageDataAugmenter object to specify the image augmentation options, including scaling, rotation, translation, and reflection. The malaria images are randomly translated by up to three pixels horizontally and vertically and rotated by up to 20 degrees. The experiments were run in MATLAB R2018a on a machine with a DELL motherboard, 4 GB of RAM, and an Intel Dual Core 2.20 GHz processor, running Windows 8.1. In this paper, the performance of the convolutional neural network (CNN), the directed acyclic graph convolutional neural network (DAGCNN), and the data augmentation convolutional neural network (DACNN) is compared. Data augmentation artificially builds new training data from existing training data: it expands the size of a training dataset by generating modified versions of the images in the dataset. Training deep convolutional neural network models on more data can result in more effective models.
Moreover, augmentation can create variations of the images that improve the ability of models to generalize what they have learned to new images. The overall performance of the CNN, DAGCNN, and DACNN models is evaluated using the following performance measures: MAE, RMSE, MASE, sensitivity, detection rate, NPV, prevalence, accuracy, kappa, and the 95% confidence interval (CI). Table 2 displays the performance metrics for each criterion used in this paper. As shown in this table, DACNN has the lowest error of the three models. The models are further evaluated using sensitivity, detection rate, NPV, prevalence, accuracy, kappa, and CI (Tables 3-6).
CNN performed poorly in classifying the malaria blood smear images, as shown by performance metrics such as S_e, NPV, D_r, and prevalence. The sensitivity (S_e) of CNN for the two classes of malaria blood smear images, Infected and Uninfected, is within the range of 65-81% (Table 3). DAGCNN also performed poorly according to S_e, NPV, D_r, and prevalence (Table 4). For DACNN, the sensitivity (S_e) for the two classes is within the range of 92-96%, the detection rate (D_r) is 41-53%, NPV is 94-95%, and prevalence is 44-55% (Table 5), which shows that DACNN performs better than CNN and DAGCNN. Table 6 gives the overall evaluation of the CNN, DAGCNN, and DACNN models based on A_c, K, and CI. According to Table 6, DACNN performed best, with an accuracy of 94.79% and a kappa of 89.44%, followed by CNN, with an accuracy of 72.62%, while DAGCNN performed worst. This shows that integrating data augmentation with a convolutional neural network can improve the classification of malaria blood smear images.
CNNs classify images using features that the network learns during training. In this study, we visualize the outputs of the hidden layers to inspect the features used to diagnose malaria from blood sample images. Figure 6 shows the complex patterns and textures of the infected blood samples produced by the third convolutional layer. Despite these promising results, the proposed approach has several drawbacks. First, deep learning methods tend to overfit the training dataset. Because deep learning models are meant to generalize from the training data to any data from the problem domain, it is critical that the CNN make accurate predictions on data it has never seen before.
Overfitting occurs when a model learns too many details of the training data, including its noise. As a result, the model performs poorly on unseen or test data, because the network fails to generalize the features and patterns of the training dataset. Moreover, gamma correction may not be the ideal image enhancement strategy under poor lighting conditions.

Conclusion
A new deep learning model, the data augmentation convolutional neural network (DACNN), was proposed in this paper and trained with reinforcement learning to detect malaria parasites in blood smear images. The paper compared DACNN with other variants of CNN to investigate its performance. Simulation results show that DACNN performs better than the convolutional neural network (CNN) and the directed acyclic graph convolutional neural network (DAGCNN), and that DACNN outperforms the techniques used in earlier studies in image processing and classification. DACNN achieves 94.79% classification accuracy, whereas DAGCNN and CNN achieve only 68.61% and 72.62%, respectively. Therefore, deep learning methods combined with reinforcement learning can produce faster and more accurate results (an accuracy of 94.79%) in malaria screening using image recognition. This paper shows the benefits of data augmentation in improving the classification of malaria blood smear images using deep learning-based image classification techniques.
In the future, we aim to adopt interdisciplinary methods that combine medical professionals' knowledge and experience with deep learning-based systems to further increase the effectiveness and diversity of the model. We also plan to deploy the model on low-cost consumer smartphones for telehealth applications.