Imperative Dynamic Routing Between Capsules Network for Malaria Classification

Malaria is a severe epidemic disease caused by Plasmodium falciparum. The parasite causes critical illness if it persists for long durations, and delay in precise treatment can lead to further complications. An automatic diagnostic model helps medical practitioners reach a fast and efficient diagnosis. Most existing work uses fully connected convolutional neural networks with successive pooling layers, which causes loss of pixel information. Further, convolutions can capture spatial invariances but cannot capture rotational invariances. To overcome these limitations, this research develops an imperative dynamic routing mechanism with fully trained capsule networks for malaria classification. The model identifies the presence of malaria parasites by classifying thin blood smear images of parasitized and healthy erythrocytes. The proposed model is compared with machine vision models developed over the past decade, such as VGG, ResNet, DenseNet, and MobileNet. The problems in previous research are addressed by the proposed capsule network, which attains the highest area under the curve (AUC) and specificity of 99.03% and 99.43%, respectively, on 20% test samples. To understand the underlying behavior of the proposed network, various tests are conducted with different shuffle patterns. The model is analyzed in distinct environments to show its resilience and versatility. For greater generalization, the proposed network was also tested on thick blood smear images, where it again performed well.


Introduction
Malaria is a serious global health burden caused by protozoa of the genus Plasmodium and spread through the bite of the female Anopheles mosquito [1]. Five parasite species cause malaria in humans: P. falciparum, P. vivax, P. ovale, P. knowlesi, and P. malariae [2]. Of these, P. falciparum and P. vivax pose the greatest threat to human lives. In 2019, the World Health Organization (WHO) survey report estimated 228 million cases of malaria [3]. Most cases occur in the African region (93%), followed by the South-East Asia region (3.4%) and the Eastern Mediterranean region (2.1%) [3]. Of the several procedures for diagnosing malaria, microscopic examination of thick or thin blood smears is the most popular and commonly used [4][5][6]. However, these techniques are extremely time-consuming and depend on a clinical expert, which limits their reliability [7,8]. To deal with such issues, automatic diagnosis can support accurate detection and early diagnosis of malaria to prevent deaths. High-performance computational techniques have contributed greatly to accurate detection across medical imaging modalities such as computerized tomography, magnetic resonance imaging, microscopy, and ultrasound analysis [4,9,10,11]. Most medical image analysis requires a computer-assisted diagnostic procedure built on methods that learn a model. Models built on handcrafted features are used for image classification and decision making, but they can generalize poorly [12]. Recently, major research effort has gone into deep learning architectures, which play an important role in various kinds of image classification, detection, and recognition [13,14].
CNNs play a significant role in the deep learning community: they not only extract features but also compute predictive targets and provide actionable predictive models that help doctors [15]. However, traditional CNNs do not preserve the contextual proportional relationships of objects within an overarching image. On microscopic images, this shortcoming can lead the diagnostic model to misclassify. There is a need for a more accurate diagnostic model to improve malaria detection and classification.
This research develops an automatic diagnostic model for parasite detection and classification of malaria, with the following objectives:
• To extract blood cell image features using fully connected convolutional neural networks (without pooling layers).
• To capture the spatial relations between cell features using capsule networks, which may have high potential to identify infected regions.
• To compute a modified dynamic routing that converts low-level capsules to higher-level capsules, which is useful for predicting instantiation parameters, and to classify them with the L2-norm.

Previous Works
Various studies in the literature apply machine learning and deep learning methods to malaria detection from microscopic blood images. This section discusses recent applications of deep learning and the key aspects that affect malaria diagnosis. Liang et al. [16] proposed a 16-layer convolutional neural network (CNN) to classify parasite cells in thin blood smears on microscopic slides. The average accuracy of this method is 97.37% under ten-fold cross-validation, whereas the transfer learning approach achieves an accuracy of only 91.99%. Quinn et al. [17] demonstrated a deep CNN model for diagnosing malaria, tuberculosis (TB) in sputum, and intestinal parasite eggs in stool samples. In every class, the model attained high accuracy with a 50-50 train-test split.
This approach contributes to advances in microscopy for point-of-care diagnostics and is useful for laboratory staff in rural areas. Shen et al. [18] presented deep autoencoders for malaria diagnosis from erythrocyte images, using stacked autoencoders trained with two distinct Golomb-Rice encoders that learn the important features of erythrocyte images and achieve top results among lossless coding methods. Dong et al. [19] presented machine learning algorithms and CNN methods for classifying malaria parasites in blood cells. Of the architectures implemented, LeNet, AlexNet, and GoogLeNet achieved an accuracy of more than 95%, whereas machine learning algorithms such as SVM and the Naive Bayes classifier did not perform as well as the deep architectures [20][21][22]. However, most of these approaches required minimal human intervention during diagnostic decision making. Hung et al. [23] applied Faster R-CNN for object detection on malaria blood cell images, in which cell objects are labeled and classified in two different settings. In one-stage classification, the model consists of Faster R-CNN and a baseline machine learning method, and attains a poor accuracy of 50%. In two-stage detection and classification, the model consists of Faster R-CNN and AlexNet, which detect objects and classify them as RBC or not, and reaches an accuracy of 98% (discarding background). The model tends to lose its generalizing ability when tested on new samples. Sivaramakrishnan et al. [24] proposed a CNN model for malaria detection and cell classification. The approach examines the probabilities of various layers and visualizes the activation maps of each layer to understand the model's learning strategy, and attains an accuracy of 98.61% with lower complexity and computation time. Sivaramakrishnan et al.
[25] proposed a customized CNN model and discussed the performance of pre-trained CNNs and deep learning methods as feature extractors for classifying parasitized and healthy blood cells in microscopic images. In this method, features from shallow layers give better results than deep features, and a level-set based algorithm is applied to detect and segment RBCs. The model achieved an accuracy of 98.6% (cell level). During modeling, the experiments showed random noise from the training set, which leads to overfitting of the model. Pan et al. [26] showed that LeNet-5 is capable of detecting malaria and that it learns features automatically by stacked autoencoders from a given input. The LeNet-5 architecture has two convolutional layers, two subsampling layers, and fully connected layers. The model reached an accuracy of over 90% on four different datasets (two of them augmented). Gopakumar et al. [27] presented a CNN model for analyzing whether blood smears contain a parasite. Customized CNNs are used for detection, and a two-level segmentation method is used for blood cell counting and categorization of cells. The detection sensitivity and specificity are 97.06% and 98.50%, respectively. Vijayalakshmi et al. [28] proposed a transfer learning approach that distinguishes infected from healthy cell images using VGG16 and Support Vector Machine (SVM) models, obtaining a classification accuracy of 93.13%. Yang et al. [29] performed five-fold cross-validation on a customized CNN model with two processing steps: first, an intensity-based iterative global minimum screening (IGMS) scheme recognizes the parasite, and then a customized CNN classifies the presence of a parasite. The method attained an accuracy of 93.46% ± 0.32%. Avinash Kumar et al.
[30] developed a CNN-based application for malaria detection that classifies infected and non-infected malaria images and achieves an accuracy of 96.62%. Maity et al. [31] developed a hybrid screening algorithm for automated identification and classification of malaria parasites, in which a modified capsule network classifies segmented blood cells to categorize the species and stages of the malaria parasites. Previous research on CNNs for malaria detection has thus played a significant role in the deep learning community. In view of the challenges noted above, this research develops an automatic diagnostic model for malaria parasite classification using capsule networks, addressing the drawbacks of existing research as follows:
a) Discarding low pixel resolution images.
b) Decreasing loss of information by using fully connected convolutions (discarding pooling layers) and imparting spatial relationships with the imperative routing mechanism.
c) Imparting greater generalization to the model by evaluating different test samples with varying shuffle patterns.

Dataset Description
The dataset consists of two distinct classes of thin blood smear images of erythrocytes: one class is infected with malaria and the other is healthy. The samples are collected from the National Institutes of Health (NIH) repository, where the data is publicly available for research [25]. There are 27,558 images in total across both classes, with no class imbalance. Pixel resolution varies from image to image, so the samples are uniformly resized to 128 × 128 with three channels (RGB) to fit the designed neural network. Thin blood smear images of infected and uninfected erythrocytes are illustrated in Figs. 1 and 2 to give a visual understanding of their morphological properties. Tab. 1 shows the distribution of malaria data samples used to train and test the proposed model, with test samples ranging from 10% to 50%.

Capsule Network
This section discusses the proposed imperative routing approach for the capsule network, whose motivation is to maximize network performance for malaria classification. The custom capsule network has three blocks, shown in Fig. 3: 1. Convolution block, 2. Capsule network block, and 3. Loss function block. Initially, the input is fed into a series of fully connected convolution blocks to extract a set of features (local and global) of the parasitized and uninfected portions. The spatial orientations of these extracted features are then learned by the imperative routing mechanism between the primary and secondary capsule layers. After a set number of iterations, the routed features are classified as infected or healthy by applying the L2-norm.

Convolutional Block
Most modern convolutional neural architectures follow a pattern of alternating convolutional and pooling layers to extract features, followed by some fully connected layers for classification. It is observed that max-pooling layers discard information about entities residing in an object inside the image. Further, replacing pooling layers with convolutional layers of increased stride can resist information loss and improve learning during spatial-dimension reduction [32]. So, a series of convolutional layers with attached batch normalization (Batchnorm) layers is proposed to extract invariant features from the malaria images.
908 CMC, 2021, vol.68, no.1
To understand the learning ability of individual layers, the activation maps of each layer are visualized. The network architecture parameters are described in Tab. 2. For example, (124, 124, 3) signifies the feature size, ReLU denotes the activation function, 1 represents the stride, and (5, 5) specifies the kernel size. Batch normalization (Batchnorm) is used to increase the stability of the capsule network. The input image (infected or healthy) is first sent through a series of convolutional layers, in which an input of shape 128 × 128 × 3 is spatially reduced to 4 × 4 × 64. This latent space represents the generic invariances residing in an image that completely describe it.
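The spatial sizes quoted above are consistent with the usual convolution output-size formula. The sketch below is a minimal illustration, assuming no padding and no dilation for the first layer; the helper name `conv_output_size` and the five stride-2 layers in the second half are hypothetical and are not taken from Tab. 2.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after one convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# A 5x5 kernel with stride 1 and no padding maps 128 -> 124,
# which agrees with the 124 x 124 feature size quoted from Tab. 2.
first = conv_output_size(128, kernel=5, stride=1)

# Hypothetical illustration only: five 3x3, stride-2, padding-1 layers
# would halve 128 down to 4. The actual layer list is the one in Tab. 2.
size = 128
trace = [size]
for _ in range(5):
    size = conv_output_size(size, kernel=3, stride=2, padding=1)
    trace.append(size)

print(first, trace)
```

Each stride-2 layer roughly halves the spatial size, which is how strided convolutions can stand in for pooling during downscaling.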

Capsule Block
Capsule networks [33] are recently developed neural networks with state-of-the-art performance on MNIST classification that overcome the pitfalls of max-pooling layers and convolutional networks. Traditional capsule networks follow a common design paradigm. The input is first fed into a sequence of convolutional layers, and the extracted features are divided into partitions called primary capsules. These primary capsules hold information about the entities extracted by the convolutional layers. The next layer consists of secondary capsules, and every partition in the primary capsules tries to predict the output of this next layer. This process of predicting the successive layer's information is called dynamic routing. The prediction of each individual capsule is then compared with the actual output. If a prediction does not agree, its weight is reduced; if it agrees, its weight is increased. This process is known as routing by agreement. The complete routing is repeated for a certain number of iterations. These iterations help the network learn the orientation of the entities residing in the partitioned capsules, and thereby the invariances of the image with respect to spatial orientation. In our research, the capsule networks are carefully fine-tuned by: 1. Selecting an appropriate number of iterations. 2. Analyzing the behavior of non-linear activations during dynamic routing. 3. Modifying the routing procedure for optimal classification. A detailed mathematical description of capsule networks follows, to give insight into their performance. The routing mechanism is applied between the primary and secondary capsule layers.
The latent feature map output by the convolution block has the shape 4 × 4 × 64. These features are reshaped into 8-dimensional vectors, in which each dimension represents a property of the entity (shape: 4 * 4 * 8). These 8-dimensional vectors are known as the primary capsules. The vectors are then passed through the squash nonlinearity, as shown in Eqs. (1) to (4).
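The equations themselves are not reproduced in this text. Assuming they follow the standard formulation of Reference [33], with λ in place of the constant 1 and x_ij as the routing logit, they would take roughly the following form. The symbols c_ij, û_{j|i}, W_ij, and u_i are borrowed from [33] and are assumptions here, as is the ordering of the equation numbers.

```latex
\begin{align}
v_j &= \frac{\lVert s_j \rVert^{2}}{\lambda + \lVert s_j \rVert^{2}} \,
       \frac{s_j}{\lVert s_j \rVert} \tag{1} \\
s_j &= \sum_{i} c_{ij} \, \hat{u}_{j|i} \tag{2} \\
\hat{u}_{j|i} &= W_{ij} \, u_i \tag{3} \\
c_{ij} &= \frac{e^{x_{ij}}}{\sum_{k} e^{x_{ik}}} \tag{4}
\end{align}
```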
Here, λ ← 1. The squash nonlinearity ensures that short vectors, i.e., vectors with a small norm, shrink close to zero, and long vectors shrink to just below unit length. $s_j$ is the total input to capsule j: a weighted sum of all the predicted vectors, weighted by the coupling coefficients ($c_{ij}$). These coupling coefficients are updated during the iterative training phase. $x_{ij}$ is the parameter used as a weighting agent, meaning that it adds high weight to correct predictions during routing by agreement. In Reference [33] it is initialized to zero and updated later by observing the training process; here it is instead initialized with the present weight of the input, which improves the training procedure in both computational expense and predictive ability. A detailed explanation of the routing procedure is shown in Algorithm 1.
4. for n iterations do
5.     ∀ primary capsules i in layer l: $c_{ij} \leftarrow e^{x_{ij}} / \sum_k e^{x_{ik}}$
6. return $v_j$

The number of routing iterations (n) is a hyperparameter, chosen during training by running for a certain number of epochs.
Training is assessed by evaluating convergence with the routing run for 2 iterations. Convergence was observed, and hence n is chosen to be 2. Increasing this number further gave no improvement in the learning process. The complete learning curves for these iterations over 50 epochs are visualized in Fig. 4. In the next step, the routed feature units are classified by applying the L2-norm. During the routing procedure, at Step 3 of the algorithm, ReLU activation is applied to improve the learning of invariances in the captured features. The traditional capsule network uses squash as its non-linearity, but it is observed that the ReLU non-linearity converges gradually and provides better generalization. To understand the behavior of the non-linearities, the designed capsule model is trained for a number of epochs with ReLU, Leaky ReLU, and squash in turn. Their behavior is shown by plotting error rates and loss decay in Fig. 5. To understand the inner representations and learning ability of the dense layers during routing (Tab. 2), the final layer activations obtained with the ReLU non-linearity are visualized using t-SNE. t-SNE retains not only local structural patterns but also some global structural patterns, which helps in understanding higher-dimensional representations. t-SNE plots for sets of 8000 and 16000 samples are shown in Fig. 6; these took 87 s and 378 s to compute, respectively.
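The routing loop described above can be sketched in plain Python. This is a minimal sketch of routing by agreement as formulated in Reference [33], not the authors' modified procedure: the logits are zero-initialized as in [33], the function names (`route`, `squash`, `relu`, `l2_norm`) and the toy prediction vectors are hypothetical, and passing `relu` as the non-linearity is only an assumption about where the ReLU step sits.

```python
import math


def squash(s, lam=1.0):
    """Squash: keeps the direction of s and maps its norm into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (lam + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def relu(s):
    """Element-wise ReLU, usable in place of squash."""
    return [max(0.0, x) for x in s]


def softmax(row):
    """Numerically stable softmax over one row of routing logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def route(u_hat, n_iter=2, nonlinearity=squash):
    """u_hat[i][j] is the prediction of primary capsule i for output capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    x = [[0.0] * n_out for _ in range(n_in)]  # routing logits, zero as in [33]
    v = []
    for _ in range(n_iter):
        c = [softmax(row) for row in x]  # coupling coefficients per primary capsule
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(nonlinearity(s_j))
        for i in range(n_in):  # agreement: dot product of prediction and output
            for j in range(n_out):
                x[i][j] += sum(a * b for a, b in zip(u_hat[i][j], v[j]))
    return v


# Toy input: three primary capsules agree on output 0 and disagree on output 1.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.1]],
]
norms = [l2_norm(v_j) for v_j in route(u_hat, n_iter=2)]
v_relu = route(u_hat, n_iter=2, nonlinearity=relu)
print(norms)
```

In this toy case the output capsule whose predictions agree ends up with the larger L2-norm, which is the quantity used for classification.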

Loss Function
The objective function in Eq. (5) drives the performance of the capsule model. As in the traditional capsule network, the margin loss function has weighting parameters θ, m+, and m−; these hyperparameters are tuned to obtain the desired classification outcome. To discriminate parasitized from uninfected inputs, the parameters are set to θ ← 0.45, m+ ← 0.85, and m− ← 0.15. Further, the loss function is regularized with MSE to reduce noise while training the network [34]. With the hyperparameters tuned and appropriate regularization to handle noise in intricate scenarios, the objective function improved performance.
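As a rough illustration of how these hyperparameters act, the sketch below implements the margin loss in the form given in Reference [33], assuming θ plays the role of the down-weighting factor for absent classes. The function name `margin_loss` and the example norms are hypothetical, and the MSE regularization term is not included.

```python
def margin_loss(norms, target, theta=0.45, m_plus=0.85, m_minus=0.15):
    """Margin loss over output-capsule norms for one sample.

    norms[k] is the L2-norm of output capsule k; target is the true class index.
    """
    loss = 0.0
    for k, norm in enumerate(norms):
        if k == target:
            # Present class: penalize norms that fall below m_plus.
            loss += max(0.0, m_plus - norm) ** 2
        else:
            # Absent class: penalize norms above m_minus, down-weighted by theta.
            loss += theta * max(0.0, norm - m_minus) ** 2
    return loss


confident = margin_loss([0.9, 0.1], target=0)  # both margins satisfied
wrong = margin_loss([0.1, 0.9], target=0)      # both margins violated
print(confident, wrong)
```

A confident, correct prediction incurs no loss, while a confident, wrong prediction is penalized on both terms.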

Results and Discussion
To evaluate the resilience of the model, a set of metrics used in biomedical pattern analysis is applied. In this section, the model is evaluated under various constraints. First, the proposed capsule network is compared with state-of-the-art classification architectures that have excelled in ILSVRC over the past decade. Their pre-trained weights are used to extract features, and the extracted features are forwarded to a series of fully connected layers with 64, 32, and 2 neurons. The final layer is activated with SoftMax and the other layers use the ReLU non-linearity. As mentioned in Tab. 1, the proposed model is evaluated with test samples ranging from 10% to 50%. Experiments were carried out for multiple splits, of which the capsule network performed best at 20%. Standard classification metrics (accuracy score, AUC-ROC, sensitivity, and specificity) are evaluated for each split and reported in Tab. 3. For simplicity, the 20% split is used for comparison in Tab. 4. Tab. 5 compares the proposed method with other existing methods with respect to sensitivity, specificity, and F1-score. Tab. 6 describes the performance of the proposed capsule network with different non-linearities. These variants are compared with state-of-the-art pre-trained models on an 80%-20% split of training and testing samples. It is observed that most of the pre-trained models tend to perform poorly on this dataset. Their weights perform well on standard classification datasets, but they fail to classify these samples because their overwhelming depth leads to poor feature extraction. The problem of overwhelming depth is thus observed, and the corresponding visualizations are provided.
In this research, AUC, sensitivity, specificity, and accuracy score are chosen as standard metrics for evaluating the model, each with its own importance in biomedical pattern analysis. AUC measures the quality of a classification architecture, and it is a standard, statistically consistent measure of discriminative ability. Sensitivity and specificity may not estimate the probability of disease in an individual patient, but they determine diagnostic correctness. Finally, accuracy score is a standard metric known for its general utility across domains. These metrics are therefore used to determine the robustness of the proposed model. For appropriate generalization, the model's performance is evaluated by dividing the data into test partitions ranging from 10% to 50%.
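For reference, the metrics named above can be computed from confusion-matrix counts and prediction scores as in the following sketch. The function names are hypothetical, the counts and scores are made-up illustrations rather than results from this study, and the AUC is computed with the pairwise-ranking definition.

```python
def classification_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up counts and scores, for illustration only.
metrics = classification_metrics(tp=90, fn=10, fp=5, tn=95)
auc = auc_from_scores([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
print(metrics, auc)
```

The pairwise AUC is quadratic in the number of samples, so it is suited to small illustrations rather than the full dataset.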
At each split (partition), the samples are randomly shuffled five times so that the predictions are not biased by a single ordering. For each test case, the results of these five shuffles are aggregated and tabulated in Tab. 3. For each test phase, the best-performing model among the shuffles is shown as a confusion matrix in Fig. 7. Earlier literature relied for some time on handcrafted techniques, which extract heavily pre-processed features and did not perform well on large samples. With the evolution of convolutional networks, automated learning improved, but these networks could not solve the problem of detecting entities with varying spatial orientation, which increases misclassification errors. In view of these challenges, a capsule network with appropriate training is developed to address each problem. Additionally, the problem of generalization is examined, and it is shown experimentally that the proposed network is robust to unseen samples. Test performance peaked at 20% test samples under one of the shuffle constraints, at which the model attained an AUC of 99.03% and its highest specificity of 99.43%. To understand the behavior of individual networks across test shuffles, an accuracy vs. test plot is shown in Fig. 8. It gives insight into the model by showing its accuracy at each shuffle for the individual tests. As there is no class imbalance, the accuracy scores obtained are meaningful for the model. It is observed that most deep vision models tend to perform poorly under three different constraints. These constraints are as follows:

Low Pixel Resolutions:
The revival of deep convolutional networks made it possible to extract features automatically, either by training on the data from scratch or by transferring features. In this process, the input fed into the model is crucial: if the input resolution is degraded, the features are poorly extracted, i.e., they might not contain the required detail, and the input can be misinterpreted. The models in [16,18,25,27,29,35] did not use adequate pixel resolutions, which might cause misclassification. To avoid this, a pixel resolution of 128 × 128 with 3 color channels is used.

Reducing Loss of Information:
Most of the information is lost in pooling layers (max-pooling, average-pooling, etc.). These layers cause information loss either by keeping only the highest pixel value of a group or by averaging a group of pixels. This can distort information about an entity, and in biomedical imaging the impact can be adverse. So, as an alternative for downscaling, a series of convolutions with stride 2 is preferred. This not only reduces information loss but also provides uninterrupted end-to-end training. Further, capsule networks with imperative routing are used to learn the spatial orientations of the parasitized class. Sayyed et al. [36] used capsule networks but processed images at a low pixel resolution and added pooling layers to downscale the dimensions. The other models [16,27,29,30] were either trained with pooling layers or used pre-trained networks in which pooling layers are broadly used.

Appropriate Generalization:
Most of the models used generalization strategies ranging from a train-test split to 10-fold cross-validation. 10-fold cross-validation is regarded as the gold standard for generalization, but due to its high computational cost it is not widely used in deep learning. Instead, to test the resilience of the model, train-test evaluation over different splits is used. To study the model's learning patterns more closely, 5 random shuffles are used for each individual test split. To test generalization further, the trained models are also tested on thick blood smears, which motivated the design of a versatile and robust model.
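The shuffled train-test protocol described above could be set up along the following lines. This is a minimal sketch under stated assumptions: the helper names `shuffled_splits` and `summarize`, the fixed seed, and the use of the sample standard deviation for the ± term are all hypothetical, and the accuracy values are made up for illustration.

```python
import random
import statistics


def shuffled_splits(n_samples, test_fraction, n_shuffles=5, seed=0):
    """Return n_shuffles (train_indices, test_indices) pairs for one split size."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    splits = []
    for _ in range(n_shuffles):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random ordering for each shuffle
        splits.append((indices[n_test:], indices[:n_test]))
    return splits


def summarize(scores):
    """Mean and sample standard deviation across shuffles."""
    return statistics.mean(scores), statistics.stdev(scores)


# 27,558 images with a 20% test split, shuffled five times.
splits = shuffled_splits(27558, test_fraction=0.2)
train_idx, test_idx = splits[0]

# Made-up per-shuffle accuracies, for illustration only.
mean_acc, std_acc = summarize([0.984, 0.985, 0.983, 0.986, 0.982])
print(len(train_idx), len(test_idx), mean_acc, std_acc)
```

Repeating the call with `test_fraction` values from 0.1 to 0.5 would cover the range of partitions evaluated in this work.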

Conclusion
In this research, a hand-crafted deep capsule network with a modified routing algorithm is proposed. It attains state-of-the-art classification of blood cell images as parasitized or not. With appropriate tuning of parameters, the capsule network delivers high efficacy in classifying malaria parasites for malaria diagnosis. This study reports various metrics to assess the performance of the proposed work, and it is observed that tuning the loss parameters affects the model's performance. The proposed network shows a significant improvement over other deep learning techniques, achieving an accuracy score of 99.04% on the 20% split and, in the worst-case scenario (50% test), a test accuracy of 98.44% ± 0.07%. The experimental results indicate that the developed model is resilient and versatile, and that it outperforms the compared models. As future work, the model can be used to identify the species and stages of parasites in thin blood smears. A further aim is to detect and classify parasites by understanding the temporal and recurring transformation of the parasite in the host.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.