A transfer learning approach to identify Plasmodium in microscopic images

Plasmodium parasites cause malaria, which remains a significant threat to global health, affecting 200 million people and causing 400,000 deaths yearly. Plasmodium falciparum and Plasmodium vivax remain the two main malaria species affecting humans. Identifying malaria in blood smears requires years of expertise, even for highly trained specialists. The literature has addressed the automatic identification and classification of malaria; however, several points must still be investigated before these automatic methods can be used clinically in a Computer-aided Diagnosis (CAD) scenario. In this work, we assess the transfer-learning approach using well-known pre-trained deep learning architectures. We considered a database with 6222 Regions of Interest (ROIs), of which 6002 are from the Broad Bioimage Benchmark Collection (BBBC) and 220 were acquired locally by us at Fundação Oswaldo Cruz (FIOCRUZ) in Porto Velho, Rondônia, Brazil, which is part of the legal Amazon. We exhaustively cross-validated the dataset using 100 distinct partitions, each with 80% of the data for training and 20% for testing, considering circular ROIs (rough segmentation). Our experimental results show that DenseNet201 has the potential to identify Plasmodium parasites in ROIs (infected or uninfected) of microscopic images, achieving 99.41% AUC with a fast processing time. We further validated our results, showing that DenseNet201 was significantly better (99% confidence interval) than the other networks considered in the experiment. Our results support the claim that transfer learning with texture features can differentiate subjects with malaria, spotting those with Plasmodium even in leukocyte images, which is a challenge. In future work, we intend to scale our approach by adding more data and developing a user-friendly interface for CAD use. We aim at aiding the worldwide population and our local natives living near the legal Amazon's rivers.


Introduction
Historically, malaria has been a major human health concern, caused by parasites of the genus Plasmodium, with Plasmodium falciparum and Plasmodium vivax causing most cases [1]. Despite the decrease in malaria cases between 2000 and 2015, new cases started growing again in 2016, with the greatest increase taking place between 2019 and 2020 amid the COVID-19 pandemic. According to the World Health Organization (WHO, https://www.who.int/), in 2020, almost half of the world's population was at risk of contracting malaria. Thus, what had been on a historical downward trend went back up.
Several tropical diseases can manifest malaria-like symptoms and can thus be misdiagnosed as malaria [2]. Relying solely on the symptoms presented by the patient may result in overdiagnosis and/or overtreatment [3]. Therefore, diagnosing malaria involves various factors: local circumstances, proficiency levels of laboratory staff, patient caseloads, and malaria prevalence in the target area [4]. Regardless of the diagnostic tool used, it must yield rapid and accurate results to enable appropriate treatment of the patient [5].
The gold standard for malaria diagnosis is the microscopic technique. The analysis of blood smears under the microscope is commonly practiced and extensively documented. Moreover, it is a notably cost-effective method. However, microscopy demands highly skilled specialists (good microscopists can identify 100-200 parasites per microliter) and access to well-equipped laboratories (functional equipment, reagents, and an efficient quality control system) [3,6,7]. In addition, in remote regions with limited accessibility, performing microscopy can be extremely challenging [8].
Due to the wide range of variations in bright-field microscopy images, computational methods to aid in the identification of Plasmodium parasites in blood smears remain largely unused. Recent advances in computer vision, especially in Deep Learning (DL) algorithms such as Convolutional Neural Networks (CNNs), have shown promising results in tasks involving the detection of abnormalities in medical images [9].
Previous attempts to automatically identify and quantify Plasmodium parasites have not gained major traction, partly due to the difficulty of replication, comparison, and extension. Authors also rarely make their image sets available, which precludes replication of results and assessment of potential improvements. The lack of standard images or metrics to report results has impeded the field. In [10-14], the authors presented several DL methods with transfer-learning approaches for malaria detection. Generally, the authors use open-source databases and binarize the dataset into two classes (infected and non-infected). In general, studies that analyze only P. vivax discard leukocyte cases. Unlike P. falciparum, P. vivax preferentially infects young erythrocytes (red blood cells) [15,16]. Therefore, P. vivax does not infect leukocytes (white blood cells), which may lead to false positives.
They divided the train and test data into a single partition and achieved promising results. The results vary, and some report accuracy close to 100%, which encourages further studies in this context. They consider only well-segmented Red Blood Cells (RBCs) and analyze shape and texture features. However, without leukocytes, the classification task becomes easier. In [17], the authors employed Darknet to classify avian malaria (Plasmodium gallinaceum) in four parasite stages. Although the dataset used was heavily unbalanced, the results were promising, achieving an average 99% AUC for each class with YOLOv3 and Darknet/DenseNet201, which encourages the use of transfer learning for human Plasmodium parasite classification. Ikerionwu et al. [18] presented a systematic review of machine and deep learning approaches to detect Plasmodium parasites in microscopic images. The authors affirm that human error is a major cause of inaccurate diagnostics in the traditional microscopy diagnosis method, which was later confirmed by Lee et al. [19].
Thus, a malaria diagnostic tool would be highly useful, and there is a need for more studies on machine and deep learning approaches to detect Plasmodium parasites in humans. Generally, DL works consider 90% of the data for training and 10% for testing in a single partition. In our work, we considered all cases, including leukocytes (harder to classify). We balance the dataset with 80% for training and 20% for testing in each of 100 distinct partitions. Thus, we avoid the possibility of randomly choosing a test set that leads to better (biased) results.
Our main goal is to set a solid base on which to build a Computer-aided Diagnosis (CAD) application that differentiates between infected and non-infected Regions of Interest (ROIs) in microscopic images. Thus, we aim at detecting P. vivax parasites in ROIs regardless of whether the patient is symptomatic and regardless of the disease stage. In future work, we intend to investigate the aforementioned issues and achieve even better results by scaling our approach to gradually consider more samples.
This paper makes two main contributions: (i) we provide a balanced dataset with local cases from Fundação Oswaldo Cruz (FIOCRUZ), along with our code for replication; (ii) we propose a workflow using transfer learning with circular cropping of the ROIs. Our cropping technique works as a rough segmentation, easing the CNNs' task and improving classification. This ensures that even if a weak automatic segmentation is applied (one that does not fully segment each ROI), the classification process can still work.

Results
We assessed three essential parts of our experimental design. First, we analyze the overall results considering our Plasmodium image database with representative infected and non-infected ROIs. Next, we assess the results using statistical testing. Finally, we present a run-time analysis.

Overall results
We analyze the data distribution of all networks' results, summarized in Fig 1. Overall, every approach presented promising results, averaging over 96% for each measure. DenseNet201 presented the highest average results for every measure, with a more compact data distribution that is (visually) closer to a normal distribution. MobileNetV2 and VGG19 presented the lowest average results for all measures. InceptionV3, Xception, and ResNet101V2 presented similar results, though lower than DenseNet201. To further validate the results, in the next section we performed a statistical test to verify whether DenseNet201's results are significantly better than those of the other networks.

Statistical significance
We compared the 100-partition cross-validation results for every measure on each classification model. The Kolmogorov-Smirnov normality test [20,21] showed that the data do not follow a normal distribution. Therefore, we employed the Wilcoxon test [22] to analyze whether the scores had statistically significant differences. We compared the method that presented the best classification results (DenseNet201) against the remaining methods.
Table 1 reports the results of the Wilcoxon test, with confidence intervals of 99% and 95% ticked in blue (✔) and magenta (✔), respectively. DenseNet201 presented significantly better results than the remaining methods most of the time at the 99% confidence interval (✔). At the 95% confidence interval (✔), DenseNet201 was significantly better than ResNet101V2 for the AUC and than Xception for the specificity. However, note that the p-value was close to the 99% confidence threshold (p-value < 0.0100). Thus, we can state that, in general, DenseNet201 presented significantly better results than the remaining networks considered in the study.
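The statistical protocol above (a normality check followed by a non-parametric paired comparison) can be sketched with SciPy. The scores below are synthetic stand-ins for the 100 per-partition AUCs, not the paper's actual values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for the 100 per-partition AUC scores.
auc_densenet201 = rng.normal(0.9941, 0.002, 100)
auc_other = rng.normal(0.9900, 0.003, 100)

# Kolmogorov-Smirnov test of the standardized scores against N(0, 1).
z = (auc_densenet201 - auc_densenet201.mean()) / auc_densenet201.std(ddof=1)
ks_stat, ks_p = stats.kstest(z, "norm")

# Paired Wilcoxon signed-rank test between the two methods' scores.
w_stat, w_p = stats.wilcoxon(auc_densenet201, auc_other)
# Reject H0 at the 99% confidence level when w_p < 0.01.
```

In practice, one such paired test is run per measure and per competing network, as summarized in Table 1.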
In the next section, we assess each network's training and testing run-time to further validate our results.

Run-time analysis
Comparing the run-times in Table 2, MobileNetV2 presents the fastest processing time on the training and test sets. However, as mentioned above, MobileNetV2 did not present the best results. Although DenseNet201 did not present the fastest processing time for training/testing, it achieved the best results with a processing time close to that of the other methods.
DenseNet presents a higher processing time due to the larger number of layers it contains. ResNet (up to 152 layers), in turn, learns residual representation functions instead of learning the signal representation directly [23], achieving better results in this case. Moreover, we highlight that testing was performed on 20% of the dataset, which means that, for DenseNet201, 1,244.4 ROIs were processed in 3.88 seconds. Thus, on average, a single ROI was processed in 3.12 milliseconds by DenseNet201.
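The per-ROI figure quoted above follows directly from the split size and the reported test time:

```python
# 20% test split of the 6222 ROIs, processed by DenseNet201 in 3.88 s.
n_test_rois = 0.20 * 6222              # = 1244.4 ROIs
per_roi_ms = 3.88 / n_test_rois * 1000.0
print(round(per_roi_ms, 2))            # prints 3.12 (ms per ROI)
```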
In this experiment, DenseNet201 presented better results at a higher processing-time cost. DenseNet201 was designed to deliver better results running on dedicated servers, while MobileNet, for example, was designed to run on devices with lower processing power.
In an application context, such as a web service, processing an exam or a single ROI would be fast enough to be acceptable. This processing time can be reduced further with better computing resources. Therefore, we firmly believe our approach can be implemented and used in a real-case scenario without concerns about processing time.

Discussion
This study applied transfer learning (deep neural networks) to microscopic images of blood smears. We aimed at distinguishing, for each ROI, whether it was infected with Plasmodium vivax parasites or not. We assessed well-known pre-built network architectures using open-source images from the BBBC. We added brand-new image cases, which we assembled locally in Porto Velho-RO/Brazil (legal Amazon) in partnership with FIOCRUZ.
Our results, with balanced and cross-validated data divided into 100 partitions, showed that DenseNet201 has the potential to differentiate blood smear ROIs infected and non-infected with P. vivax parasites. DenseNet201 presented an average AUC of 99.41% (at a 99% confidence interval), significantly better than the remaining methods. Generally, an AUC greater than 60% is considered acceptable in the biomedical sciences. However, in our experimental setting, an acceptable method should have an AUC higher than 90% [24]. Thus, our results were promising, and we achieved them merely by preparing the ROIs in a circular manner and applying the transfer-learning approach. We strongly believe there is more to be explored in future work, including other malaria strains, more samples, etc.
One limitation of our study relates to the dataset preparation process. Every image used in our dataset was collected from blood smears stained with Giemsa reagent; we cannot guarantee the approach would work with a different staining reagent, which we intend to investigate in other studies. Another limitation regards the regions of interest (ROIs). In our work, the ROIs presented a preliminary rough segmentation predominantly featuring a circular shape. In this way, the network uses only texture features to analyze each case. However, it is worth mentioning that, in the literature, automatic approaches, including YOLO (https://pjreddie.com/darknet/yolo/), have shown promising results in identifying ROIs within microscopic images.
Since several microscopes and cameras were used across and within datasets, we firmly believe that our proposed approach can be used in a real-case scenario, providing a second opinion and thus aiding the specialists. In future work, we intend to develop an application that delivers second opinions through a user-friendly interface. Moreover, we intend to assess the efficacy of our approach when the ROIs present parasites with unusual morphology, and to make our approach robust to different Plasmodium species as well as to identifying different blood stages. We also highlight that more images can be added and the network retrained to achieve even more precise results.

Materials and methods
This section presents the materials and methods of the proposed approach and experiments. Fig 2 illustrates the workflow of this work, which we have divided into three main parts.
We briefly discuss each part below, exploring each in detail in the corresponding section. First, we considered a representative database (Section Database), comprising 1,176 exams and 6,112 samples (ROIs). Then, we created 100 distinct partitions for cross-validation (Section Cross-validation). After that, we employed several pre-trained DL architectures (Section Pre-trained deep learning architectures). Finally, in Section Measures and computing resources, we describe the well-known measures considered in our experimental design and the computational resources.
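As a rough illustration of the transfer-learning step, the sketch below loads a pre-trained DenseNet201 backbone and attaches a binary head for the infected/non-infected decision. It assumes TensorFlow/Keras; the input size, head, and optimizer are illustrative choices, not necessarily the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained ImageNet backbone without its original classification head.
base = tf.keras.applications.DenseNet201(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # transfer learning: reuse the learned features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),  # infected vs. non-infected
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
```

Training would then call `model.fit` on the 80% partition of each split and evaluate the AUC on the remaining 20%.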
We made available all codes and ROIs used in our study at [25].

Database
We considered a representative database of exams, which we assembled by combining the Broad Bioimage Benchmark Collection (BBBC, https://bbbc.broadinstitute.org/BBBC041) [26] with images from the Fundação Oswaldo Cruz (FIOCRUZ) in Porto Velho, Rondônia, Brazil. Both datasets contain only P. vivax malaria strains, one of the most common in the Americas. For this reason, more data samples of this strain are available, making it a good starting point for automatic malaria analysis. An expert malaria researcher annotated all cell objects (boxed and labeled), including red blood cells and leukocytes, and the bounding boxes were marked by an expert technician.

The BBBC database contains 1364 images (approximately 80,000 cells) from Brazil and Southeast Asia. The data were heavily imbalanced towards uninfected cells, which make up over 95% of the cases. Among the infected-cell ROIs, there are different stages of P. vivax parasites, including gametocyte, ring, trophozoite, and schizont. Therefore, we balanced the BBBC database by randomly selecting 3,001 non-infected cells (ROIs): we took the number of infected cells in each image and randomly selected the same number of non-infected cases from that same image. We divided the labeled cells (ROIs) into two categories: (1) healthy cells (3,001 cases) and (2) infected cells (3,001 cases). Table 3 summarizes the final assembled database. For the FIOCRUZ database, we did not consider exams that presented infected and non-infected ROI cases simultaneously; we segregated the exams by considering only positive cases from one exam and negative cases from another.

Blood smears were stained with Giemsa reagent in both databases, and images were taken using default microscopic parameters (10x eyepiece) for Plasmodium analysis. However, several microscopes and cameras were used across and within the BBBC dataset. For the FIOCRUZ dataset, a single microscope was used, and only the focus parameters varied, which is a common practice when acquiring this kind of image. Therefore, since the acquisition protocol for blood smear images is the same worldwide, we did not introduce any bias by adding our local images. Every image is in Red, Green, and Blue (RGB) format and always contains 3 channels. Overall, the images present a magnification of 100x in the objective lens and a variation of 7x to 10x in the ocular lens, yielding an average pixel resolution of 1,240 × 1,645, while the ROIs present an average pixel resolution of 110 × 110. Since the overall dataset perfectly balances positive and negative cases, we did not consider any data augmentation approach, avoiding bias in the final results. We made all datasets, code, and results achieved in our work available (https://github.com/JonathanRamos/PlasmodiumAI).
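The per-image balancing rule described above (for each image, keep all infected ROIs and sample an equal number of uninfected ones from the same image) can be sketched as follows; `rois` is a toy stand-in for the annotated ROI list.

```python
import random
from collections import defaultdict

random.seed(42)
# Illustrative (image_id, label) records standing in for annotated ROIs.
rois = [("img1", "infected"), ("img1", "uninfected"), ("img1", "uninfected"),
        ("img2", "infected"), ("img2", "infected"),
        ("img2", "uninfected"), ("img2", "uninfected"), ("img2", "uninfected")]

# Group ROIs per image and per label.
by_image = defaultdict(lambda: {"infected": [], "uninfected": []})
for image_id, label in rois:
    by_image[image_id][label].append((image_id, label))

# Keep all infected ROIs; sample the same count of uninfected ROIs per image.
balanced = []
for groups in by_image.values():
    n_infected = len(groups["infected"])
    balanced += groups["infected"]
    balanced += random.sample(groups["uninfected"], n_infected)

labels = [label for _, label in balanced]
assert labels.count("infected") == labels.count("uninfected")
```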
ROI cropping in circular shape. We cropped the ROIs as illustrated in Fig 3. First, in (a), each exam presents a bounding box for each marked blood cell, so that the ROI can be cropped as in (b). Next, we discarded irrelevant information around the cell for better analysis by applying circular cropping, as in (c). We produce the final ROI in (d) by cropping the original ROI using a circular mask covering 90% of the original ROI size (in yellow in Fig 3(c)). Although it may not provide optimal accuracy compared to automatic segmentation methods, this works as a rough segmentation, which is a good starting point: it eases the classification process by discarding shape features and working only with texture and color features.
Note that the number of pixels in each ROI may vary due to the distinct sizes of the original bounding boxes. Overall, the original ROIs present a resolution of 122 × 124 ± 26 × 26, while our final ROIs present a resolution of 110 × 110 ± 20 × 20, both in RGB format (3 channels) as described in Table 3. The ROIs are re-scaled for each transfer-learning architecture by padding with black pixels to meet the minimum input size required by each network. We highlight that we did not process the ROIs with any kind of color/illumination normalization or interpolation, thereby preserving the generalizability of the acquisition process.
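A minimal sketch of the circular cropping and black-pixel padding described above, assuming NumPy; the 90% radius and the example sizes follow the text, while the function names are ours.

```python
import numpy as np

def circular_crop(roi: np.ndarray, keep: float = 0.90) -> np.ndarray:
    """roi: H x W x 3 array; zeroes every pixel outside a centered circle
    whose diameter is `keep` times the smaller ROI side."""
    h, w = roi.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = keep * min(h, w) / 2.0
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    out = roi.copy()
    out[~mask] = 0  # discard background outside the circle
    return out

def pad_to(roi: np.ndarray, size: int) -> np.ndarray:
    """Pad with black pixels so the ROI meets a network's minimum input size."""
    h, w = roi.shape[:2]
    top, left = (size - h) // 2, (size - w) // 2
    canvas = np.zeros((size, size, roi.shape[2]), dtype=roi.dtype)
    canvas[top:top + h, left:left + w] = roi
    return canvas

# Toy white 110 x 110 ROI, circularly cropped and padded to 224 x 224.
roi = np.full((110, 110, 3), 255, dtype=np.uint8)
prepared = pad_to(circular_crop(roi), 224)
```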

Cross-validation
We performed cross-validation considering 80% of the samples for training and 20% for testing. We divided (partitioned) the data in this way 100 times, generating 100 unique partitions. In each partition, we were careful not to include ROIs from the same exam in the training and test sets simultaneously.
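One way to implement such exam-level partitioning, so that no exam contributes ROIs to both sides of a split, is scikit-learn's `GroupShuffleSplit`; the toy exam IDs below are illustrative.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

n_rois = 20
X = np.arange(n_rois).reshape(-1, 1)     # stand-in ROI features
exam_ids = np.repeat(np.arange(5), 4)    # 5 exams, 4 ROIs each

# 100 random 80/20 partitions that keep each exam's ROIs together.
gss = GroupShuffleSplit(n_splits=100, test_size=0.20, random_state=0)
for train_idx, test_idx in gss.split(X, groups=exam_ids):
    # No exam appears in both the training and the test set.
    assert not set(exam_ids[train_idx]) & set(exam_ids[test_idx])
```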

Pre-trained deep learning architectures
We have used several Deep Learning (Convolutional Neural Networks, CNNs) approaches well suited to classifying medical images [27]. This paper presents the results for six promising

Fig 1. Overall results: data distribution for each classifier and each measure considering the test set. The best average results are highlighted in yellow. The green area represents the Interquartile Range (IQR): 25th to 75th percentile. https://doi.org/10.1371/journal.pcbi.1012327.g001

Table 1. Wilcoxon test p-value results: ✔ and ✔ indicate rejection of H0 with confidence intervals of 99% and 95%, respectively. We considered DenseNet201 (data a) against the remaining methods (data b).