MATLAB-Based Potent Algorithm for WBC Cancer Detection and Classification

This paper aims to automate the detection of cancer using digital image processing techniques in MATLAB. The analysis of white blood cells (WBCs) is a powerful diagnostic tool for predicting leukemia. Automatic detection of leukemia is a challenging task that remains an unresolved problem in the medical imaging field. Automation in biological laboratories can be achieved by extracting features from blood film images captured with digital microscopes and processing them in MATLAB. The aim of this approach is to detect WBC cancer cells at an earlier stage and to reduce discrepancies in diagnosis by improving the system's learning methodology. This paper presents a potent algorithm intended to eliminate uncertainty in diagnosing cancers with similar symptoms. The algorithm concentrates on the major WBC cancers: acute lymphocytic leukemia, acute myeloid leukemia, chronic lymphocytic leukemia, and chronic myeloid leukemia. As these are life-threatening diseases, rapid and precise differentiation is necessary in clinical settings. The images are processed by segmentation and feature extraction, and the resulting features are classified using a random forest classifier (RFC). The RFC classifies the cancer using an ensemble of decision trees, each of which uses a predictor at every node to make the best decision.

Uncontrolled and abnormal growth of a group of cells leads to a condition called cancer. Blood cancer is a predominant form of cancer that affects normal blood cell growth and development in the bone marrow. As a result, blood cells cannot perform their functions, such as fighting pathogens where they are needed. According to 2019 statistics from the Leukemia & Lymphoma Society, more than 1.7 lakh (170,000) people are diagnosed with blood cancer 1. For most high-severity diseases, early diagnosis can reduce mortality rates. For blood cancer, the patient may need several blood tests and biopsies, both before and after treatment starts. The current system used by pathologists to identify blood parameters is costly, and generating the reports takes comparatively long. The two existing approaches to diagnosing cancerous cells are manual and automated blood tests. Manual diagnosis requires considerable time and manpower, whereas automated diagnosis relies on highly advanced equipment, so its cost can be a major drawback. Hence, an automated process is needed in which blood cell images can be diagnosed quickly and at minimal cost. With this new potent algorithm, images are diagnosed automatically, and the tumor is then classified efficiently based on the morphological features extracted from the cell image. For the detection process, several methods have been used to segment red blood cells (RBCs) from white blood cells (WBCs) using a color space model in MATLAB 17,20. Based on the training set, a random forest classifier finally identifies and names the exact cancer type in the WBCs.

Pathology of leukemia
Leukemia is the uncontrolled and abnormal growth of white blood cells produced in the bone marrow. Because the leukocytes malfunction both morphologically and functionally, the body's protection against foreign organisms is at risk. According to a National Cancer Institute report, more than sixty thousand people were diagnosed with leukemia in 2019. Leukemia falls into two main categories, acute and chronic 2,12. Acute leukemia develops rapidly, so treatment should be initiated as soon as possible after diagnosis. The most common treatment methods are chemotherapy and stem cell therapy. Chronic leukemia progresses slowly, and it may not be diagnosed until symptoms appear. The four most common types of leukemia are described below 3,12.

Acute lymphocytic leukemia
In this type, the tumor grows in immature WBCs such as B or T lymphocytes. It can affect the bone marrow throughout the body and spread to the lymph nodes, spleen, and liver. Children are most commonly affected by this type of tumor.

Acute myeloid leukemia
This tumor affects the blood components and develops quickly. The myeloid stem cells mostly mature into abnormal myeloblasts or WBCs. Adults are most commonly affected by this type of cancer.

Chronic lymphocytic leukemia
It grows slowly in the B lymphocytes, and the abnormal cells gradually crowd out the healthy cells. Its symptoms appear slowly and may go unnoticed for a long time. Older adults are most commonly affected by this type of cancer.

Chronic myeloid leukemia
This is a rare type of tumor that occurs when a genetic change turns myeloid cells into immature tumor cells. As with acute myeloid leukemia, adults are more likely than children to be affected 13,24.
The possibility of curing leukemia depends on the subtype and the factors associated with its growth. Physicians often discover that a person has leukemia through regular blood testing, so automated image diagnosis can help them classify the type of leukemia effectively 4,8.

Flow of automated diagnosis
Automated diagnosis begins by loading the input image from a digital microscope or any other digital source 5. Once the image is loaded, the algorithm follows and executes the flow shown in Fig. 1 to obtain the desired output 6,9.

RGB to Gray conversion
The input image loaded into the algorithm is shown in Fig. 2. To reduce complexity, each pixel is converted from a three-component (R, G, B) value to a single gray value. Some tasks, such as edge detection, do not perform better on three-channel pixels 6,10.
Accordingly, the input image is first resized to a 512 × 512 matrix, and the grayscale conversion is then executed. The converted image is shown in Fig. 3.
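A minimal MATLAB sketch of this step is given below, assuming the Image Processing Toolbox functions imresize and rgb2gray; the paper does not list its own code here, and the random image merely stands in for a microscope image.

```matlab
% Hedged sketch of the resize + grayscale step (not the authors' code).
% A synthetic RGB image stands in for the microscope image; a real run would
% read a file instead, e.g. imread('blood_film.jpg') (hypothetical file name).
rgbImg = uint8(randi([0 255], 600, 800, 3));

% Resize to the 512 x 512 matrix described in the text.
resized = imresize(rgbImg, [512 512]);

% Collapse R, G, B into one gray value per pixel.
% rgb2gray uses the weighted sum 0.2989*R + 0.5870*G + 0.1140*B.
grayImg = rgb2gray(resized);
```

The weighted sum gives green the largest share because it dominates perceived brightness, which is why a simple channel average is not used.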

Image Enhancement and clustering
The image is then enhanced with a built-in MATLAB function to improve the visibility of the WBCs. Visibility improves because the enhancement function distributes the pixel intensities more evenly. The enhanced image is shown in Fig. 4.
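The paper does not name the enhancement function. Because the described effect, spreading pixel intensities evenly, matches histogram equalization, the sketch below assumes histeq from the Image Processing Toolbox; it is an illustration, not the authors' exact call.

```matlab
% Hedged sketch: histogram equalization as the assumed enhancement function.
% Low-contrast synthetic image: all intensities squeezed into 100..140.
lowContrast = uint8(100 + randi([0 40], 512, 512));

% histeq remaps gray levels so the output histogram is approximately flat.
enhanced = histeq(lowContrast);

% Compare the intensity spread before and after enhancement.
fprintf('Range before: %d-%d, after: %d-%d\n', ...
    min(lowContrast(:)), max(lowContrast(:)), min(enhanced(:)), max(enhanced(:)));
```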
Clustering is an effective and efficient way to segment an image, and a segmented image is easier to process with morphological operations than an unsegmented one 14. The clustering technique therefore takes the image and groups pixels with similar information into three clusters 19. The clustered image is shown in Fig. 5. Clustering separates the major WBCs from the remaining components of the image 15,16.
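The clustering algorithm is not named in the paper either. A common choice for grouping pixels into three intensity clusters is k-means, so the sketch below assumes kmeans from the Statistics and Machine Learning Toolbox, applied to the gray levels of a synthetic image, and treats the darkest cluster as the WBC candidate (both assumptions).

```matlab
% Hedged sketch: k-means with k = 3 on gray levels (algorithm choice assumed).
rng(0);                                   % reproducible initialisation and noise
img = 220*ones(128, 128);                 % bright background
img(20:60, 20:60) = 140;                  % mid-gray region
img(80:110, 70:110) = 40;                 % dark region standing in for a WBC
img = img + 5*randn(size(img));           % mild noise

k = 3;
[labels, centers] = kmeans(img(:), k, 'Replicates', 3);
labelImg = reshape(labels, size(img));    % cluster index for every pixel

% Treat the darkest cluster as the WBC candidate region (assumption).
[~, darkest] = min(centers);
wbcMask = labelImg == darkest;
fprintf('WBC candidate pixels: %d\n', nnz(wbcMask));
```

Running several replicates guards against a poor random start, which matters because k-means only finds a local optimum.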

Morphological Operation
Morphological operations process the cell's shape, its border, and other small objects around the targeted cell. Initially, dilation and erosion were performed and the image was converted into a binary image with pixel values of 0 and 1. In this binary image, the WBCs take the value 0 and the remaining components take the value 1. The complement function is then used to swap the pixel values, so that the WBCs become the foreground (value 1); the resulting complement image is shown in Fig. 6.
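The binarization and complement steps can be sketched as follows. The threshold method is not stated in the paper, so Otsu's method (graythresh with imbinarize) is assumed, the erosion/dilation pair is written as erosion followed by dilation with a small disk, and the synthetic image is illustrative only.

```matlab
% Hedged sketch: Otsu threshold, erosion + dilation, then complement.
rng(0);
grayImg = uint8(200*ones(256, 256));                  % bright background
grayImg(100:160, 100:160) = 50;                       % dark stand-in for a stained WBC
grayImg = grayImg + uint8(randi([0 10], 256, 256));   % mild noise

level = graythresh(grayImg);             % Otsu threshold, normalised to [0, 1]
bwRaw = imbinarize(grayImg, level);      % dark WBC -> 0, brighter rest -> 1

se = strel('disk', 1);
bwRaw = imdilate(imerode(bwRaw, se), se);   % erosion followed by dilation

bw = imcomplement(bwRaw);                % swap values: WBC -> 1, rest -> 0
```

The complement matters because bwareaopen, imclearborder, and regionprops all treat value 1 as the object of interest.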
To remove other small unwanted objects from the image, a magnification value is assigned and used to derive the minimum object area, in pixels, that is retained:

magnification_value = 2000;
II = round(magnification_value/15);   % minimum object area: 133 pixels
bw1 = bwareaopen(bw, II);             % remove connected objects smaller than II pixels

Cell components cut off at the image boundary are then removed; imclearborder suppresses objects connected to the image border:

bw2 = imclearborder(bw1);

After both the small objects and the unwanted boundaries have been removed (Fig. 7), the final output of the morphological operation (Fig. 8) is obtained with the following dilation, which uses a disk-shaped structuring element of radius 2:

bw5 = imdilate(bw2, strel('disk', 2));

For easy counting of the WBCs, the centroid of each cell is calculated with the help of a bounding-box function.
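To see the clean-up chain end to end, the sketch below runs the three calls above on a synthetic binary mask and then counts cells. Using regionprops to obtain each cell's centroid and bounding box is an assumption, since the paper's own bounding-box code is not reproduced; the mask contents are invented for illustration.

```matlab
% Hedged sketch: clean-up chain from the text, then counting with regionprops.
bw = false(512, 512);
[cc, rr] = meshgrid(1:512, 1:512);       % column and row index grids
centres = [100 100; 250 300; 400 150];   % [row col] of three synthetic cells
for c = 1:size(centres, 1)
    bw = bw | ((rr - centres(c, 1)).^2 + (cc - centres(c, 2)).^2 <= 20^2);
end
bw(300:304, 50:54) = true;               % 25-pixel speck, below the area threshold
bw(1:40, 480:512) = true;                % block touching the image border

magnification_value = 2000;
II = round(magnification_value/15);      % minimum object area: 133 pixels
bw1 = bwareaopen(bw, II);                % drop the speck
bw2 = imclearborder(bw1);                % drop the border-touching block
bw5 = imdilate(bw2, strel('disk', 2));   % slight dilation of the remaining cells

% Centroid and bounding box for each remaining connected component.
stats = regionprops(bw5, 'Centroid', 'BoundingBox');
wbcCount = numel(stats);
fprintf('WBC count: %d\n', wbcCount);
```

Of the five objects in the synthetic mask, the 25-pixel speck falls below the 133-pixel area threshold and the corner block touches the border, so three cells are expected to remain.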

Feature Extraction
Feature extraction is an important phase in which geometric, texture, and color features are extracted from the input image for classification. For the geometric features, the means of area, diameter, radius, perimeter, eccentricity, solidity, and elongation are calculated, and the following matrix is formed 7,11:

Geome_Fea = [Area dia rad perimeter ecc elg Elongation];

For the texture features, gray-level co-occurrence matrix (GLCM) features are extracted. The absolute values of the angular second moment, energy, entropy, homogeneity, and correlation are calculated using the abs() function. The following matrix is the overall texture feature matrix 17.

Tex_Fea = [angular_momentum Energy Entropy Homogeneity Correlation];

Finally, color features are extracted by computing the mean of each of the three color channels, and the following matrix is formed 20.
Co_Fea = [R G B];

After all three feature sets are extracted, the following final matrix is formed.

Feature = [Geome_Fea Tex_Fea Co_Fea];

Classifier and its Decision tree
A random forest classifier (RFC) is used to classify the WBC tumor with the help of decision trees 21. Decision trees (Fig. 9) are the building blocks of the RFC algorithm; each tree uses a predictor at every node to make the best decision 22. The RFC is an ensemble of a large number of individual decision trees. Following the fundamental RFC concept, the WBC feature matrix is used to grow many relatively uncorrelated trees, whose individual predictions are combined, by majority vote for classification, into the ensemble prediction.
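The sketch below connects the feature matrices described above to a random forest. It is a hedged illustration, not the authors' code: it assumes regionprops for the geometric features and graycomatrix/graycoprops plus a hand-computed entropy for the GLCM features, treats the paper's elg slot as solidity and energy as the square root of the angular second moment, and trains TreeBagger (Statistics and Machine Learning Toolbox) on synthetic, well-separated 15-column data with hypothetical class labels. The variable yfit is a hypothetical stand-in for the paper's yfit4.

```matlab
% Hedged sketch: 15-element feature vector for one cell + random forest.
rng(1);                                        % reproducible random data

% ---- Synthetic segmented cell: a filled ellipse on a 128 x 128 grid ----
[cc, rr] = meshgrid(1:128, 1:128);
mask = ((rr - 64)/30).^2 + ((cc - 64)/20).^2 <= 1;
base = 200 - 100*double(mask);                 % darker inside the cell
rgbImg = uint8(cat(3, base - 20, base - 60, base) + randi([0 20], 128, 128, 3));
grayImg = rgb2gray(rgbImg);

% ---- Geometric features (means over all regions in the mask) ----
p = regionprops(mask, 'Area', 'EquivDiameter', 'Perimeter', 'Eccentricity', ...
    'Solidity', 'MajorAxisLength', 'MinorAxisLength');
Area = mean([p.Area]);
dia = mean([p.EquivDiameter]);
rad = dia/2;
perimeter = mean([p.Perimeter]);
ecc = mean([p.Eccentricity]);
elg = mean([p.Solidity]);                      % assumption: text lists solidity in this slot
Elongation = mean([p.MajorAxisLength] ./ [p.MinorAxisLength]);
Geome_Fea = [Area dia rad perimeter ecc elg Elongation];

% ---- GLCM texture features (horizontal neighbour offset) ----
glcm = graycomatrix(grayImg, 'Offset', [0 1], 'Symmetric', true);
g = graycoprops(glcm, {'Energy', 'Homogeneity', 'Correlation'});
P = glcm / sum(glcm(:));                       % co-occurrence probabilities
nz = P(P > 0);
angular_momentum = g.Energy;                   % graycoprops 'Energy' = sum of squared entries (ASM)
Energy = sqrt(angular_momentum);               % assumed alternative definition of energy
Entropy = -sum(nz .* log2(nz));
Homogeneity = g.Homogeneity;
Correlation = g.Correlation;
Tex_Fea = abs([angular_momentum Energy Entropy Homogeneity Correlation]);

% ---- Color features: mean of each channel ----
chan = double(rgbImg);
R = mean(mean(chan(:, :, 1)));
G = mean(mean(chan(:, :, 2)));
B = mean(mean(chan(:, :, 3)));
Co_Fea = [R G B];

Feature = [Geome_Fea Tex_Fea Co_Fea];          % 1 x 15 feature vector

% ---- Random forest on synthetic training data (hypothetical labels) ----
classes = {'ALL'; 'AML'; 'CLL'; 'CML'};
nPerClass = 30;
X = [];
Y = {};
for k = 1:numel(classes)
    X = [X; randn(nPerClass, 15) + 3*k];       % class k centred at 3k in each feature
    Y = [Y; repmat(classes(k), nPerClass, 1)];
end
model = TreeBagger(50, X, Y, 'Method', 'classification');
yfit = predict(model, [X(1, :); X(end, :)]);   % cell array of predicted class names
disp(yfit);
```

In practice, each training row would be the Feature vector of a labelled blood film image, and predict(model, Feature) would return the class for a new image. Each tree is grown on a bootstrap sample with a random subset of predictors per split, which is what keeps the trees relatively uncorrelated.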

Results and Discussion
The efficiency and accuracy of the tumor classification were tested with an image of acute lymphocytic leukemia. The input image, taken from the digital microscope, is loaded into the algorithm. After the JPEG image is loaded, its complexity is reduced and its pixel intensities are evenly distributed to enhance image quality. Pixels with similar information in the processed image are then clustered for effective segmentation. Next, the morphological operation is performed to obtain the binary image. With the help of the binary and complement images, the WBCs in the image can be clearly identified. The unwanted boundaries and objects around the WBCs are removed, and the number of WBCs is counted using bounding boxes. Geometric, texture, and color features are then extracted in matrix form for the classification process. After the feature matrix is obtained, the threshold of the input image is verified against the training set 25. The decision tree is then constructed, and the predictor values are stored in the variable yfit4. Finally, the predictor value at each node will … Fig. 10.

Conclusion
The efficiency of this automated diagnosis can have a major impact on the blood cell diagnosis domain. Owing to its adaptive thresholds, features can be extracted from all kinds of images, and the thresholds are derived automatically using predictors. Compared with manual blood tests, this automated, computerized diagnosis takes far less time, around 1 to 2 minutes. With these features, the software can be effective for hematologists by eliminating the difficulty of classifying cancers with similar symptoms. Training the classifier was the main difficulty in this work, as efficient classification required a long training time. In future work, with this training set and algorithm, the classification could be extended to detect other types of blood cancer, including those not confined to WBCs, by adding the corresponding features. The approach could also be tested and developed with other classifiers, such as artificial neural networks, to improve efficiency and reduce the time needed to train the classifier.