Colon histology slide classification with deep-learning framework using individual and fused features

Cancer occurrence rates are gradually rising in the population, which causes a heavy diagnostic burden globally. The rate of colorectal (bowel) cancer (CC) is gradually rising, and it is currently listed as the third most common cancer globally. Therefore, early screening and treatment with a recommended clinical protocol are necessary to treat cancer. This paper aims to develop a Deep-Learning Framework (DLF) to classify colon histology slides into normal/cancer classes using deep-learning-based features. The stages of the framework include the following: (i) Image collection, resizing, and pre-processing; (ii) Deep-Features (DF) extraction with a chosen scheme; (iii) Binary classification with a 5-fold cross-validation; and (iv) Verification of the clinical significance. This work classifies the considered image database using the following: (i) Individual DF, (ii) Fused DF


Introduction
A number of research works have been conducted throughout the world to develop more advanced medical diagnostic facilities that support the accurate detection of diseases so that an effective treatment can be executed. Once a diagnostic scheme is developed, it can be tested and recommended for practical use. Although a number of modern facilities are available, their accessibility is limited by various economic factors. The diagnostic facilities available in high-income and upper-middle-income countries are comparatively more advanced than those available in lower-middle and low-income countries [1,2]. Several research projects are currently being conducted to develop an appropriate scheme that supports early and precise disease detection.
A recent publication has shown that, regardless of the economic condition of a country, the incidence rate of acute and infectious diseases among humans is on the rise worldwide due to a number of factors, including the following: (i) heredity, (ii) immunodeficiency, (iii) age, and (iv) environmental factors [3,4]. The rapid increase in disease occurrence rates causes various medical burdens, from diagnosis to treatment; this burden has an adverse effect on the economy and the health system of the country.
According to the annual report of the World Health Organisation (WHO) for 2020, the incidence rate of cancer is gradually increasing throughout the world, and early detection and treatment are of the utmost importance in preventing mortality from the disease. According to the report, breast, lung, and colorectal cancer (CC) have higher incidence rates than other types of cancer [5]. There are 1.9 million new cases of CC every year, making it the third most common cancer worldwide. A detailed report about CC is available in [6].
CC occurs when cells within the colon/rectum grow out of control. Polyps form as the first stage and, if left untreated, may develop into cancer. Early screening and treatment are essential to reduce the risk. Several commonly performed clinical-level screening procedures can be used to assess CC and its severity, such as personal examinations by doctors, colonoscopy/endoscopy-supported assessments and biopsy collection for microscopic image examination. Among these, microscopy images play a vital role in detecting the severity and the stage of the CC, which is essential to plan the appropriate treatment, such as radiation therapy, chemotherapy and surgery [7,8]. Most of the current clinical procedures use whole slide images (WSI) collected using a prescribed protocol for inspection by the disease expert. To reduce the diagnostic burden, computerized procedures are implemented in clinics to support a faster and timely detection of CC from histology slides. Examination of the WSI using a computer algorithm requires a few chosen image pre-processing techniques; patch-based analysis is one of the common procedures, in which the WSI is cropped and resized into several small sections [9]. These images are then examined using the chosen computerized approaches.
Numerous frameworks have been proposed and implemented in the literature to detect CC using a chosen biomedical imaging technique. This work aims to develop and implement a deep-learning framework (DLF) to detect CC in histological slides. The proposed DLF consists of the following phases: (i) the collection, resizing, and preprocessing of histology slides; (ii) deep-features mining using the preferred method; (iii) binary categorization with a fivefold cross-validation; and (iv) verifying the clinical significance of the developed system using a benchmark histology slide database. For the CC histology slide data set, the proposed work tests the performance of pre-trained deep-learning schemes with chosen binary classifiers using the following features: (i) individual features, (ii) fused dual deep features, and (iii) an ensemble of features. To evaluate and verify the CC detection performance, the SoftMax, Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers are evaluated separately. The experimental results of this study confirm that the fused deep features (VGG16 + ResNet101) provide a CC detection accuracy of 99% with the KNN classifier.
The main contribution of this paper is the evaluation of the performance of commonly available pre-trained models using individual, fused, and ensemble features on a chosen histopathology image database of CC, with promising results. Furthermore, the clinical significance is verified using the benchmark WSI of the Gland Segmentation (GlaS) dataset [10,11].
The remaining sections of this work are arranged as follows: Section 2 presents the literature review, Section 3 discusses the methodology and Sections 4 and 5 present the experimental outcome and the results of this study, respectively.

Literature review
Due to a steady increase in cancer occurrence rates, researchers have undertaken several projects to support the automatic and precise detection of cancer based on available medical data. This section summarizes selected recent works that have used medical image-based schemes to detect the occurrence of CC and its severity, as shown in Table 1.

| Reference | Methodology | Outcome (Accuracy %) |
|---|---|---|
| Bukhari et al. [12] | Self-supervised learning supported diagnosis of the colon histology slide is examined and the ResNet50 scheme provided an approved classification result. | 93.91 |
| Mangal et al. [13] | Examination of colon and lung histology slides is demonstrated using a convolutional neural network (CNN). | 96.61 |
| Masud et al. [14] | Machine and deep-learning associated framework is proposed to analyze the colon and lung histology slides. | 96.33 |
| Ali and Ali [15] | Cancer detection in colon and lung histology slides is discussed using a multi-input dual-stream capsule network. | 99.58 |
| Sarwinda et al. [16] | ResNet18 and ResNet50 supported scheme to detect cancer in colon histology images is discussed. | >80 |
| Hamida et al. [17] | Deep-learning supported detection of CC in histology images is presented. | >98.66 |
| Ohata et al. [18] | Transfer-learning supported classification of histological images to detect CC is presented. | 92.083 |
| Fan et al. [19] | Transfer-learning supported CC detection from colon histology slides is presented. | 99.29 |
Table 1 presents earlier works that examined CC using a chosen computerized technique. Most of these works were implemented using the available histology slides of a chosen image database. These existing works were not tested on the WSI; hence, the proposed research aims to develop and implement an appropriate CC detection procedure that works well on the chosen histology datasets.

Methodology
This part of the research presents the DLF developed to detect CC using histology slides. The complete information is presented in Figure 1. The screening stage is performed in hospitals when patients visit with CC symptoms. The standard tests involve a physical check by a doctor, an endoscopy/colonoscopy examination, and biopsy collection and analysis. The treatment procedure for CC mainly depends on the cancer stage, which can be detected by biopsy with the chosen microscopic examination. Therefore, an assessment of the histology slide is essential in CC diagnosis, and this work proposes a DLF for a computerized assessment of the histology slides of CC. Figure 1 shows that the proposed technique follows a straightforward methodology, from feature extraction to classification. This work applies a 50% dropout to the extracted deep features to avoid overfitting. The stages of this scheme are as follows: image enhancement with Contrast-Limited Adaptive Histogram Equalization (CLAHE) [20,21], deep-feature extraction, feature vector generation based on the need (individual/fused/ensemble), and classification. In this work, a five-fold cross-validation is employed, and the best result is considered to validate the performance of the developed system. This scheme helps the doctor obtain an initial examination of CC; the treatment must be planned and implemented by the doctor based on the severity of the CC.
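The five-fold cross-validation with best-fold selection described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the helper names (`stratified_folds`, `knn_predict`, `cross_validate`), the choice of k = 3, and the synthetic feature matrix are all assumptions introduced here, and a small NumPy nearest-neighbour vote stands in for the classifier.

```python
import numpy as np


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds parts with similar class balance."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(n_folds)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, sample in enumerate(idx):
            folds[position % n_folds].append(int(sample))
    return [np.array(sorted(fold)) for fold in folds]


def knn_predict(train_x, train_y, test_x, k=3):
    """Plain k-nearest-neighbour majority vote (Euclidean distance, labels 0/1)."""
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]
    return (votes.mean(axis=1) >= 0.5).astype(int)


def cross_validate(features, labels, n_folds=5, k=3):
    """Return the accuracy of every fold; each fold serves as the test set once."""
    folds = stratified_folds(labels, n_folds)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = knn_predict(features[train_idx], labels[train_idx],
                           features[test_idx], k)
        accuracies.append(float((pred == labels[test_idx]).mean()))
    return accuracies


# Synthetic, well-separated stand-ins for deep features (not real data).
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, (50, 8)), rng.normal(6.0, 1.0, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
fold_acc = cross_validate(x, y)
best_acc = max(fold_acc)  # the text selects the best fold as the reported result
```

Each sample appears in exactly one test fold, and both classes are represented equally in every fold, which matches the balanced normal/cancer setting considered in this work.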

Image database
This research considered the histological test images found in the LC25000 dataset [22] for the initial training and verification of the proposed scheme and the GlaS dataset [10,11] for confirming the clinical significance. The initial database consists of 10,000 images of CC (5000 normal and 5000 cancerous images); to verify the performance of the pre-trained models, only 40% (4000) of these images are considered. The sample test images considered in this study are presented in Figure 2, and the number of images considered is given in Table 2. A sample GlaS image is presented in Figure 3.

Deep-learning scheme
Due to their merit and clinical significance, pre-trained and customized deep-learning-based methods are widely employed to examine medical databases. Compared to a customized DLF, the pre-trained models are easily accessible and implementable. Hence, these methods are widely employed in the literature to examine the biomedical images of a chosen modality. This work considered pre-trained schemes, such as VGG16, VGG19, ResNet18, ResNet34, ResNet50, ResNet101, and DenseNet201, for the examination [23][24][25]. Initially, a conventional image augmentation procedure (horizontal/vertical flip, rotation with an angle of 30°, zoom range of 0.3, and width/height shift with a range of 0.3) is employed to increase the number of images and improve the performance of the training process. Furthermore, the following initial values are assigned to the considered deep-learning system to support an improved diagnosis on the chosen database: initial weights = ImageNet, batch size = 8, epochs = 150, optimizer = Adam, pooling = max, monitored metrics = accuracy and loss, and classifier = SoftMax with 5-fold cross-validation [26,27].
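For reference, the settings listed above can be collected in one place, together with a small NumPy sketch of the flip and shift augmentations. The dictionary keys and the helper names (`flip_variants`, `random_shift`) are illustrative assumptions, not identifiers from the study, and the edge-padding used for the shift is one possible choice that the text does not specify. Rotation and zoom are omitted for brevity.

```python
import numpy as np

# Values quoted from the text; the key names are illustrative.
TRAINING_SETTINGS = {
    "initial_weights": "ImageNet",
    "batch_size": 8,
    "epochs": 150,
    "optimizer": "Adam",
    "pooling": "max",
    "monitored_metrics": ("accuracy", "loss"),
    "classifier": "SoftMax",
    "cv_folds": 5,
}
AUGMENTATION_SETTINGS = {
    "horizontal_flip": True,
    "vertical_flip": True,
    "rotation_degrees": 30,
    "zoom_range": 0.3,
    "shift_range": 0.3,
}


def flip_variants(image):
    """Return the original image plus its horizontal and vertical flips."""
    return [image, np.flip(image, axis=1), np.flip(image, axis=0)]


def random_shift(image, shift_range, rng):
    """Shift an image by up to shift_range of its size, padding with edge pixels."""
    height, width = image.shape[:2]
    max_dy, max_dx = int(shift_range * height), int(shift_range * width)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    pad_width = [(abs(dy), abs(dy)), (abs(dx), abs(dx))]
    pad_width += [(0, 0)] * (image.ndim - 2)  # leave colour channels untouched
    padded = np.pad(image, pad_width, mode="edge")
    top, left = abs(dy) - dy, abs(dx) - dx
    return padded[top:top + height, left:left + width]


# Toy 3 x 4 "image" to show the effect of the flips.
toy = np.arange(12).reshape(3, 4)
variants = flip_variants(toy)
shifted = random_shift(np.zeros((10, 10, 3)),
                       AUGMENTATION_SETTINGS["shift_range"],
                       np.random.default_rng(0))
```

Every augmented image keeps the size of its source image, so the augmented set can be fed to the same pre-trained network input without further resizing.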

Implementation
Then, the considered scheme is implemented using individual features with a dimension of . During the individual-features-based assessment, the deep features obtained from each DLF (Eq (1)) are considered to verify the performance of the binary classifiers. During the fused-features approach, the features of the top two DLFs (VGG16 and ResNet101) are considered, and these features are sorted based on their rank before applying a 50% dropout to obtain a reduced feature vector of dimension . The fused feature vector is obtained by serially combining these two reduced feature vectors into a single feature vector with a dimension of , as shown in Eq (3).
Then, the classification task is separately implemented with these features, and the achieved results for the chosen classifiers are individually presented and discussed.
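The feature handling described in this section (rank-based sorting, 50% reduction, serial combination, and averaging for the ensemble case) can be sketched as below. This is a hedged illustration under stated assumptions: the text does not name the ranking criterion, so column variance is used here as a stand-in, and the function names and the toy feature size of 10 are hypothetical and do not reflect the dimensions used in the study.

```python
import numpy as np


def reduce_by_rank(features, keep_fraction=0.5):
    """Keep the top-ranked fraction of feature columns.

    Assumption: columns are ranked by their variance across samples,
    because the ranking criterion is not specified in the text.
    """
    scores = features.var(axis=0)
    order = np.argsort(scores)[::-1]          # highest score first
    n_keep = int(features.shape[1] * keep_fraction)
    return features[:, order[:n_keep]]


def fuse_serial(features_a, features_b):
    """Serially combine (concatenate) two reduced feature sets per sample."""
    return np.concatenate(
        [reduce_by_rank(features_a), reduce_by_rank(features_b)], axis=1)


def ensemble_average(feature_sets):
    """Element-wise average of equally sized feature sets."""
    return np.mean(np.stack(feature_sets, axis=0), axis=0)


# Placeholder features for 4 images; the size 10 is a toy value.
rng = np.random.default_rng(0)
vgg16_like = rng.normal(size=(4, 10))
resnet101_like = rng.normal(size=(4, 10))
fused = fuse_serial(vgg16_like, resnet101_like)        # 5 + 5 columns
ensemble = ensemble_average([vgg16_like, resnet101_like])
```

With a 50% reduction applied to each network, the serially fused vector has the same length as a single unreduced vector, which keeps the classifier input size unchanged between the individual and fused cases.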

Performance evaluation and validation
The CC detection performance of the proposed DLF is verified using a binary classification scheme with different feature vectors. During this process, the necessary values, namely true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN), are computed. From these values, other essential measures such as accuracy (ACC), precision (PRE), sensitivity (SEN), specificity (SPE), F1 Score (F1S), and Matthews Correlation Coefficient (MCC) are computed; the mathematical notation for these measures is given in Eqs (4)-(9). In this work, the binary classifiers SoftMax, NB, DT, KNN and SVM are considered, and the necessary information regarding these classifiers is available in [28][29][30].
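The six measures named above follow from the four confusion-matrix counts. The sketch below uses their standard textbook definitions, which are assumed to correspond to Eqs (4)-(9); the function name and the example counts are hypothetical and are not results from this study.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix measures, returned as fractions in [0, 1]
    (MCC lies in [-1, 1])."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
        "PRE": _ratio(tp, tp + fp),
        "SEN": _ratio(tp, tp + fn),
        "SPE": _ratio(tn, tn + fp),
        "F1S": _ratio(2 * tp, 2 * tp + fp + fn),
        "MCC": _ratio(tp * tn - fp * fn, mcc_den),
    }


# Illustrative counts only (100 positive and 100 negative test images).
scores = binary_metrics(tp=95, tn=97, fp=3, fn=5)
```

Multiplying each value by 100 gives the percentage form used in the result tables.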

Results and discussion
This section presents the experimental outcome and its discussion. The work was performed on a workstation with an Intel i5 processor, 16 GB RAM and 4 GB VRAM, using Python. It was initially implemented on the LC25000 dataset and then on the GlaS dataset, and the results are discussed.
The proposed investigation was initially performed using the individual deep features and the SoftMax classifier. The results achieved by the various DLFs with five-fold cross-validation are presented in Table 3. This table confirms that VGG16 achieves a better accuracy than the other methods. ResNet101 is ranked second based on overall performance and DenseNet201 is ranked third. Table 4 presents the outcome of the 5-fold cross-validation and the selected best outcome with VGG16. Figure 4 presents the results achieved from various layers of VGG16.
After analyzing the performance of the proposed scheme with individual features, the classification task is repeated using fused features and the ensemble of deep features, as discussed in Eqs (2) and (3), and the achieved results are presented.
Figure 5 presents the confusion matrix (CM) achieved in this study, in which Figure 5(a) presents the CM for fused features and Figure 5(b) presents the CM for ensemble features. The TP, TN, FP, and FN values in these CMs confirm that these schemes help to achieve a better accuracy. Figure 6 depicts the results achieved when the ensemble of features is used.
Table 5 presents the experimental outcome achieved in this study with individual features (VGG16 and ResNet101), fused features, and ensemble features. With binary classification and 5-fold cross-validation, the classification accuracy improves from 93% (VGG16 with SoftMax) to 99% (fused features with KNN). This confirms that the proposed DLF works well on the LC25000 database. To verify the clinical significance of the proposed technique, the WSI of the GlaS database is also considered, and the achieved result is presented in Table 6. This table confirms that the benign/malignant classification achieved good results with both the fused and ensemble features. With the fused features, the DT classifier provided an accuracy of 97.25%, and with the ensemble features, the NB classifier achieved an accuracy of 97%. The Glyph-Plot shown in Figure 7 depicts the overall performance achieved with the proposed DLF. This confirms that the proposed scheme works well on the histology images; in the future, it can be considered for examining clinically collected histology images of CC.
This research presented a DLF to detect CC using histological images with individual, fused, and ensemble features. The chief merit of this technique is that it uses pre-trained deep-learning methods, which are simpler to implement than customized models. Furthermore, the fused and ensemble features employed in this technique are also quite simple compared to other similar techniques in the literature. Moreover, the results achieved on the LC25000 and GlaS datasets with this DLF confirm its clinical significance.
Figure 8 presents a comparative analysis between the proposed and existing techniques. The results of this comparison confirm that this scheme works well. Furthermore, compared to the results listed in Table 1, the accuracy achieved in the presented work is either improved or comparable, suggesting that the implemented technique can work well on clinically collected histology slides. The detection accuracy achieved with the proposed DLF is satisfactory on the chosen database. This research considered the pre-processed GlaS database; in the future, the proposed scheme can be improved to examine the WSI with better accuracy.

Conclusions
Early detection of cancer is essential to save the patient with appropriate treatment. This research aimed to develop and implement a pre-trained DLF to detect CC in histology slides with an improved accuracy. The proposed scheme considered three classes of deep features (individual, fused, and ensemble) to achieve better detection accuracy using chosen binary classifiers along with a 5-fold cross-validation. The experimental work implemented on the LC25000 and GlaS databases confirms the merit of the proposed scheme in detecting CC using histology slides. The results confirm that the CC detection accuracy achieved with fused deep features is better than that achieved with the individual and ensemble features. The proposed scheme achieved a maximum accuracy of 99% when the KNN classifier was employed with fused features. Furthermore, the accuracy achieved with GlaS (benign/malignant classification) was >97% for both the fused and ensemble features. These results confirm that the presented scheme works well on histology slides. In the future, examining clinically collected histology slides can be considered.

Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
During the ensemble feature selection process, the average of the features from VGG16, VGG19, ResNet101 and DenseNet201 is considered to form a new feature vector.
Figure 4(a) presents the sample image and Figure 4(b)-(f) denote the various layer outcomes.
Figure 6(a) and (b) show the accuracy and loss for the training and validation process, and Figure 6(c) presents the receiver operating characteristic (ROC) curve, with a no-skill area under the curve (AUC) of 0.50 and a logistic AUC of 0.98, which confirms the significance of the implemented scheme. Similar results are computed for all other procedures considered in this study, and the results are demonstrated.

Figure 5. Confusion matrix achieved for various feature sets.

Figure 6. Search convergence and the AUC curve.

Figure 7. Glyph-Plot to present the overall performance.

Figure 8. Evaluation of classification accuracy between proposed and existing methods.

Table 1. Summary of medical data based cancer detection schemes.

Table 3. Experimental outcome of pre-trained models with SoftMax classifier.

Table 5. Experimental results with different feature sets.

Table 6. Experimental results with GlaS database.