EnsemDeepCADx: Empowering Colorectal Cancer Diagnosis with Mixed-Dataset Features and Ensemble Fusion CNNs on Evidence-Based CKHK-22 Dataset

Colorectal cancer is associated with a high mortality rate and significant patient risk. Diagnosis relies on images obtained during colonoscopy, which makes timely and accurate interpretation of those images important for treatment. Deep learning techniques could enhance the diagnostic accuracy of existing systems. This work presents EnsemDeepCADx, a new system for accurate colorectal cancer diagnosis built on recent deep learning techniques. The best accuracy is achieved by combining Convolutional Neural Networks (CNNs) with transfer learning via bidirectional long short-term memory (BILSTM) and support vector machines (SVM). The ensemble CNNs ADaDR-22, ADaR-22, and DaRD-22 are built from four pre-trained CNN models: AlexNet, DarkNet-19, DenseNet-201, and ResNet-50. The CADx system is thoroughly evaluated at each of its stages. In the first stage, colour, greyscale, and local binary pattern (LBP) image datasets and features are derived from the CKHK-22 mixed dataset. In the second stage, these features are compared with a new feature fusion dataset using three distinct CNN ensembles. Next, the ensemble CNNs are combined with SVM-based transfer learning, again comparing the raw features with the feature fusion dataset. In the final stage of transfer learning, BILSTM and SVM are combined with a CNN ensemble. The ensemble fusion CNN DaRD-22 with BILSTM and SVM achieved the best testing accuracy on the original, grey, LBP, and feature fusion datasets (95.96%, 88.79%, 73.54%, and 97.89%, respectively). Comparing the outputs of all four feature datasets with those of the three ensemble CNNs at each stage enables the EnsemDeepCADx system to attain its highest level of accuracy.


Introduction
Global health has a wide range of complicated aspects. The development of globalisation has had a significant negative impact on people's health across the world. People who reside in underdeveloped countries, where healthcare services may be scarce, may be significantly affected by environmental factors such as air pollution, contaminated water, and poor sanitation [1]. Despite having improved access to medical care, people living in wealthy countries still confront challenges, including an increased incidence of non-communicable conditions such as obesity, diabetes, and heart disease brought on by sedentary lifestyles and poor eating habits. Cultural differences have a big influence on health outcomes, too. The great variety of cultural practices may make it more difficult to promote universal health policies and strategies. All individuals must have access to healthcare, education, and resources that promote healthy lifestyle habits. Environmental problems that contribute to health inequities must also be addressed.
Cancer, one of the leading causes of mortality, affects individuals all over the globe. It is a complex problem that requires a comprehensive approach to diagnosis, care, and prevention. According to the World Health Organization (WHO), there will be approximately [...] Images collected during colonoscopy, a common colorectal cancer screening method, can be used to train artificial intelligence (AI), machine learning (ML), and deep learning (DL) systems for computer-aided diagnosis (CADx) [10]. By analysing and interpreting the colonoscopy images, these diagnostic tools may facilitate a more precise diagnosis of colorectal cancer. AI, ML, and DL enable CADx systems to process enormous amounts of data quickly and provide physicians with near-instantaneous results. Consequently, patients with colorectal cancer may receive treatment more quickly and experience improved outcomes. As AI and ML technologies advance, the potential for CADx systems to considerably enhance colorectal cancer screening and detection is growing.
Convolutional Neural Networks (CNNs) are an effective method for detecting and diagnosing colorectal cancer when applied to colonoscopy images. CNNs are well-suited for identifying polyps and other aberrant growths in the colon because they evaluate visual input and discover patterns using deep learning techniques. In recent years, the use of ensemble learning approaches to improve CNNs' ability to detect colorectal cancer has been investigated [11,12]. Fusion CNNs are one such technique; they integrate multiple CNN models to create a more dependable and accurate system. Researchers have demonstrated that fusion CNNs are more effective at diagnosing colorectal cancer than standalone CNN models. Fusion CNNs combine the strengths of multiple CNN models to compensate for their individual shortcomings and provide more precise and reliable diagnoses [13]. The advancement of these technologies bodes well for their application in the early detection and diagnosis of digestive tract malignancies.
Diverse methods, such as fusion CNNs, bidirectional long short-term memory (BILSTM) networks, and support vector machines (SVMs), have been used to improve the accuracy of identifying colorectal cancer from colonoscopy images. The term "transfer learning" refers to the process of adapting a neural network trained on one task to another, similar task [14]. Using transfer learning, a neural network trained on a massive dataset of generic medical images can be adapted to detect colorectal cancer. BILSTMs are a type of recurrent neural network that processes data sequences both forward and backward. BILSTMs have been demonstrated to be effective for analysing colonoscopy images due to their ability to detect both local and global patterns. SVMs are supervised learning algorithms used for classification. They function by identifying the optimal hyperplane that separates the classes. There is evidence that combining SVMs with BILSTMs and transfer learning improves the diagnostic accuracy for colorectal cancer.
The proposed CADx for colorectal cancer detection, the EnsemDeepCADx system, employs a multistage technique in order to locate and identify polyps that may be indicative of the disease.

• The initial stage is to generate a dataset for feature fusion by combining the CKHK-22 mixed image datasets with others, such as the Grey and LBP datasets.
• Developing and training three ensemble fusion CNNs using the feature fusion CNN and the other featured datasets is the next step.
• In the subsequent stage, all CNNs are trained with SVM and transfer learning.
• In the final stage, temporal and spatial information is extracted using transfer learning ensemble fusion CNNs with BILSTM and SVM multi-class classification.
• At each stage, the EnsemDeepCADx model is inspected for inconsistencies and other performance metrics, and the resulting data is compared to determine the optimal approach.

Organisation of Study
The research article is divided into multiple sections. The second section provides a comprehensive literature review of the extant studies and research relevant to the proposed CADx system for diagnosing colorectal cancer. In Section 3, the study's materials and methods are enumerated and explained in depth. This comprises the dataset used, methodologies for feature extraction, and training and testing procedures for the proposed CADx system. The results of the experiments, including performance metrics, are presented and discussed in Section 4. Section 5 concludes the research by summarising the main findings, discussing the study's limitations, and proposing future research directions in this area.

Literature Survey
Alba Nogueira-Rodriguez et al. [15], who identified and categorised polyps using DL techniques, extensively discussed polyps as significant precursors to colorectal cancer. They examined public and commercial colonoscopy datasets as inputs for DL-based systems. In addition, the authors examined the numerous DL technologies currently used in cancer diagnosis, in particular DL-based CADx systems for detecting malignancy. Precision, recall, accuracy, and F1 score are all valuable metrics for evaluating the classification performance of a DL-based system. The authors also discussed the difficulty of accurately identifying lesions smaller than 5 mm and how DL techniques may be able to assist with this issue. Although this study employs DL to identify cancer lesions, the authors do not provide a streamlined method.
Hwang, Y.K. et al. [16] proposed a method that employs convolutional neural networks (CNNs) to automatically identify and classify colorectal lesions. This study utilised a compilation of colonoscopy images obtained from the Tada Tomohiro Institute of Gastroenterology and Proctology in Japan. The compilation included 16,418 images from 12,895 patients between December 2013 and March 2017. After being pre-processed, the enhanced images were used in the system's training and testing phases. For polyp detection and classification, the system utilised the Single Shot Multibox Detector (SSD), a neural network with 16 convolutional layers. Caffe, a framework for deep learning, was used to construct and test CNN. White-light imaging facilitated an 83% detection rate for colorectal lesions and a 97% detection rate for adenomas. However, pre-trained or isolated datasets were not used to evaluate the effectiveness of the system in this study.
Omneya Attallah et al. [17] introduced Gastro-CADx, a deep learning-based method for categorising numerous GI disorders. This technique is composed of three stages. Initially, spatial information is extracted using four distinct CNNs as feature extractors. In the second phase, the discrete wavelet transform (DWT) and the discrete cosine transform (DCT) are applied to these features to obtain temporal-frequency and spatial-frequency characteristics, respectively. In the final phase, the optimal feature set is determined by combining multiple feature sets and analysing their influence on the CADx output results. Dataset I is the Kvasir dataset, while Dataset II is the Hyper Kvasir dataset. The concept of semantic segmentation for the localisation and detection of malignant polyps has not been clarified, nor has the method been validated with additional datasets.
Muthu Subhash Kavitha et al. [18] highlighted convolutional neural networks (CNN) as a potential solution for datasets consisting of colonoscopy images in their discussion of deep learning approaches for the early diagnosis of colorectal cancer. The researchers compared numerous techniques for identifying and localising lesions in the colorectal region, such as CNN transfer learning, end-to-end learning, hybrid learning, and explainable AI. However, the authors discussed only the theoretical aspects of these methodologies and made no reference to experimental or actual work.
Zheng Cao et al. [19] proposed using Raman spectra and a deep learning method to identify colorectal cancer cases. The authors obtained Raman spectroscopy data from 26 patients with colorectal cancer. The Raman shifts of the patients ranged from 385 cm⁻¹ to 1545 cm⁻¹. These datasets were submitted to a 1D-ResNet CNN to generate classifications. The investigation revealed a detection success rate of 98.5% for colorectal cancer. Only one CNN model was employed for cancer identification, and it was not evaluated on other image datasets. Raman spectroscopy of the colorectal region is also regarded as a time-consuming and laborious process.
Mahmoud Ragab et al. [14] presented a deep transfer learning strategy for the early detection of colorectal cancer. This study detected colorectal cancer using deep transfer learning together with an algorithm inspired by slime mould (SMADTL-CCDC). The primary objective of the SMADTL-CCDC strategy is to detect colorectal cancer at an early stage. The study employed the dense-EfficientNet method to generate feature vectors from the pre-processed images. Cases of colorectal cancer were then identified and classified using SMADTL-CCDC and the Discrete Hopfield Neural Network (DHNN) method. The SMADTL-CCDC model performed better than more recent methods. However, far fewer histopathological images were used for testing than for training (a 90:10 ratio). Consequently, the concept's utility may be limited.
Saban Ozturik et al. [20] investigated the efficacy of artificial intelligence methods in producing an accurate colon cancer diagnosis. In particular, the study compared the efficacy of numerous CNN models on a small collection (2000-6000 images) of colonoscopy image datasets. Each CNN pooling layer from models such as AlexNet, GoogleNet, and ResNet50 was converted into an LSTM layer for use in the overall classification procedure. There were only three CNN models examined, and a large reference dataset for colonoscopy was not utilised.
Using a deep learning-based polyp detector, Meryem Souaidi et al. [21] addressed the localisation and visualisation of abnormal polyp regions in wireless capsule endoscopy (WCE) and colonoscopy images. The authors recommend an MP-FSSD polyp identification model constructed on a VGG-16 backbone. However, only a single convolutional neural network (CNN) model, VGG-16, is employed, and neither its benefits nor its drawbacks are discussed in this study. In addition, the authors analysed only the WCE and CVC Clinic DB datasets for polyp detection.
Nguyen Thanh Duc et al. [22] proposed a novel deep learning technique for detecting lesions in the colon. To aid in the detection of lesions in colonoscopy images, the authors developed "ColonFormer", a deep learning architecture based on the concept of semantic segmentation. In the proposed encoder-decoder architecture, a lightweight encoder modelling global semantic connections across scales is combined with a hierarchical decoder to enhance feature representations. Using five distinct reference datasets, the authors evaluated the efficacy of the proposed method. This model incorporates the advantages of a transformer and a CNN to generate accurate multiscale representations of features. It was, however, evaluated on only five datasets and with a single architecture. The comparative analysis suggests that this strategy produced the best results among the methods compared.
Saito et al. [23] used anatomical images from robotic colonoscopies in a study that relied on classification by a deep convolutional neural network. The most prevalent locations for colon cancer are the terminal ileum, cecum, ascending colon, transverse colon, descending colon, sigmoid colon, rectum, and anus. To identify potentially malignant cells in the colon, the authors developed a CNN-based CAD system. Up to 9995 colonoscopy images were used to evaluate the capability of the CAD system to provide exhaustive findings for the entire colon. This study utilised real-time data from colonoscopies performed from January to October 2017. GoogleNet, the primary CNN model, achieved an overall accuracy of 91.7% on 507 images. Other CNN models exist, but they were not examined in this study. Moreover, the numerous datasets that are accessible for this type of research were substantially underutilised.

Materials and Methods
This study's primary objective is to develop a trustworthy system for the automated, human-reviewed detection and classification of colorectal cancer. To this end, the research proposes a CADx (Computer-Aided Diagnosis) [24] system named EnsemDeepCADx. The proposed method analyses and interprets colorectal medical images to expedite the diagnostic process. Figure 1 shows the block diagram of the proposed system, which details the different stages of the diagnostic procedure. The EnsemDeepCADx system seeks to provide accurate and rapid colorectal cancer detection by combining automated analysis with human knowledge, which may contribute to enhanced patient outcomes and disease management.
Bioengineering 2023, 10, 738

Colonoscopy Medical Motion Images
During a colonoscopy, medical images are captured as the colonoscope is passed through each section of the large intestine. To begin the procedure, the colonoscope is inserted through the anus into the rectum. It is then advanced through the sigmoid colon and the descending colon on the left side of the abdomen, across the transverse colon, and through the ascending colon to the cecum in the lower right abdomen. The attached camera captures high-resolution images as the colonoscope passes through each segment, enabling clinicians to carefully examine the colon's walls for irregularities or diseases, such as polyps or tumours. These images are essential for the precise diagnosis and staging of colorectal cancer because they enable clinicians to identify and monitor the progression of abnormal tissue in the colon. In short, the colonoscopy procedure records medical images along the entire length of the colon, allowing doctors to detect and diagnose problems in this vital digestive organ.

Datasets
The colonoscopy medical image collections are accessible to anyone with internet access. Various online resources offer the CVC Clinic DB [25], Kvasir2 [26,27], and Hyper Kvasir [28,29] datasets, among others, for free download. There may be anywhere from two to twenty-three subclasses labelled for the lower and upper gastrointestinal tracts, depending on the extent of the dataset. CVC Clinic DB contains a total of 1640 images, equitably divided between polyps and non-polyps. Eight distinct classes are represented in the 8000 images comprising the Kvasir2 dataset. The Hyper Kvasir collection contains 10,672 images divided into 23 subclasses that represent the lower and upper digestive systems.
The classes are further organised and labelled into 24 classes, resulting in a new mixed dataset named CKHK-22 that combines the previously mentioned datasets. This diverse collection contains 19,621 images. However, fourteen of these classes are extremely unbalanced, which may hinder the performance of image classification techniques. Using the 24-class mixed CKHK-22 dataset, the authors found that the large bias towards these 14 classes resulted in a decline in accuracy and severe misclassification. By disregarding the 14 problematic classes, the authors were able to improve the accuracy and efficacy of the system by concentrating on the most stable classes and images.
To overcome the limitations of the mixed dataset and improve the performance and accuracy of image classification, the authors conducted experiments with the 10 most consistent classes from the dataset. The resulting CKHK-22 mixed dataset consists of 14,287 images in 10 comparatively balanced classes defined by colon component type (bbps-0-1, bbps-2-3, cecum, dyed-lifted-polyps, dyed-resection-margins, non-polyps, polyps, pylorus, retroflex-stomach, and z-line). Class sizes vary from 653 images for bbps-0-1 to 2150 for pylorus. The distribution of the CKHK-22 mixed dataset is shown in Table 1. This approach increased the system's adaptability to user requirements. The collection contains images from a variety of endoscopic techniques, including magnified narrow-band imaging (NBI) endoscopy, white light endoscopy (WLE), and chromoendoscopy. The dataset was constructed to facilitate the development of the EnsemDeepCADx system for image classification of colorectal cancer. Figure 2 depicts sample images from the CKHK-22 mixed dataset.

Image Pre-Processing
Image pre-processing is crucial for improving image quality and the efficacy of classification models applied to colonoscopy images [30]. To facilitate processing by deep learning models, images are customarily resized to a uniform resolution of 224 by 224 pixels as part of the initial processing phase. Noise reduction removes imperfections such as grain, specks, and distortions, which could otherwise cause models to misclassify data or miss features. Colour correction makes the colour balance of the images more consistent, which aids classification. Normalisation standardises the pixel intensity values across the entire dataset, which is crucial for the efficient operation of various algorithms. Together, these pre-processing techniques improve the classification quality of colonoscopy images.
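The resizing and normalisation steps can be illustrated with a minimal sketch in plain Python. It is not the authors' pipeline: the function names `resize_nearest` and `normalise`, the nearest-neighbour sampling, and the division by 255 are assumptions chosen for illustration, and a real system would normally rely on an image library.

```python
def resize_nearest(image, out_h=224, out_w=224):
    """Resize a 2-D image (list of rows) using nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalise(image, max_value=255.0):
    """Scale pixel intensities from [0, max_value] to [0.0, 1.0]."""
    return [[pixel / max_value for pixel in row] for row in image]


# Hypothetical 2 x 2 greyscale image, used only for illustration.
tiny = [[0, 255], [128, 64]]
prepared = normalise(resize_nearest(tiny))
print(len(prepared), len(prepared[0]))  # height and width after resizing
```

Nearest-neighbour sampling is used here only because it needs no dependencies; interpolating resizers generally give smoother results.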

Google Cloud
The CKHK-22 dataset of colonoscopy medical motion images is uploaded to Google Cloud so that it can be analysed and processed with Co-lab [31].

Image Augmentation
Image augmentation is a method for artificially increasing the size of a dataset by synthesising new images from the originals using a variety of techniques [32]. In the context of the colonoscopy dataset, image augmentation involves altering several properties of the original images. These modifications increase the diversity of the images and reduce the likelihood of overfitting during training. The augmented images are then added to the original dataset, expanding and diversifying it, which may improve the model's precision and resilience. Using the augmented dataset, the deep learning model is then trained on a platform such as Google Co-lab.
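As a rough illustration of how augmentation synthesises new images from an original, the sketch below applies three simple geometric transformations in plain Python. The choice of transformations (horizontal flip, vertical flip, 90-degree rotation) and all function names are assumptions made for this example; the specific augmentations used for CKHK-22 are not listed in the text.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the augmented variants of one image (assumed set of transforms)."""
    return [flip_horizontal(image), flip_vertical(image), rotate_90_clockwise(image)]


# Hypothetical 2 x 2 image, used only for illustration.
sample = [[1, 2], [3, 4]]
augmented = augment(sample)
print(len(augmented))  # number of new images produced from one original
```

Each original image yields several variants, so the training set grows without collecting new data.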

Train Test Split
As previously described, the CKHK-22 dataset contains 14,287 images in 10 classes. To facilitate model training and evaluation, the dataset is divided into a training set and a test set in a 70:30 ratio [33]. Thus, 10,001 images (70% of the data) are used to train the model, whereas 4286 images (30% of the data) are reserved for evaluating it on unseen data. The training set is used to teach the convolutional neural network to recognise patterns and make accurate predictions. The test set is used to assess how well the model generalises. Holding out a test set exposes overfitting to the training data, which would otherwise lead to substandard results on novel data, and so allows a more accurate evaluation of the model's performance.
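The split sizes can be reproduced with a short sketch. The helper `train_test_split`, the fixed seed, and the use of integer image identifiers are assumptions for illustration; the text does not say whether the authors' split was stratified by class, and this sketch is not.

```python
import random


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle the items and split them into training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical image identifiers standing in for the 14,287 images.
image_ids = range(14287)
train_ids, test_ids = train_test_split(image_ids)
print(len(train_ids), len(test_ids))  # 10001 4286
```

With strongly unequal class sizes, a stratified split that preserves the class proportions in both sets is often preferred.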

Extracting Grey Scale Features from Original Image Dataset
Converting colour images to greyscale is a common image-processing step that removes colour differences that may not be pertinent to the task at hand and compresses the data [34]. Greyscale conversion of the colour images in the CKHK-22 dataset may facilitate classification by eliminating colour information that is superfluous to the task. Greyscale images are produced by linearly combining the red, green, and blue (RGB) channels of the original colour image. In a greyscale image, the intensity value of each pixel is calculated as a weighted sum:
I = w_R × R + w_G × G + w_B × B
where I is the greyscale intensity and R, G, and B are the red, green, and blue values of the original colour image. The coefficients w_R, w_G, and w_B reflect the perceived brightness of the three colours, with the green channel carrying the greatest weight; in the widely used ITU-R BT.601 weighting (0.299, 0.587, and 0.114, respectively), the blue channel carries the least. The CKHK-22 colour images may be converted to greyscale by following these steps:
• Use a library for processing images, such as OpenCV, to read the source colour images;
• Apply the formula above to transform every image from the RGB colour space into the greyscale colour space;
• Create a new folder to store the greyscale images so they may be used in the testing and training stages of the image classification model.
Since greyscale images have only one channel, as opposed to the three channels of colour images, this transformation simplifies the dataset and speeds up the training process. It may also lessen the likelihood of the model overfitting to colour-specific traits that might not transfer well to other images. The pattern recognition and classification accuracy of convolutional neural networks (CNNs) can be enhanced by extracting greyscale image features from the CKHK-22 mixed dataset. By applying convolutional filters to the greyscale images, a CNN can recognise edges, lines, and shapes. These features are combined and passed to fully connected layers, which classify the image into one of ten classes. By extracting greyscale image features, the model's overall efficacy and performance on image classification tasks may be improved. The greyscale images are depicted in Figure 3. This greyscale version of the CKHK-22 dataset was generated by converting the original colour images.
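The weighted-sum conversion can be sketched in a few lines of plain Python. The coefficients below are the ITU-R BT.601 luma weights, which are an assumption here because the coefficients of the original formula are not shown in the text; the names `WEIGHTS`, `rgb_to_grey`, and `image_to_grey` are likewise illustrative.

```python
# Assumed ITU-R BT.601 weights for (red, green, blue); the paper's own
# coefficients are not available in the text.
WEIGHTS = (0.299, 0.587, 0.114)


def rgb_to_grey(pixel, weights=WEIGHTS):
    """Convert one (R, G, B) pixel to a single greyscale intensity."""
    r, g, b = pixel
    w_r, w_g, w_b = weights
    return round(w_r * r + w_g * g + w_b * b)


def image_to_grey(image):
    """Convert a 2-D grid of (R, G, B) pixels to a 2-D grid of intensities."""
    return [[rgb_to_grey(pixel) for pixel in row] for row in image]


# Hypothetical 2 x 2 colour image: red, green, blue, and white pixels.
colour = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
grey = image_to_grey(colour)
print(grey)  # [[76, 150], [29, 255]]
```

Under these weights a pure green pixel maps to a much brighter grey than a pure blue one, which reflects the differing contribution of each channel to perceived brightness.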

Extracting Local Binary Pattern Features from Greyscale Dataset
Local binary pattern (LBP) features are extracted from greyscale images to generate the LBP CKHK-22 dataset. The LBP texture descriptor is applied to the greyscale image, and a binary pattern is computed for each pixel based on the values of the adjacent pixels. The LBP operator establishes the pattern by comparing the central pixel's intensity with that of its immediate neighbours. The value 1 is assigned to neighbouring pixels whose intensities are greater than or equal to that of the central pixel, while the value 0 is assigned otherwise. A pixel's texture pattern is represented by the decimal value obtained by converting its binary pattern [35]. The LBP features are extracted by sliding a 3 × 3 window across the greyscale CKHK-22 images and computing the LBP pattern for the central pixel of the window. This procedure is repeated for each pixel, resulting in a new image in which each pixel has an LBP value. A histogram is then generated for each of the 16 × 16 non-overlapping segments that comprise the LBP image. The histograms constitute the LBP feature vector of the image.
The procedure for converting greyscale images into LBP features is as follows:
• Divide each greyscale image into small neighbourhoods, typically 3 × 3 or 5 × 5 pixels in size;
• Determine the LBP of each pixel in the region by comparing its intensity with that of its neighbouring pixels;
• Replace the pixel's original intensity value with a binary code representing the pattern of intensity differences between the central pixel and its neighbours;
• Weight the bits of the binary code by powers of two and add them together to generate a single LBP code for the pixel;
• Repeat the procedure for each sub-region of the image to generate a full set of LBP codes;
• Aggregate the LBP codes across the entire image using a histogram-based technique to generate a feature vector that adequately characterises the image.
The resulting LBP CKHK-22 dataset contains ten classes and 14,287 greyscale images annotated with LBP features, divided 70:30 between a training set and a test set. These LBP features capture the texture information present in colonoscopy images, which is necessary for accurate image classification. The LBP images are depicted in Figure 4. This LBP version of the CKHK-22 dataset was generated by converting the greyscale images.
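The LBP computation described in this section can be sketched as follows, under stated assumptions. The sketch uses the basic 3 × 3 operator with 256 possible codes, a clockwise bit order starting at the top-left neighbour, and it skips border pixels; these choices and the function names are illustrative and may differ from the variant used to build the LBP CKHK-22 dataset.

```python
# Clockwise neighbour offsets (row, column), starting at the top-left pixel.
# The bit order is an assumption; any fixed order yields a valid LBP code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Compute the 8-bit LBP code of the pixel at (row, col)."""
    centre = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(OFFSETS):
        # A neighbour contributes a 1 bit when it is >= the central pixel.
        if image[row + d_row][col + d_col] >= centre:
            code |= 1 << bit
    return code


def lbp_image(image):
    """Return LBP codes for all interior pixels (border pixels are skipped)."""
    height, width = len(image), len(image[0])
    return [
        [lbp_code(image, r, c) for c in range(1, width - 1)]
        for r in range(1, height - 1)
    ]


def lbp_histogram(codes, bins=256):
    """Count how often each LBP code occurs in a grid of codes."""
    histogram = [0] * bins
    for row in codes:
        for code in row:
            histogram[code] += 1
    return histogram


# Hypothetical 3 x 3 greyscale patch, used only for illustration.
patch = [[6, 5, 2], [7, 6, 1], [9, 8, 7]]
codes = lbp_image(patch)
print(codes)  # [[241]]
```

For the sample patch, the neighbours that are at least as bright as the centre (value 6) set bits 0, 4, 5, 6, and 7, giving 1 + 16 + 32 + 64 + 128 = 241. Computing `lbp_histogram` per image segment and concatenating the histograms would give a feature vector of the kind described above.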

Feature Fusion as New Dataset
The EnsemDeepCADx system for detecting colorectal cancer employs a multi-stage strategy that incorporates various features to effectively detect and identify polyps, with feature integration constituting an integral part of the system [36]. To extract features from the CKHK-22 dataset, the authors employ local binary patterns (LBPs), greyscale images, and raw RGB images. The result of combining these three datasets is the feature fusion dataset. Combining the original dataset with the greyscale and LBP datasets along the feature dimension yields a new dataset with three times as many images as the original dataset. The combined feature fusion dataset contains 42,861 images from the same 10 classes, with each of the three datasets contributing 14,287 images.
The derived features from the three datasets are combined into a single feature vector during the fusion procedure. One technique for fusing features is to concatenate several feature vectors along the feature dimension [37]. For an input image size of 224 × 224, the original RGB dataset would consist of 3 × 224 × 224 = 150,528 features per image. Applying the same formula to the greyscale and LBP datasets would result in 224 × 224 = 50,176 and 224 × 224 × 59 = 2,960,384 features per image, respectively. Therefore, the final feature vector for each image in the feature fusion dataset would contain a total of 3 × 224 × 224 + 224 × 224 × 59 = 3,110,912 features.
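The per-image counts can be checked by evaluating the stated products directly. The sketch below assumes a 224 × 224 input and 59 LBP values per pixel (the factor that appears in the expression in the text); `feature_counts` is a hypothetical helper name.

```python
def feature_counts(height=224, width=224, lbp_bins=59):
    """Evaluate the per-image feature counts for each representation."""
    rgb = 3 * height * width          # three colour channels
    grey = height * width             # one intensity channel
    lbp = height * width * lbp_bins   # assumed: 59 LBP values per pixel
    return {
        "rgb": rgb,
        "grey": grey,
        "lbp": lbp,
        "rgb_plus_lbp": rgb + lbp,
        "all_three": rgb + grey + lbp,
    }


counts = feature_counts()
for name, value in counts.items():
    print(f"{name}: {value:,}")
```

The last two entries differ only in whether the greyscale term is included in the concatenated vector.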

The EnsemDeepCADx system relies significantly on the process of feature fusion, which unifies disparate data sources into a single representation. After feature fusion, the final dataset is divided 70:30 between training and testing datasets. The sheer size of datasets such as ImageNet makes transfer learning necessary, so that the CNNs can utilise the features already learned by the pre-trained models in the ensemble fusion CNNs. BiLSTM and SVM classifiers are then employed to classify the CNN-obtained features. Ensemble fusion CNNs trained with BiLSTM and SVM classifiers via transfer learning may enhance disease detection and diagnosis. All features extracted from the original CKHK-22 colour images, including grey and LBP images, are merged to create a new dataset termed the feature fusion dataset, as depicted in Figure 5.

Image Classification Using Ensemble Fusion CNNs
In EnsemDeepCADx, ensemble fusion CNNs are used to increase the accuracy of image classification. This method produces an enhanced model by integrating multiple convolutional neural networks (CNNs) [38]. The ADaDR-22, ADaR-22, and DaRD-22 models are three instances of ensemble fusion CNNs applied to the CKHK-22 dataset.
The ADaDR-22 model consists of four pre-trained networks: AlexNet, DarkNet-19, DenseNet-201, and ResNet-50. AlexNet is renowned for its accuracy and speed when processing complex data, while DarkNet-19 is admired for its speed and performance. DenseNet-201 is optimised for processing intricate features and correlations, whereas ResNet-50 is suitable for deep learning due to its skip connections. The ADaDR-22 model combines the most beneficial features of all four networks to achieve superior performance. ADaR-22 incorporates three pre-trained models: AlexNet, DarkNet-19, and ResNet-50. It resembles ADaDR-22 with DenseNet-201 excluded, the objective being to simplify the model while maintaining a high level of precision. DarkNet-19, ResNet-50, and DenseNet-201 are the building blocks of the DaRD-22 model, which performs extremely well for complex data with correlated properties. The DaRD-22 model integrates these three networks to perform a wide range of image classification tasks efficiently and accurately.
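The composition of the three ensembles can be written down as member lists. The sketch below assumes that fusion is done by simple concatenation of per-network feature vectors, which is one common option and is not confirmed here as the authors' exact mechanism; `ENSEMBLES`, `fuse_features`, and the toy feature values are hypothetical.

```python
# Member networks of each ensemble, as listed in the text.
ENSEMBLES = {
    "ADaDR-22": ["AlexNet", "DarkNet-19", "DenseNet-201", "ResNet-50"],
    "ADaR-22": ["AlexNet", "DarkNet-19", "ResNet-50"],
    "DaRD-22": ["DarkNet-19", "ResNet-50", "DenseNet-201"],
}


def fuse_features(per_model_features, ensemble):
    """Concatenate the feature vectors of an ensemble's member networks."""
    fused = []
    for name in ENSEMBLES[ensemble]:
        fused.extend(per_model_features[name])
    return fused


# Toy stand-ins for backbone outputs; real feature vectors are far longer.
toy_features = {
    "AlexNet": [0.1, 0.2],
    "DarkNet-19": [0.3],
    "DenseNet-201": [0.4, 0.5, 0.6],
    "ResNet-50": [0.7, 0.8],
}

for name in ENSEMBLES:
    print(name, len(fuse_features(toy_features, name)))
```

Dropping a member network shortens the fused vector, which is the sense in which ADaR-22 is a simplified ADaDR-22.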
During ensemble fusion, multiple CNN models are combined into a single, larger model. To diagnose colorectal cancer, the CADx system analyses and combines the most effective features from four pre-trained models into a single robust model. This method generates more precise and reliable predictions than a single model could. Figures 6-9 show the four pre-trained CNN models.
AlexNet is composed of five convolutional layers and three fully connected layers. It uses Rectified Linear Units (ReLU), a nonlinear activation function that accelerates training, and local response normalisation and max pooling help to reduce overfitting. AlexNet [39] was trained with 1.2 million images from 1000 classes, and it has a total of 60 million parameters. Multiple computer vision projects continue to employ this architecture.
DarkNet-19 [40] is a neural network architecture designed for object recognition. It comprises 19 convolutional layers interleaved with max pooling layers and serves as the backbone of the YOLOv2 object detector; the DarkNet-53 backbone of the popular YOLOv3 detector is its larger successor, which adds shortcut connections to facilitate gradient propagation and prevent gradients from vanishing. DarkNet-19 employs batch normalisation and leaky ReLU activation functions to improve the training process and reduce overfitting. Compared to more complex models, DarkNet-19 uses fewer computational resources while maintaining high object identification accuracy.
DenseNet-201 [41,42] is known for the dense connections that define its architecture: within each dense block, every layer receives the feature maps of all preceding layers. By improving gradient flow and encouraging feature reuse, dense connections improve accuracy with fewer parameters. The dense blocks are connected by transition layers, each consisting of a batch normalisation layer, a convolutional layer, and a pooling layer, which reduce the spatial dimensions of the feature maps. After a global average pooling layer, a fully connected layer with a SoftMax activation function outputs the predicted class probabilities.
ResNet-50 [43,44] is a 50-layer neural network built from convolutional, pooling, and fully connected layers. Its residual connections enable the training of deeper networks by letting the layers learn residual functions.
The purpose of this study is to integrate the beneficial features of multiple deep learning models. Specifically, the objective is to construct a model capable of extracting high-level features comparable to AlexNet, while employing the efficiency and lightweight design of DarkNet-19. In addition, the model aims to mitigate vanishing gradients in the manner of DenseNet-201 and to manage the complexity of deep networks in the manner of ResNet-50. The enhanced accuracy and performance of the resulting model are the result of its capacity to extract multiple features from complex datasets. There are numerous advantages to utilising multiple CNNs as opposed to one. Ensemble fusion CNNs use the most accurate aspects of multiple models to enhance accuracy. Besides being more adaptable, such ensembles are frequently more robust. Ensemble fusion CNNs can also reduce overfitting, which occurs when a model becomes overly specific to the training data and underperforms on new data. When applied to large, complex datasets such as CKHK-22, ensemble fusion CNNs such as ADaDR-22, ADaR-22, and DaRD-22 may significantly improve the accuracy of image classification. The detailed parameters of the ensemble fusion CNNs are described in Table 2.
Recurrent neural networks (RNNs) such as the Bidirectional Long Short-Term Memory (BILSTM) can process data sequences that contain both past and future information. BILSTMs are bidirectional RNNs, which means they can process information in both directions [45]. This makes the network a useful tool for sequence prediction, as it can draw on both past and future context. A BILSTM is constructed using two LSTM networks, one processing the sequence forwards and the other processing it in reverse. The final result is the combined output of the two LSTMs [46]. This architecture is especially advantageous for applications such as speech recognition, natural language processing, and image captioning, as it permits the network to capture both short-range and long-range dependencies in the sequence. In image captioning, for example, the network is trained on a vast corpus of image-caption pairs so that it learns the underlying patterns and correlations between the visual elements of the images and the text description. Once the network has been trained, it can generate a caption for any given image. The BiLSTM architecture is shown in Figure 10.
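The two-direction idea can be illustrated without a deep learning library. The sketch below is a deliberate simplification: it uses a single-unit tanh recurrent cell as a stand-in for a full LSTM cell (no gates, no cell state), and the names `run_direction` and `bidirectional` and the weights `w_in` and `w_rec` are hypothetical.

```python
import math


def run_direction(seq, w_in, w_rec):
    """Run a one-unit tanh recurrent cell over seq and return every hidden state."""
    h = 0.0
    states = []
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def bidirectional(seq, w_in=0.5, w_rec=0.8):
    """Pair each step's forward state with its backward state."""
    forward = run_direction(seq, w_in, w_rec)
    # Process the reversed sequence, then flip the states back into time order.
    backward = run_direction(seq[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))


outputs = bidirectional([1.0, 0.0, -1.0])
```

At each step the forward value depends only on earlier inputs and the backward value only on later ones, so the pair carries context from both sides.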
In the EnsemDeepCADx system, a BILSTM may be added after the ensemble fusion CNNs in order to improve classification accuracy, an arrangement referred to here as transfer learning. The CNN backbones are pre-trained on a large dataset, such as ImageNet, to acquire generic image features, and these learned features are transferred to the CADx system to further aid in the classification of colonoscopy images. Using the output features from the ensemble fusion CNNs, the BILSTM can discover temporal connections between image features. Due to the sequential nature of colonoscopy recordings, temporal data may be beneficial for identifying the images. The BILSTM may be beneficial for detecting early warning signs of cancer because it can record changes in image features over multiple time periods.

Support Vector Machines
Support vector machines (SVMs) are a supervised machine learning technique that can be used for classification and regression analysis, in both linear and non-linear forms [47]. SVMs aim to identify the optimal hyperplane for separating a dataset into its constituent classes. Each data point is plotted in an n-dimensional space, where n is the number of features in the dataset, and the method then identifies the hyperplane that partitions the data into classes. Support vectors are the data points closest to the hyperplane, hence the name support vector machines.
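For reference, the standard soft-margin formulation of a two-class linear SVM is given below. It is the textbook form rather than a statement of the authors' exact configuration; the symbols $w$ (weight vector), $b$ (bias), $\xi_i$ (slack variables), and $C$ (regularisation parameter) are the usual notation, with labels $y_i \in \{-1, +1\}$ and $N$ training samples.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i \\
\text{subject to}\quad & y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, N, \\
\text{decision rule:}\quad & \hat{y}(x) = \operatorname{sign}\left( w^{\top} x + b \right).
\end{aligned}
```

The support vectors are the training points for which the margin constraint is active. A multi-class problem is typically handled by combining several such two-class machines, for example in a one-vs-rest or one-vs-one scheme.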
Transfer learning can be used in tandem with multi-class SVMs by employing previously trained models as feature extractors [48]. The SVM may take as input the features derived from the pre-trained models to classify data. Using pre-trained models to extract useful features can be advantageous when working with small datasets, as it reduces the need for extensive training data. In the EnsemDeepCADx system, transfer learning is used together with ensemble fusion CNNs and BILSTM to extract features from the CKHK-22 mixed dataset. Finally, the generated feature vectors are fed to a multi-class SVM classifier for classification. This method permits the incorporation of temporal information into the feature vectors using BILSTM and the use of multiple pre-trained models to extract supplementary features. It may thereby improve the overall accuracy and efficacy of image classification for colorectal cancer.

The Classification Step
The entire EnsemDeepCADx classification process is described as follows:
• SVMs are a family of machine learning algorithms used for classification and regression. In the EnsemDeepCADx system, an SVM is applied as a post-processing stage after classification results have been obtained from an ensemble of CNNs and a BILSTM. The SVM classifier receives the output of the BILSTM layer and transforms the features into a higher dimension using a kernel function. It then searches for the hyperplane that divides the input features into distinct categories in order to classify them. This technique helps improve the classification model's accuracy by reducing false positives and enhancing class separation, and the SVM thereby learns to partition the data into the ten classes provided as input;
• Before performing transfer learning using ensemble CNNs, the final fully connected layer and SoftMax activation layer were removed from each CNN model. The feature maps produced by the final convolutional layer of each CNN were then provided to the BiLSTM layer, which had 64 hidden units and employed the tanh activation function. The output of the BiLSTM layer was fed into a fully connected layer with 32 hidden units and the ReLU activation function, followed by final classification using a multi-class SVM;
• Training the models: the models are trained using the pre-processed datasets and the hyperparameters specified for each model. Techniques such as early stopping and learning rate scheduling can be used to optimise the training process;
• Evaluating the models: the models are evaluated on the test set using metrics such as accuracy, precision, recall, and F1 score. The efficacy of a model is further evaluated by producing ROC curves and calculating the area under the curve (AUC). To evaluate the efficacy of the trained models in real-world scenarios, an independent set of images from the CKHK-22 mixed dataset is used;
• Future images of colorectal cancer can be identified accurately using the completed EnsemDeepCADx system.
Figure 11 depicts the classification architecture, which integrates ensemble fusion CNNs with BiLSTM and multi-class SVM.
Stage 4: The three trained ensemble fusion CNN models are combined with BiLSTM and multi-class SVM models through transfer learning. The resulting models are then trained and tested with each of the four feature datasets.
The performance metrics obtained at each stage are then compared to determine which ensemble fusion CNN with BiLSTM and multi-class SVM provides the best recognition of colorectal cancer. The flow of the four stages is explained in detail in Figure 12.
The EnsemDeepCADx system developed in this study excels due to its novel combination of DaRD-22 ensemble fusion CNNs, bidirectional long short-term memory (BiLSTM), and support vector machines (SVM). This multimodal approach enables comprehensive analysis and diagnosis of colorectal cancer. The CKHK-22 mixed dataset contains 14,287 images from ten distinct classes. We recognise the value of benchmarking against previous AI imaging methods, but it is essential to note that the primary objective of our research was to propose a novel method that combines ensemble learning and multimodal methodologies. Our research seeks to ascertain whether or not this combination improves the accuracy of colorectal cancer detection.

Experimental Setup
This article reports the experimental outcomes of the EnsemDeepCADx system. All testing was conducted on computers equipped with the required hardware and software. The software was chosen for its compatibility with the system's hardware and the task at hand, and these configurations were used because they were determined to be suitable for attaining the desired outcomes. The CADx system was deployed on a Dell Precision Tower T5810 machine outfitted with a 2.20 GHz Intel® Xeon® CPU core i7 E5-2630 processor and 32 GB of RAM. An NVIDIA Xp GPU accelerated processing, making training of the deep learning models practical. The software stack included Keras and TensorFlow 2.7.0 as deep learning libraries, running on Python 3.7.12 in the Google Colab Pro+ environment. The ensemble fusion CNN models, BiLSTM models, and SVM models all used these libraries for training and testing. The EnsemDeepCADx system's hardware and software components were selected to meet the demanding processing needs of the deep learning models and provide rapid, accurate classification.
The EnsemDeepCADx system relies heavily on its datasets. The CKHK-22 mixed dataset has 10 classes. Medical colonoscopy motion images are archived and made available for CNN training in all the data classes. Each of the four feature types (original, grey level, LBP, and the merged dataset of original, grey level, and LBP) had its own training and testing sets inside the CKHK-22 dataset. Each of the three single-feature types had a training set of 10,000 images and a testing set of 4287 images. The feature fusion dataset contained a total of 42,861 images, comprising 30,000 training images and 12,861 testing images.
Ensemble fusion CNNs were used for the experiments, and all datasets were subjected to transfer learning using BiLSTM and multi-class SVM. The hyperparameters were fixed when planning the experiments. The optimiser was Adam, with a batch size of 32, a learning rate of 0.0001, a momentum of 0.9, and 30 epochs. The BILSTM model was trained for 30 epochs with a dropout rate of 0.5 and a batch size of 32. The multi-class SVM model utilised a linear kernel with a C value of 1.0. For the 42,861-image feature fusion dataset, however, the batch size was increased to 128 to ensure the seamless operation of the system. These hyperparameters were determined through a combination of empirical testing and comparison with related projects.
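The reported settings can be collected into a small configuration sketch. The values are those stated in the text; the grouping into three dictionaries, the key names, the dataset label `"feature_fusion"`, and the helper functions are assumptions made for illustration. The training-set sizes of 10,000 and 30,000 images are the ones reported for the single-feature and feature fusion datasets.

```python
import math

# Values as reported in the text; the structure and key names are hypothetical.
TRAINING = {"optimizer": "adam", "learning_rate": 1e-4, "momentum": 0.9,
            "batch_size": 32, "epochs": 30}
BILSTM = {"dropout": 0.5, "batch_size": 32, "epochs": 30}
SVM = {"kernel": "linear", "C": 1.0}


def batch_size_for(dataset_name):
    """Use the larger batch size only for the feature fusion dataset."""
    if dataset_name == "feature_fusion":
        return 128
    return TRAINING["batch_size"]


def steps_per_epoch(n_train, batch_size):
    """Number of mini-batches needed to cover the training set once."""
    return math.ceil(n_train / batch_size)


single = steps_per_epoch(10_000, batch_size_for("original"))
fusion = steps_per_epoch(30_000, batch_size_for("feature_fusion"))
print(single, fusion)
```

Under these numbers the larger batch size means the feature fusion dataset needs fewer steps per epoch than a single-feature dataset, despite holding three times as many training images.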
Integrating multiple CNNs may result in the model becoming excessively complex and overfitting. Our research has taken several measures to address this issue. We have used regularisation techniques, including dropout and weight decay, to reduce model complexity and prevent overfitting by imposing constraints during training. Early stopping is a technique for terminating training early based on a model's performance on a validation set: training is terminated when validation performance stops improving or begins to degrade. The hyperparameters of our model were tuned to strike a balance between overfitting and underfitting. Adjusting the learning rate, sample size, and model architecture is among the numerous other ways to avoid overfitting.
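The early-stopping rule can be expressed as a short patience loop. This is a generic sketch, not the authors' training code: the function name `early_stopping_epoch`, the `patience` of 3 epochs, the `min_delta` threshold, and the validation losses are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses: improvement stalls after the third epoch.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]
stop_epoch = early_stopping_epoch(losses, patience=3)
```

In practice the weights from the best epoch (here the third) would be restored, so the later improvement at epoch seven is never reached; a larger patience trades extra training time for a chance to find it.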

Results
The EnsemDeepCADx system was created to detect colorectal cancer by fusing several feature types from the CKHK-22 mixed dataset. Using the original, greyscale, LBP, and feature fusion datasets, three ensemble CNNs were combined with BiLSTM and multi-class SVM to boost the system's accuracy.
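Accuracy, precision, the F1 score, and recall were utilised as performance indicators to evaluate the effectiveness of the system. For each model set, ROC curves and confusion matrices were also produced. These measurements are defined by the following equations, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives:
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \]
\[ \text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
The main experiments were divided into the last three stages of the CADx system, with the findings from each stage compared to find the optimal ensemble for CADx-based colorectal cancer recognition.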
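For a multi-class problem, the four metrics named here can be computed per class from a confusion matrix. The sketch below assumes rows index the true class and columns the predicted class; `per_class_metrics` and the two-class toy matrix are hypothetical, and how the per-class values are averaged across the ten classes is not specified in the text.

```python
def per_class_metrics(cm):
    """Compute overall accuracy and per-class (precision, recall, F1).

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp   # another class predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return accuracy, results


# Toy two-class confusion matrix with 20 samples.
toy_cm = [[8, 2],
          [1, 9]]
accuracy, metrics = per_class_metrics(toy_cm)
```

The same function works unchanged for a 10 × 10 matrix such as one produced for the CKHK-22 classes.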

Stage 1 Experimentation
After converting the original colour images to greyscale, the LBP transformation was applied to the CKHK-22 mixed dataset to generate LBP-featured images. Next, we fused the original, greyscale, and LBP-featured images to create a new feature fusion dataset. The final three stages of the experiments all employed these four datasets as inputs: the original, greyscale, LBP, and feature fusion. This is a crucial juncture for the EnsemDeepCADx system, enabling the experiments conducted in subsequent stages. The subsequent investigations of the CADx system would not have been possible without the initial construction of the feature fusion dataset, which is the most important dataset for this system.

Stage 2 Experimentation: Ensemble Fusion CNNs
In the second stage of the experiment, the EnsemDeepCADx system employs three ensemble CNN models: ADaDR-22, ADaR-22, and DaRD-22. Each of the four CKHK-22 datasets (original, greyscale, LBP, and feature fusion) is used to train and evaluate these models. Each of the first three datasets contains 10,000 training and 4287 testing images, and the feature fusion dataset contains 30,000 training and 12,861 testing images, with 10 classes in each dataset.
For each ensemble CNN, a variety of performance metrics are calculated, including accuracy, precision, F1 score, and recall. The system also contrasts the efficacy of the models and graphically displays the results. This stage of the experiment is crucial because it evaluates the performance of each CNN ensemble on a variety of datasets and thereby contrasts the accuracy of each model with different input data. The results can be used to determine which CNN ensemble performs the best on the CKHK-22 dataset for detecting colorectal cancer. Table 3 presents the results of the Stage 2 experimental investigations of EnsemDeepCADx for ensemble CNNs using four datasets.
The DaRD-22 model had the highest accuracy during training (95.46%) and testing (92.31%) compared to the other models. The ADaDR-22 model's precision, recall, and F1 scores were 92.62%, 90.2%, and 89.5%, respectively. It had a lower training accuracy (92.06%) than the DaRD-22 model but a testing accuracy (90%) that was comparable. The ADaR-22 model performed the worst of the three, with precision, recall, and F1 scores of 91.56%, 89.69%, and 88.43%, respectively. In addition, it had the lowest accuracy during training (91.88%) and testing (89.69%) compared to the other models. Figures 13-16 depict the comprehensive graphical analysis of the results of the second stage. Using ensemble fusion CNNs, the analysis revealed that DaRD-22 provided the highest level of accuracy in the Stage 2 experiment.

Stage 3 Experimentation: Ensemble Fusion CNNs + Multi-Class SVM
In the third stage of the investigation, the EnsemDeepCADx system integrates the three ensemble CNN models (ADaDR-22, ADaR-22, and DaRD-22) with a multi-class SVM to improve colorectal cancer detection accuracy on the CKHK-22 dataset. The input for this stage is the feature fusion dataset, which demonstrated superior performance in Stage 2.
The SVM functions as the final classifier in the EnsemDeepCADx system, receiving the fused features from the ensemble CNN models as input and producing the class prediction. In Stage 3, the accuracy, precision, recall, and F1 score are computed for the combined ensemble CNN-SVM model, and the results are compared with those of the individual ensemble CNN models. This crucial stage of the experiment seeks to improve the accuracy of the EnsemDeepCADx system by integrating the strengths of ensemble CNNs and SVM for detecting colorectal cancer in the CKHK-22 dataset. Table 4 presents the results of the Stage 3 experimental investigations of EnsemDeepCADx for ensemble CNNs with multi-class SVM using four datasets. In this stage, the feature fusion CKHK-22 dataset was used to evaluate and compare the ensemble fusion CNNs in conjunction with the multi-class SVM. DaRD-22 attained the highest recall (91.59%), the highest precision (92.63%), and the highest F1 score (90.48%). It also attained the highest training accuracy (90.89%) and the highest testing accuracy (91.59%). The results indicate that DaRD-22 ensemble fusion CNNs combined with a multi-class SVM can enhance the performance of the models on the CKHK-22 datasets. Depending on the performance metric of concern, the best-performing model differs.
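The idea of handing fused features to a final classifier can be illustrated with a minimal sketch. The text does not state how the features are fused, so the block below assumes simple concatenation after L2-normalising each feature block; the function names and the toy feature vectors are hypothetical and only stand in for the outputs of the colour, greyscale, and LBP pipelines.

```python
import math


def l2_normalise(vec):
    """Scale a feature vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_features(*blocks):
    """Concatenate per-source feature vectors after normalising each block.

    Assumption: fusion by concatenation. The paper does not specify the
    fusion operator, so this is only one plausible choice.
    """
    fused = []
    for block in blocks:
        fused.extend(l2_normalise(block))
    return fused


# Hypothetical feature vectors for one image from three representations.
colour_feats = [3.0, 4.0]
grey_feats = [0.0, 2.0, 0.0]
lbp_feats = [1.0, 1.0, 1.0, 1.0]

fused = fuse_features(colour_feats, grey_feats, lbp_feats)
print(len(fused))  # one fused vector per image, ready for a downstream classifier
```

Normalising each block before concatenation keeps one representation from dominating the others purely because of its scale, which matters for margin-based classifiers such as an SVM.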

Stage 4 Experimentation: Ensemble Fusion CNNs + BiLSTM + Multi-Class SVM
In the final stage of the experiment, BiLSTM and multi-class SVM were combined with all three ensemble CNNs (ADaDR-22, ADaR-22, and DaRD-22). All four CKHK-22 feature datasets (original, greyscale, LBP, and feature fusion) were used for training and testing, with accuracy, precision, recall, and F1 score serving as measures of performance. The outcomes of this stage determined how successfully the EnsemDeepCADx system identifies colorectal cancer using ensemble CNNs, BiLSTM, and multi-class SVM, and provided insight into the optimal combination of these models for boosting the system's performance. Table 5 presents the results of the Stage 4 experimental investigations of EnsemDeepCADx for ensemble CNNs with BiLSTM and multi-class SVM using four datasets.
In the final phase of the experiment with the original CKHK-22 image datasets, DaRD-22 ensemble fusion CNNs achieved the highest values for all metrics, with a precision of 95.31%, a recall of 94.9%, an F1 score of 93.4%, a training accuracy of 98.64%, and a testing accuracy of 95.96%. ADaDR-22 ensemble fusion CNNs attained the lowest precision (89.92%), with a recall of 93.47%, an F1 score of 86.76%, a training accuracy of 97.74%, and a testing accuracy of 93.47%. ADaR-22 ensemble fusion CNNs achieved 92.12% precision, together with the lowest recall (91.58%), F1 score (85.61%), training accuracy (96.95%), and testing accuracy (91.58%) of the three.
Table 6 presents the performance metrics for the ensemble fusion CNN DaRD-22 with BiLSTM and multi-class SVM on the feature fusion CKHK-22 mixed dataset. Each row in the table corresponds to a specific class of polyps or non-polyp regions in the colon, and each column provides a different performance metric for that class. Precision is the proportion of correctly identified instances of a given class out of all instances predicted as that class. The classes with precision above 0.9 are bbps-0-1, bbps-2-3, non-polyps, pylorus, retroflex-stomach, and z-line.
Recall assesses the percentage of true positive predictions for a given class out of all actual instances of that class in the test dataset. In other words, recall measures the proportion of correctly identified instances of a given class out of all actual instances of that class. The classes with recall above 0.9 are bbps-0-1, bbps-2-3, cecum, non-polyps, pylorus, retroflex-stomach, and z-line.
The F1 score is the harmonic mean of precision and recall. It is a single metric that combines precision and recall into one number. The classes with an F1 score above 0.9 are bbps-0-1, bbps-2-3, non-polyps, pylorus, retroflex-stomach, and z-line. Support refers to the number of test images that belong to a particular class. The classes with the highest support in this dataset are polyps and dyed-lifted-polyps with 2604 and 1803 images, respectively. However, the highest-performing classes based on precision, recall, and F1 score are bbps-0-1, bbps-2-3, non-polyps, pylorus, retroflex-stomach, and z-line.
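The definitions of precision, recall, F1 score, and support given above can be made concrete with a short sketch. The class names are taken from the text, but the labels and predictions below are invented toy data, and the helper `per_class_report` is a hypothetical name introduced only for illustration; it is not the evaluation code used in the study.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return precision, recall, F1 score, and support for every class."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class receives a false positive
            fn[truth] += 1  # actual class receives a false negative
    report = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": actual}
    return report


# Toy labels (not results from the paper).
y_true = ["polyps", "polyps", "polyps", "cecum", "cecum", "z-line"]
y_pred = ["polyps", "polyps", "cecum", "cecum", "cecum", "z-line"]

report = per_class_report(y_true, y_pred)
for name, row in report.items():
    print(name, row)
```

In this toy example one "polyps" image is predicted as "cecum", which lowers the recall of "polyps" and the precision of "cecum" while leaving "z-line" unaffected.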
Overall, the results suggest that the ensemble fusion CNN DaRD-22 using the feature fusion CKHK-22 mixed dataset can accurately identify polyps and non-polyp regions in the colon, with several classes exhibiting high precision, recall, and F1 score. The analysed performance metrics are shown in Figure 25. From the foregoing, we may deduce that this classifier's confusion matrix will contain many true positives for the classes with high recall values and many false negatives for the classes with low recall values. Figure 26 illustrates the confusion matrix for the DaRD-22 ensemble fusion CNN with BiLSTM and multi-class SVM on the CKHK-22 dataset. A detailed confusion matrix gives further insight into the classifier's performance because it shows the actual number of true positives, false positives, true negatives, and false negatives for each class. Recall indicates what proportion of the actual instances of a class the classifier labelled correctly. The following may be inferred from the table of recall values: "bbps-0-1" (0.98), "bbps-2-3" (0.99), "cecum" (0.99), "polyps" (0.82), "pylorus" (1.00), "retroflex-stomach" (0.99), and "z-line" (0.99) had the greatest recall values. These are the categories for which a large percentage of actual instances were correctly identified by the classifier. Recall values for "dyed-lifted-polyps" (92.2%) and "dyed-resection-margins" (43.3%) were the lowest. These are the categories where the classifier produced a larger number of false negatives because it assigned many of their actual instances to other classes.
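The link between a confusion matrix, recall, and false negatives can be sketched as follows. The class names come from the text, whereas the counts are invented for illustration and do not reproduce the reported figures; the two helper functions are hypothetical names.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def recall_and_false_negatives(matrix, labels):
    """For each class, return (recall, number of false negatives)."""
    summary = {}
    for i, label in enumerate(labels):
        actual = sum(matrix[i])        # all instances that truly belong to the class
        true_pos = matrix[i][i]        # diagonal entry
        recall = true_pos / actual if actual else 0.0
        summary[label] = (recall, actual - true_pos)
    return summary


labels = ["dyed-lifted-polyps", "dyed-resection-margins", "pylorus"]
# Toy data: two of three resection-margin images are mistaken for lifted polyps.
y_true = ["dyed-lifted-polyps"] * 2 + ["dyed-resection-margins"] * 3 + ["pylorus"] * 2
y_pred = ["dyed-lifted-polyps"] * 2 + \
         ["dyed-resection-margins", "dyed-lifted-polyps", "dyed-lifted-polyps"] + \
         ["pylorus"] * 2

matrix = confusion_matrix(y_true, y_pred, labels)
summary = recall_and_false_negatives(matrix, labels)
for label in labels:
    print(label, summary[label])
```

Reading along a row shows where the false negatives of a class end up, which is the information a single recall value cannot provide.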
The ROC curve is a graphical representation of a binary classifier's performance as its discrimination threshold is varied. Figure 26 depicts the confusion matrix and Figure 27 depicts the ROC curve for DaRD-22 using the CKHK-22 feature fusion dataset. The True Positive Rate (TPR) is plotted on the y-axis and the False Positive Rate (FPR) is plotted on the x-axis. The AUC (Area Under the Curve) summarises the classifier's overall performance across all thresholds. In this instance, the AUC value is 0.9882, indicating that the system can effectively differentiate between positive and negative samples. An AUC close to 1 reflects strong separability rather than the absence of misclassification, so it should be read alongside the per-class results; the reported accuracy of all classes is greater than 0.96. Together, the AUC of 0.9882 and the high accuracy values in all 10 classes indicate that the EnsemDeepCADx system is a capable instrument for detecting and diagnosing colorectal cancer.
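Because the ROC curve is defined for binary decisions, a multi-class AUC is usually obtained by treating each class against the rest and averaging. The text does not say how the reported AUC was aggregated, so the sketch below assumes a macro-averaged one-vs-rest scheme and computes each binary AUC as the probability that a positive sample is scored above a negative one. All scores are invented toy values, and the function names are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, score_rows, labels):
    """Average the one-vs-rest AUC over all classes (assumed aggregation)."""
    aucs = []
    for k, label in enumerate(labels):
        class_scores = [row[k] for row in score_rows]
        aucs.append(binary_auc(class_scores, [t == label for t in y_true]))
    return sum(aucs) / len(aucs)


labels = ["polyps", "cecum", "z-line"]
y_true = ["polyps", "polyps", "cecum", "z-line"]
# One row of class scores per image (toy values, not model outputs).
score_rows = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.2, 0.7],
]

auc = macro_ovr_auc(y_true, score_rows, labels)
y_pred = [labels[row.index(max(row))] for row in score_rows]
print(auc, y_pred)
```

In this toy case every class is ranked perfectly, yet the second image is still assigned to the wrong class by the highest-score rule, which shows why AUC is best interpreted together with the confusion matrix.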

Discussion
Several studies have proposed CADx systems for the diagnosis of colorectal cancer, each employing a unique set of methodologies and models. The EnsemDeepCADx system proposed in this study utilised the DaRD-22 ensemble fusion CNN along with BiLSTM and SVM to achieve a 97.89% accuracy on the CKHK-22 mixed feature fusion dataset. This study demonstrates the potential for deep learning and transfer learning to improve the performance of CADx systems for the early detection of colorectal cancer. Table 7 compares the proposed EnsemDeepCADx system (2023) with colorectal cancer CADx methods from 2021 and 2022, together with their respective descriptions. The comparison and discussion emphasise the diverse methodologies and approaches used by the various studies to develop CADx systems for the diagnosis of colorectal cancer. Even though a number of studies have reported exceptional accuracy, there is still ample opportunity for improvement. This study demonstrates positive outcomes for the proposed CADx system, and future research could build on this foundation by investigating complementary models and techniques. The progress made in the area of colorectal cancer detection is shown graphically in Figure 28.
Figure 28. Comparison of the state-of-the-art methods for colorectal cancer detection.
An "explainable AI" system is one that can explain the logic behind a prediction or action in a manner that a human can understand. To diagnose colorectal cancer, EnsemDeepCADx employs an ensemble of convolutional neural networks (CNNs), bidirectional long short-term memory (BiLSTM), and support vector machines (SVM) with feature fusion. One of the main advantages of the EnsemDeepCADx system is its interpretability. CNNs enable the system to learn and extract information relevant for diagnosing colorectal cancer from colonoscopy images. The ensemble approach boosts the system's performance further by combining several models and relying on their combined expertise. The BiLSTM component is incorporated to improve the system's interpretability because of its ability to capture temporal linkages and sequential patterns within image data. This enables the system to take the illness's history and context into consideration, resulting in more exact estimates. Furthermore, the SVM algorithm is employed as the deciding element in distinguishing the many types of colorectal cancer. This opens the path to more accurate diagnosis of cancer and other illnesses. Feature fusion in EnsemDeepCADx combines data from several image representations, including colour, greyscale, and local binary pattern (LBP) images. By integrating these various attributes, we can better capture the intricacies of the underlying data and perform a more exact analysis.
When these components are integrated, EnsemDeepCADx's prediction accuracy for colorectal cancer reaches 97.89%. The system's interpretability instils trust in the decision-making process among healthcare providers. Clinicians can communicate better with patients and back up their own assessments if they understand what goes into the system's predictions.
Finally, the EnsemDeepCADx system incorporates explainable AI ideas by using interpretable components such as CNNs, BiLSTMs, and SVMs, as well as feature fusion approaches. These design choices were made in order to increase the system's utility to clinicians in the identification of colorectal cancer by making it more transparent and offering more relevant reasons for its predictions.

Conclusions and Future Work
Employing a combination of the ADaDR-22, ADaR-22, and DaRD-22 ensemble fusion CNNs, the EnsemDeepCADx system has been shown to increase diagnostic accuracy in the identification of colorectal cancer. Information from colonoscopy images is extracted using deep learning architectures such as AlexNet, DarkNet-19, DenseNet-201, and ResNet-50. The EnsemDeepCADx system is evaluated using the CKHK-22 mixed dataset, which includes colour, greyscale, and LBP image datasets to showcase the system's adaptability. By using a feature fusion method to combine the extracted features, the system performs a comprehensive analysis of the provided input. The EnsemDeepCADx system reaches a top testing accuracy of 97.89%, thanks in large part to the DaRD-22 ensemble CNN fusion, BiLSTM, and multi-class SVM. This degree of accuracy shows the system's potential for early and accurate detection of colorectal cancer. Another advantage of the EnsemDeepCADx system is how quickly data can be processed: colorectal cancer may be identified in as little as 2 h. This expedited process increases confidence that the system will provide useful results in a reasonable amount of time. Additional studies into various deep learning approaches and parameter optimisation are recommended as a means to further enhance the system's performance. The value of utilising ensemble fusion CNNs to boost colorectal cancer detection accuracy and throughput is also emphasised.
In conclusion, the EnsemDeepCADx system's utilisation of ensemble fusion CNNs and integration of BiLSTM and SVM shows potential in the domain of colorectal cancer detection. The system's ability to include cutting-edge deep learning algorithms and feature fusion approaches while maintaining an efficient processing time of 2 h increases the likelihood that it will significantly contribute to early detection and improved patient outcomes.
Even though the current EnsemDeepCADx system has demonstrated some success, it could be improved further. Future research may focus on advanced deep learning techniques, such as generative adversarial networks (GANs) for data augmentation and attention mechanisms for enhanced feature extraction. The system may utilise data from other medical imaging modalities, such as magnetic resonance imaging (MRI) or computed tomography (CT) examinations, to enhance its diagnostic capabilities. Future research may also concentrate on augmenting the technology for use in real time during colonoscopies, thereby facilitating instantaneous feedback and, potentially, a more rapid diagnosis and treatment. Ongoing research and development of CADx systems for the early diagnosis of colorectal cancer will likely result in improved patient outcomes in the long term.
Author Contributions: A.S.N.R. carried out the experimental work, with a concentration on the conceptualization of ideas. K.V. proposed the concept and document flow and their respective implementation processes. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding and was supported solely by institutional funding from SRM Institute of Science and Technology, which provided the necessary facilities.
Institutional Review Board Statement: This investigation was approved by the Institutional Review Board (IRB) at SRM Institute of Science and Technology, Kattankalattur. The investigation was conducted in accordance with the Declaration of Helsinki's applicable ethical principles. All investigation participants gave their consent after being completely informed. Throughout the entire research procedure, participants' anonymity and confidentiality were strictly protected. The IRB evaluated and approved the study protocol and procedures to protect the rights and welfare of the participants.

Informed Consent Statement:
Prior to participation in this investigation, all participants provided their informed consent. Participants were provided with comprehensive information regarding the purpose, procedures, potential hazards and benefits, measures of confidentiality, and their rights as research subjects. Before providing their assent voluntarily, they were given ample opportunity to ask questions and address any concerns. It was made abundantly clear to the participants that their participation was wholly voluntary and that they were free to disengage from the study at any time with no repercussions. Each participant provided written consent, and anonymity and confidentiality were rigorously maintained throughout the duration of the study. The study protocol complied with ethical standards and was approved by the relevant Institutional Review Board.

Data Availability Statement:
The data were collected from publicly accessible colonoscopy datasets: CVC Clinic DB, Kvasir2, and Hyper Kvasir. Even though this work with internal human organs is available to the public via internet sources, it has not been deemed ethical by official authorities. Data are publicly available from the following websites: CVC Clinic DB Dataset was obtained from https://www.kaggle.com/datasets/balraj98/cvcclinicdb (accessed on 3 May 2023); Kvasir2 Dataset was obtained from https://datasets.simula.no/kvasir/ (accessed on 3 May 2023); Hyper Kvasir Dataset was obtained from https://datasets.simula.no/hyper-kvasir/ (accessed on 3 May 2023). In this research paper, the combination of the three datasets is presented as a new dataset, referred to as a mixed dataset.