Gastric Tract Disease Recognition Using Optimized Deep Learning Features

Abstract: Artificial intelligence aids for healthcare have received a great deal of attention. Approximately one million patients with gastrointestinal diseases have been diagnosed via wireless capsule endoscopy (WCE). Early diagnosis facilitates appropriate treatment and saves lives. Deep learning-based techniques have been used to identify gastrointestinal ulcers, bleeding sites, and polyps. However, small lesions may be misclassified. We developed a deep learning-based best-feature method to classify various stomach diseases evident in WCE images. Initially, we use hybrid contrast enhancement to distinguish diseased from normal regions. Then, a pretrained model is fine-tuned, and further training is done via transfer learning. Deep features are extracted from the last two layers and fused using a vector length-based approach. We improve the genetic algorithm using a fitness function and kurtosis to select optimal features that are graded by a classifier. We evaluate a database containing 24,000 WCE images of ulcers, bleeding sites, polyps, and healthy tissue. The cubic support vector machine classifier was optimal; the average accuracy was 99%.


Introduction
Stomach (gastric) cancer can develop anywhere in the stomach [1] and is curable if detected and treated early [2], for example, before the cancer spreads to lymph nodes [3]. The incidence of stomach cancer varies globally. In 2019, the USA reported 27,510 cases (17,230 males and 10,280 females) with 11,140 fatalities (6,800 males and 4,340 females) [4]. In 2018, 26,240 new cases and 10,800 deaths were reported in the USA (https://www.cancer.org/research/cancer-facts-statistics/allcancer-facts-figures/cancer-facts-figures-2020.html). In Australia, approximately 2,462 cases were diagnosed in 2019 (1,613 males and 849 females) with 1,287 deaths (780 males and
The principal conventional techniques used to detect stomach cancer are least-squares saliency transformation (LSST), a saliency-based method, contour segmentation, and color transformation [14]. Kundu et al. [15] sought to automate WCE frame evaluation employing LSST followed by probabilistic model-fitting; LSST detected the initially optimal coefficient vectors. A saliency/best-features method was used by Khan et al. [16] to classify stomach conditions using a neural network; the average accuracy was 93%. Khan et al. [7] employed deep learning to identify stomach diseases. Deep features were extracted from both original WCE images and segmented stomach regions; the latter was important in terms of model training. Alaskar et al. [17] established a fully automated method of disease classification. Pretrained deep models (AlexNet and GoogleNet) were used for feature extraction and a softmax classifier was used for classification. A fusion of data processed by two pretrained models enhanced accuracy. Khan et al. [10] used deep learning to classify stomach disease, employing Mask RCNN for segmentation and fine-tuning of ResNet101; the Grasshopper approach was used for feature optimization. Selected features were classified using a multiclass support vector machine (SVM). Wang et al. [18] presented a deep learning approach featuring superpixel segmentation. Initially, each image was divided into multiple slices and superpixels were computed. The superpixels were used to segment lesions and train a convolutional neural network (CNN) that extracted deep learning features and engaged in classification. The features of segmented lesions were found to be more useful than those of the original images. Xing et al. [19] extracted features from globally averaged pooled layers and fused them with the hyperplane features of a CNN model to classify ulcers. Here, the accuracy was better than that afforded by any single model. Most studies have focused on training segmentation, which improves accuracy; however, the computational burden is high. Thus, most existing techniques are sequential and include disease segmentation, feature extraction, reduction, and classification. Most existing techniques focus on initial disease detection to extract useful features, which are then reduced. The limitations include mistaken disease detection and elimination of relevant features.
In the medical field, data imbalances compromise classification. In addition, various stomach conditions have similar colors. Redundant and irrelevant features must be removed. In this paper, we report the development of a deep learning-based automated system employing a modified genetic algorithm (GA) to accurately detect stomach ulcers, polyps, bleeding sites, and healthy tissue.
Our primary contributions are as follows. We develop a new hybrid method for color-based disease identification. Initially, a bottom-hat filter is applied and the product is fused with the YCbCr color space. Dehazed colors are used for further enhancement. A pretrained AlexNet model is fine-tuned and further trained using transfer learning. Also, deep learning features are extracted from FC layers 6 and 7 and fused using a vector length-based approach. Finally, an improved GA that incorporates fitness and kurtosis-controlled activation functions is developed.
The remainder of this paper is organized as follows. Section 2 reviews the literature. Our methodology is presented in Section 3. The results and a discussion follow in Section 4. Conclusions and suggestions for future work are presented in Section 5.
Fig. 1 shows the architecture of the proposed method. Initial database images are processed via a hybrid approach that facilitates color-based identification of diseased and healthy regions. AlexNet was fine-tuned via transfer learning and further trained using stomach features. A cross entropy-based activation function was employed for feature extraction from the last two layers; these were fused using a vector length approach. A GA was modified employing both a fitness function and kurtosis. Several classifiers were tested on several datasets; the outcomes were both numerical and visual.

Color Based Disease Identification
Early, accurate disease identification is essential [20,21]. Segmentation is commonly used to identify skin and stomach cancers [22]. We sought to identify stomach conditions in WCE images. To this end, we employed color-based discrimination of healthy and diseased regions. The latter were black or near-black. We initially applied bottom-hat filtering and then dehazing. The output was passed to the YCbCr color space for final visualization. Mathematically, this process is presented as follows.
Let the database contain four classes $c_1$, $c_2$, $c_3$, and $c_4$. Consider an image $X(i, j)$ of dimension $N \times M \times k$, where $N = 256$, $M = 256$, and $k = 3$. Bottom-hat filtering is applied to $X(i, j)$ as follows:
$$X_{bot}(i, j) = \big(X(i, j) \bullet s\big) - X(i, j),$$
where $X_{bot}(i, j)$ is the bottom-hat image, $s$ is a structuring element of value 21, and $\bullet$ is the closing operator. To generate the color, a dehazing formulation [23] is then applied to $X_{bot}(i, j)$. Here, $X_{haz}(i, j)$ denotes the haze-reduced image, which has the same dimensions as the input image; Light denotes the internal color of the image; and $t(x)$ is the transparency, with a value in [0, 1]. Finally, the YCbCr color transformation [24] is applied to $X_{haz}(i, j)$ for the final discrimination of infected regions, where the red, green, and blue channels are denoted R, G, and B, respectively.
The visual output of this transformation is shown in Fig. 2. The top row shows original WCE images of different infections, and the dark areas in the bottom row are the identified diseased regions. These resultant images are used in the next step for deep learning feature extraction.
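The bottom-hat and YCbCr steps can be illustrated with a minimal pure-Python sketch. Several points are assumptions made only for illustration: a single grayscale channel stored as a list of lists, a flat square structuring element (size 3 here rather than 21, to keep the example small), windows clipped at the image border, and the full-range ITU-R BT.601 (JPEG-style) YCbCr coefficients, which may differ from the formula in [24]. The dehazing step is omitted.

```python
def _window_op(img, size, op):
    """Apply op (min or max) over a size x size window clipped at the borders."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[a][b]
                    for a in range(max(0, i - r), min(h, i + r + 1))
                    for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = op(vals)
    return out


def closing(img, size):
    """Morphological closing: dilation (max) followed by erosion (min)."""
    return _window_op(_window_op(img, size, max), size, min)


def bottom_hat(img, size=3):
    """Bottom-hat transform: closing(img) - img. Dark spots become bright."""
    closed = closing(img, size)
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(closed, img)]


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr for 8-bit values (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# A bright 5 x 5 patch with one dark pixel in the centre.
patch = [[100] * 5 for _ in range(5)]
patch[2][2] = 20
response = bottom_hat(patch, size=3)
```

In this toy patch only the dark centre pixel produces a non-zero bottom-hat response, which is the behaviour the method relies on to highlight dark (diseased) regions.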

Convolutional Neural Network
A CNN is a form of deep learning that is used in medical imaging [25], object classification [26], agriculture [27], action recognition [28], and other [29] fields. Classification is a major issue in these fields. Unlike most classification algorithms, a CNN does not require significant preprocessing. A CNN features three principal types of hierarchical layers. The first two (convolution and pooling) are used for feature extraction (weights and biases). The last is usually fully connected and derives the final output. In this study, we use a pretrained version of AlexNet as the CNN.

Modified AlexNet Model
AlexNet [30] facilitates fast training and reduces over-fitting. The AlexNet model has five convolutional layers and three fully connected layers. The hidden layers employ the ReLU activation function, and the last layer uses a softmax function for final classification [31]. Each input is of dimension 227 × 227 × 3. The training data drawn from the dataset are represented by $A_{cd}$, and each $A_{cd}$ is real-valued ($A_{cd} \in R$).
Here, $s(\cdot)$ denotes the ReLU activation function, $\rho^{(1)}$ denotes the bias vector, $m^{(1)}$ denotes the weights of the first layer, and $F$ denotes the fully connected layer. The input of each layer is the output of the previous layer.
For the loss, $W(a)$ denotes the cross-entropy function, $U$ indicates the overall number of classes $v$ and $p$, and $Q$ is the predicted probability. The overall architecture of AlexNet is shown in Fig. 3.
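As a rough illustration of the quantities named above, the following sketch computes one fully connected layer with ReLU activation, a softmax output, and the cross-entropy of a single sample. The function names and the tiny weight values are hypothetical, and the standard textbook forms are assumed; they are not taken from the paper's equations.

```python
import math


def dense(weights, bias, x):
    """Affine map W x + b, with weights given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    """ReLU activation s(.) applied element-wise."""
    return [max(0.0, x) for x in v]


def softmax(z):
    """Numerically stable softmax over class scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Cross-entropy of one sample: -log of the predicted probability Q."""
    return -math.log(probs[true_class])


# Hypothetical two-unit layer applied to a two-element input.
hidden = relu(dense([[1.0, 2.0], [0.0, -1.0]], [0.5, 0.0], [1.0, 1.0]))
# Equal scores for four classes give a uniform prediction.
loss = cross_entropy(softmax([0.0, 0.0, 0.0, 0.0]), true_class=2)
```

With four equally likely classes the loss equals log 4, which is the value an untrained four-class output layer would start from.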

Transfer Learning
Transfer learning [32] is used to further train a model that is already trained, which improves model performance. The given (source) input is $k_t = \{(j^t_1, y^t_1), \ldots, (j^t_x, y^t_x), \ldots, (j^t_r, y^t_r)\}$ with learning task $J_t$, where $(j^t_m, y^t_m) \in R$. The target is $g_o = \{(j^e_1, y^e_1), \ldots, (j^e_x, y^e_x), \ldots, (j^e_m, y^e_m)\}$ with learning task $J_g$, where $(j^e_r, y^e_r) \in R$. Here, $r$ and $m$ are the sizes of the source and target sets, and $y^t_1$ and $y^e_1$ are training data labels. We fine-tuned the AlexNet architecture and removed the last layer (Fig. 4). Then, we added a new layer featuring ulcers, polyps, bleeding sources, and normal tissue; these are the target labels. Fig. 5 shows that the source data were derived from ImageNet and that the source model was AlexNet. The number of source classes/labels was 1,000. The modified model featured the four target classes and was fine-tuned. Transfer learning delivered the new knowledge to create a modified CNN used for feature extraction.
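The layer replacement described above can be sketched with a toy model. This is only an illustration under stated assumptions: layers are plain dictionaries, the sizes are tiny stand-ins (10 source classes instead of 1,000), and `make_layer` and `adapt_for_target` are hypothetical helper names. Freezing the transferred layers is offered as an option; the paper fine-tunes the model and does not state that any layer is frozen.

```python
import random


def make_layer(n_in, n_out, rng):
    """Create a layer with small random weights and zero biases."""
    return {
        "W": [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
        "trainable": True,
    }


def adapt_for_target(source_layers, n_classes, rng, freeze=False):
    """Reuse all layers except the last, then attach a new output layer.

    The reused layers keep the source weights (shared, not copied); the new
    final layer is randomly initialised with n_classes outputs.
    """
    kept = [dict(layer, trainable=not freeze) for layer in source_layers[:-1]]
    n_in = len(source_layers[-1]["W"][0])
    return kept + [make_layer(n_in, n_classes, rng)]


rng = random.Random(0)
# Toy source model: 8 inputs -> 6 hidden units -> 10 source classes.
source = [make_layer(8, 6, rng), make_layer(6, 10, rng)]
# Target model: same feature layer, new 4-class output
# (ulcer, polyp, bleeding, healthy).
target = adapt_for_target(source, n_classes=4, rng=rng)
```

The source model is left untouched, so the same pretrained weights could be reused for a different target task.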

Features Extraction & Fusion
Feature extraction is vital; the features are the object input [33]. We extracted deep learning features from layers FC6 and FC7, yielding the vectors $F_1$ and $F_2$. Each vector has size N × 4096; that is, 4,096 features were extracted for each image. However, the accuracies of the individual vectors were inadequate. Thus, we fused the two vectors into a single vector based on vector length.
The resultant feature vector has size N × 8192 (4,096 + 4,096 features per image). This vector is long, and many features will be redundant or irrelevant. We reduced this issue by applying a mean threshold function that compares each feature to the mean $m$.
Fused features ≥ $m$ were selected before proceeding to the next step; the other features were ignored. Then, the optimal features are chosen using an improved GA (IGA).
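A small sketch of the fusion and mean-threshold steps follows. It assumes that fusion is a per-image concatenation of the two vectors (consistent with 4,096 + 4,096 = 8,192) and that the threshold compares each feature's average over all images with the overall mean $m$; the exact rule used in the paper may differ, and the function names are hypothetical.

```python
def fuse(f1, f2):
    """Concatenate two feature matrices row by row (one row per image)."""
    return [row1 + row2 for row1, row2 in zip(f1, f2)]


def mean_threshold(fused):
    """Keep feature columns whose average is at least the overall mean m."""
    n, d = len(fused), len(fused[0])
    col_means = [sum(row[j] for row in fused) / n for j in range(d)]
    m = sum(col_means) / d
    keep = [j for j, c in enumerate(col_means) if c >= m]
    selected = [[row[j] for j in keep] for row in fused]
    return selected, keep


# Two images with two features from each layer (toy stand-ins for N x 4096).
fc6 = [[1.0, 0.0], [3.0, 0.0]]
fc7 = [[4.0, 1.0], [6.0, 1.0]]
fused = fuse(fc6, fc7)
selected, kept_columns = mean_threshold(fused)
```

Here the column averages are 2, 0, 5, and 1, the overall mean is 2, and so columns 0 and 2 survive the threshold.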

Modified Genetic Algorithm
A GA [34] is an evolutionary algorithm applied to identify optimal solutions among a set of original solutions. In other words, a GA is a heuristic search algorithm that organizes the best solutions into spaces. GAs involve five steps: initialization/population initialization, crossover, mutation, selection, and reproduction.
Initialization. The maximum number of iterations, population size, crossover percentage, offspring number, mutation percentage, number of mutants, and the mutation and selection rates are initialized. Here, the iteration number is 100, the population size 20, the mutation rate 0.2, the crossover rate 0.5, and the selection pressure 7.
Population Initialization. We initialize the size of the GA population (here 20). Each individual in the population is selected randomly from the fused vector and evaluated using a fitness function. Here, the softmax function with the fine k-nearest neighbor (F-KNN) method is used. Non-selected features undergo crossover and mutation.
Crossover. Crossover mirrors chromosomal behavior: two parents are used to create a child. Here, the uniform crossover rate is 0.5.
In the crossover expression, $P_1$ and $P_2$ are the selected parents, and $u$ is a random value that is initially set to 1. This process is shown in Fig. 6.
Mutation. To impart unique characteristics to the offspring, one mutation is created in each offspring generated by crossover. The mutation rate was 0.2. Then, we used the roulette wheel (RW) method [35] to select chromosomes. The RW is based on probability.
In Eq. (16), the sorted population is $y_\delta$, the last population is $O_l$, and $\beta_1$ is the selection pressure, which is 7. Once mutation is complete, a new generation is selected.
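The three operators can be sketched as follows. This is an illustrative sketch, not the paper's implementation: chromosomes are assumed to be binary masks over the fused features, mutation flips a fraction of the genes equal to the mutation rate (at least one), and the roulette wheel uses a common formulation that weights each chromosome by exp(−β · cost / worst cost). Whether Eq. (16) has exactly this form is an assumption.

```python
import math
import random


def uniform_crossover(p1, p2, rng, rate=0.5):
    """Swap each gene between the two parents with probability `rate`."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < rate:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return c1, c2


def mutate(chrom, rng, rate=0.2):
    """Flip round(rate * length) randomly chosen bits (at least one)."""
    n_flip = max(1, round(rate * len(chrom)))
    child = list(chrom)
    for idx in rng.sample(range(len(chrom)), n_flip):
        child[idx] = 1 - child[idx]
    return child


def roulette_probabilities(costs, beta=7.0):
    """Selection probabilities; lower cost (error rate) gets more weight."""
    worst = max(costs)
    if worst <= 0:
        return [1.0 / len(costs)] * len(costs)
    weights = [math.exp(-beta * c / worst) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def roulette_select(probs, rng):
    """Spin the wheel once and return the index of the chosen chromosome."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:
            return i
    return len(probs) - 1


rng = random.Random(0)
child_a, child_b = uniform_crossover([1] * 10, [0] * 10, rng, rate=0.5)
mutant = mutate([0] * 10, rng, rate=0.2)
probs = roulette_probabilities([0.1, 0.2, 0.4], beta=7.0)
chosen = roulette_select(probs, rng)
```

A larger β concentrates the selection probability on the lowest-error chromosomes, while β = 0 would make the selection uniform.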

Selection and Reproduction.
Crossover and mutation facilitate chromosome selection by the RW method. Thus, the selection pressure is moderate rather than high or low. All offspring engage in reproduction, and then fitness values are computed. The chromosomes are illustrated in Fig. 7. They were evaluated using the fitness function where the error rate was the measure of interest. Then, the old generation was updated.
This process continues until the maximum number of iterations is reached. The resulting vector still has a high dimension. To reduce its length, we added an activation function based on kurtosis. The kurtosis value is computed after the iterations are complete and is used to compare the selected features (chromosomes). Those that do not fulfill the activation criterion are discarded.
The final selected vector is passed to several machine learning classifiers for classification. In this study, the final vector dimension is N × 1726.
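A kurtosis-controlled filter of this kind might look like the sketch below. The activation criterion is not recoverable from the text, so the rule used here (keep a feature when its kurtosis is at least the mean kurtosis over all GA-selected features) is purely an assumption, as are the function names.

```python
def kurtosis(values):
    """Population kurtosis: fourth central moment over squared variance."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / (m2 ** 2) if m2 > 0 else 0.0


def kurtosis_filter(features):
    """Return indices of columns whose kurtosis is >= the mean kurtosis."""
    d = len(features[0])
    scores = [kurtosis([row[j] for row in features]) for j in range(d)]
    threshold = sum(scores) / d
    keep = [j for j, s in enumerate(scores) if s >= threshold]
    return keep, scores


# Five images, two GA-selected features: one heavy-tailed, one flat.
features = [
    [0.0, -2.0],
    [0.0, -1.0],
    [0.0, 0.0],
    [0.0, 1.0],
    [10.0, 2.0],
]
keep, scores = kurtosis_filter(features)
```

In this toy case the heavy-tailed first feature (kurtosis 3.25) is kept and the evenly spread second feature (kurtosis 1.7) is discarded.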

Experimental Setup
We used 4,000 WCE images and employed 10 classifiers: the Cubic SVM, Quadratic SVM, Linear SVM, Coarse Gaussian SVM, Medium Gaussian SVM, Fine KNN, Medium KNN, Weighted KNN, Cosine KNN, and Bagged Tree. Of the complete dataset, 70% was used for training and 30% for testing, with 10-fold cross-validation. We used a Core i7 CPU with 14 GB of RAM and a 4 GB graphics card. Coding employed MATLAB 2020a and MatConvNet (for deep learning). We measured sensitivity, precision, the F1-score, the false-positive rate (FPR), the area under the curve (AUC), accuracy, and time.
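For reference, the count-based measures listed above can be derived from a confusion matrix as in the following sketch. It assumes rows are true classes and columns are predicted classes and treats each class one-vs-rest; how the paper averages the per-class values is not stated, so no averaging is shown. AUC and time are not covered.

```python
def per_class_metrics(cm):
    """Sensitivity, precision, F1, and FPR for each class (one-vs-rest)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        results.append({"sensitivity": sens, "precision": prec,
                        "f1": f1, "fpr": fpr})
    return results


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Toy two-class confusion matrix (rows: true class, columns: predicted).
cm = [[9, 1],
      [2, 8]]
metrics = per_class_metrics(cm)
acc = accuracy(cm)
```

The same functions apply unchanged to the four-class case (ulcer, polyp, bleeding, healthy) by passing a 4 × 4 matrix.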

Results
The results are shown in Tab. 1. The highest accuracy was 99.2% (using the Cubic SVM). The sensitivity, precision, and F1-score were all 99.00%. The FPR was 0.002, the AUC was 1.00, and the computational time was 83.79 s. The next best accuracy was 99.6% (Quadratic SVM). The associated metrics (in the above order) were 98.75%, 99.00%, 99.00%, 0.002, 1.000, and 78.52 s, respectively. The Cosine KNN, Weighted KNN, Medium KNN, Fine KNN, Medium Gaussian SVM, Coarse Gaussian SVM, Linear SVM, and Bagged Tree accuracies were 97.0%, 98.0%, 96.7%, 98.9%, 98.9%, 93.3%, 96.9%, and 96.8%, respectively. The Cubic SVM scatterplot of the original test features is shown in Fig. 8. The first panel refers to the original data and the second to the Cubic SVM predictions. The good Cubic SVM performance is confirmed by the confusion matrix shown in Fig. 9. Bleeding was accurately predicted 99% of the time, as were healthy tissue and ulcers; the polyp figure was >99%. The ROC plots of the Cubic SVM are shown in Fig. 10.
Next, we applied our improved GA. The results are shown in Tab Fig. 11. The first panel refers to the original data and the second to the Cubic SVM predictions. The good Cubic SVM performance is confirmed by the confusion matrix shown in Fig. 12. In this figure, the four classes are healthy tissue, bleeding sites, ulcers, and polyps. Bleeding was accurately predicted 99% of the time, healthy tissue <99% of the time, and ulcers and polyps >99% of the time. The ROC plots of the Cubic SVM are shown in Fig. 13.
Figure 13: ROC plots for selected stomach cancer classes using cubic SVM after applying GA

Comparison with Existing Techniques
In this section, we compare the proposed method to existing techniques (Tab. 3). In a previous study [7], CNN feature extraction, fusion of different features, selection of the best features, and classification were used to detect ulcers in WCE images. The dataset was collected in the POF Hospital Wah Cantt, Pakistan; the accuracy was 99.5%. Another study [9] described handcrafted and deep CNN feature extraction from the Kvasir, CVC-ClinicDB, a private, and ETIS-Larib PolypDB datasets. The accuracy was 96.5%. In another study [15], an LSST technique using probabilistic model-fitting was used to evaluate a WCE dataset; the accuracy was 98%. Our method employs deep learning and a modified GA. We used the private dataset of the POF Hospital, and the Kvasir and CVC datasets to identify ulcers, polyps, bleeding sites, and healthy tissue. The accuracy was 99.8% and the computational time was 211.90 s. Our method outperforms the existing techniques.

Conclusion
We automatically identify various stomach diseases using deep learning and an improved GA. WCE image contrast is enhanced using a new color discrimination-based hybrid approach. This distinguishes diseased and healthy regions, which facilitates later feature extraction. We fine-tuned the pretrained AlexNet deep learning model for the classes of interest. We employed transfer learning to further train the AlexNet model. We fused features extracted from two layers; this improved local and global information. We removed some redundant features by modifying the GA fitness function and using kurtosis to select the best features. This improved accuracy and minimized computational time. The principal limitation of the work is that the features are of high dimension, which increases computational cost. We will resolve this problem by employing DarkNet and MobileNet (the latest deep learning models [36,37]). In addition, localizing the disease before feature extraction would accelerate execution.