Introduction

An ulcer is a disease of the gastrointestinal (GI) tract; about 10 percent of people have this condition. It is a chronic inflammatory erosion or sore on the inner mucous membrane [1, 2]. An ulcer itself is not fatal, but its symptoms are those of serious ailments, i.e., Crohn’s disease and ulcerative colitis, which might cause death at the complication stage [3]. Stomach ulcers are sores in the lining of the stomach and the duodenum. Up to 4 million people develop stomach ulcers in the United States per year (i.e., 1 out of 10 people) [4].

Conventional imaging protocols for ulcers are sonde and push endoscopy [5]. During inspection, an experienced doctor inserts the endoscope through the patient’s anus or mouth to analyze the GI tract [6]. These traditional methodologies play a vital role in analyzing the lower and upper ends of the GI tract [7]. Wireless capsule endoscopy (WCE) is an alternative method that offers painless, non-invasive, and direct inspection of the small bowel [8]. A commercially available WCE capsule comprises an optical dome, an illumination part, batteries, and imaging sensors [9, 10]. WCE captures 2–4 images per second for nearly 8 h within the patient’s GI tract and transmits them wirelessly to a recording device attached to the patient’s waist [11]. Physicians can download and examine all photographs off-line for diagnostic purposes [12]. WCE creates approximately 55,000 images per patient, of which 5% are normal; examining all of them is a time-consuming and exhausting assignment for physicians [13]. Thus, it is important to develop an automated approach that analyzes the ulcer images and reduces the physician’s workload [14]. Texture features [15] play an essential role in differentiating healthy from ulcer images. The bidimensional ensemble empirical mode decomposition (BEEMD) method is used to classify normal/ulcer images [16]. A curvelet-based lacunarity (DCT-LAC) technique and a multi-level super-pixel approach are used for ulcer detection [17].

Detecting stomach infections at an initial stage may help to reduce the risk of mortality. Manual evaluation of stomach infections is laborious and time-consuming compared with computerized methods of stomach analysis. Stomach lesion segmentation and classification are performed using conventional and deep learning methodologies. Hand-crafted features are selected in classical approaches, whereas deep learning methods learn to extract informative features within the pipeline. Although a considerable amount of work has been done in this domain, accurate stomach lesion detection is still challenging [18]. This research work is based on two phases. In Phase I, DeepLabv3 is utilized with a pre-trained ResNet-50 model as its base network to overcome the existing limitations. For accurate segmentation, the model is trained with hyperparameters selected after extensive experimentation. In Phase II, the ResNet-50 model is trained on the input images, and the classification outcomes are analyzed using uncertainty based on thresholding and a Bayesian neural network (BNN) to authenticate the prediction accuracy. The foremost contributions of the proposed model are as follows:

Phase I: DeepLabv3 with a pre-trained ResNet-50 model for feature mapping is developed for precise lesion segmentation.

Phase II: Features are extracted from the ResNet-50 model, and the classification results are evaluated using uncertainty based on thresholding and a BNN.

The article is organized as follows: related work is discussed in Sect. 2; the proposed work is given in Sect. 3; Sect. 4 explores the experimental findings; and the conclusion is stated in Sect. 5.

Related work

Much work has been devoted to developing automated approaches for ulcer detection [18,19,20,21,22]. Some of the latest existing techniques are discussed in this section. Stomach ulcer segmentation is a big challenge because endoscopy images have low contrast as well as illumination and brightness issues; thus the infected region is not segmented accurately [23]. The classification of the different types of gastrointestinal infections is also an intricate task [24], because it relies on the feature extraction framework, which directly affects the classification accuracy [25]. An automated system has been presented to process WCE images for early detection [26], where features are combined into a single vector and subsequently fed to the classifiers for ulcer/bleeding classification [27]. This approach achieved an accuracy of 92.86% and 93.64% on bleeding and ulcers, respectively [28]. Another approach has been presented for the detection of infection in the stomach and achieved an accuracy of 98.3% [29]. In another method, features are extracted using the pre-trained ResNet-101 model, optimized using the grasshopper method, and passed to an SVM for classification of different types of stomach infections such as polyp, bleeding, and ulcer [25, 30]. The quality of the input images has been improved by applying contrast enhancement; classical deep features [31] are then extracted and the best are selected using entropy [32,33,34]. These best features are passed as input to the classifiers, among which KNN outperforms the others and achieves 99.42% accuracy [25]. Deep features have been extracted from transfer learning models such as AlexNet and GoogLeNet for ulcer classification [35]. A saliency-based segmentation approach has been employed for ulcer segmentation [36]. The Hidden Markov Model (HMM) has been applied for the detection of stomach ulcers on two datasets [37].
An automated system has been presented which comprises HSI and YIQ color transformation [38, 39], feature fusion using singular value decomposition [40], and finally classification based on the extracted features [41]. The square least saliency transformation with a probabilistic fitting model has been employed for the classification of stomach ulcers [42]. A weakly supervised neural model has been utilized for stomach ulcer detection. Features are extracted from the VGG model [43] and transferred as input to the classifiers for gastric ulcer classification [44]. A classical deep model has been utilized to classify 5560 WCE images into ulcers, erosions/normal classes and achieved an accuracy of 90.8% [45]. A CNN model has been employed for the classification of different types of stomach lesions such as ulcers, bleeding, and polyps with 72 and 71 percent specificity and sensitivity, respectively [46]. The GDP network has been utilized for stomach ulcer classification with 88.9% accuracy [47]. An HA network with a residual model has been employed for stomach infection; the model achieved 91% accuracy [48].

In the literature, extensive studies have been performed on the detection of different types of stomach infections; however, there is still a gap in this domain because stomach lesions appear in variable shapes and sizes [14, 48, 58]. The selection of the learning parameters of CNN models, i.e., optimizing function, learning rate, and batch size, is still a challenge that directly affects the classification accuracy. Pre-trained models such as GoogLeNet and AlexNet trained on stomach ulcer datasets with a 0.01 learning rate do not provide satisfactory classification outcomes [35]. MCNet does not provide accurate lesion segmentation due to unclear boundaries between the infected and healthy regions [49].

Therefore, in this research a new framework is trained with optimum hyperparameters for accurate segmentation. Deep features are extracted from the ResNet-50 model and supplied as input to the softmax for classification of stomach infections. Furthermore, uncertainty based on thresholding and a Bayesian neural network [50] is used to analyze the prediction scores.

Proposed methodology

A modified model is presented for the detection of gastrointestinal infections. The technique comprises two major phases, as shown in Fig. 1. In phase I, the infected stomach region is segmented against the ground truth using a modified semantic segmentation model, whereas in phase II, a pre-trained ResNet-50 model is used for the classification of different types of gastrointestinal infections such as bleeding, ulcer, polyps, and normal stomach images.

Fig. 1 Proposed method steps for segmentation and classification

Semantic segmentation of stomach ulcer

In the proposed model, the DeepLab v3+ network [51] is utilized, in which the CNN uses an encoder-decoder structure, skip connections, and dilated convolutions. ResNet-50 is used as the backbone network of DeepLab v3+ for stomach infection segmentation. The semantic segmentation model comprises 206 layers, which include 01 input, 62 convolutional, 65 batch-normalization, 32 ReLU, 02 crop2d, 01 max-pooling, softmax, and pixel classification layers. The layers of the proposed semantic model are depicted in Fig. 2.
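To illustrate what a dilated convolution does, the following minimal Python sketch applies a three-tap kernel to a 1D signal at two dilation rates. It is not the DeepLab v3+ implementation; the function name `dilated_conv1d`, the toy signal, and the kernel are assumptions made purely for illustration. The point is that dilation widens the receptive field without adding weights.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated (atrous) convolution in cross-correlation form."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Number of input samples spanned by the kernel once gaps are inserted.
    span = (len(kernel) - 1) * dilation + 1
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal is shorter than the dilated kernel span")
    out = np.zeros(n_out)
    for k, w in enumerate(kernel):
        # Tap k reads the signal at an offset of k * dilation samples.
        out += w * signal[k * dilation : k * dilation + n_out]
    return out

# Hypothetical toy input: the same 3 weights cover 3 or 5 input samples.
x = np.arange(10.0)
k = np.array([1.0, 1.0, 1.0])
dense = dilated_conv1d(x, k, dilation=1)   # receptive field of 3 samples
atrous = dilated_conv1d(x, k, dilation=2)  # receptive field of 5 samples
```

With `dilation=2` each output sums samples that are two positions apart, so the kernel sees a wider context at the same parameter count.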

Fig. 2 Proposed semantic segmentation model

The hyperparameters of the proposed semantic segmentation model are given in Table 1.

Table 1 Hyperparameters for the proposed semantic segmentation model

Table 1 presents the model-building hyperparameters: 100 epochs, the SGD optimizer, a learning rate of 0.001, and a batch size of 16 are utilized for model training because they give the maximum accuracy. Figure 3 shows the segmented stomach lesions with ground masks.

Fig. 3 Segmentation of stomach infections a input images, b 3D-segmentation, c ground masks, d annotations on input images

Uncertainty estimation based on ResNet-50

In the medical domain, disease grading through computerized systems is very helpful for the gastroenterologist; at the same time, it has become complicated due to the increase in the size of patient data. Currently, convolutional neural networks play a vital role on larger datasets as compared to small-scale datasets. In this work, ResNet-50 [52] is applied for model training; it consists of 177 layers, including 52 convolutional and batch-normalization, 49 ReLU, 02 average pooling, 16 addition, 01 softmax, and 01 classification layers. Transfer learning is implemented for feature mapping on stomach infections, using a model previously trained on the ImageNet database. The feature mapping is driven by the categorical cross-entropy loss function, which is defined as

$$\mathrm{Categorical\ cross\ entropy}=-\sum_{\mathrm{i}=1}^{\mathrm{C}}{\mathrm{t}}_{\mathrm{i}}\log({\mathrm{s}}_{\mathrm{i}})$$
(1)

where \({\mathrm{t}}_{\mathrm{i}}\) and \({\mathrm{s}}_{\mathrm{i}}\) denote the label and the CNN score of class \(\mathrm{i}\), and \(\mathrm{C}\) is the number of classes. A 60:40 ratio is utilized for model training and testing. The description of the model, with the number of layers and selected neurons, is given in Table 2.
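As a worked instance of Eq. (1), the sketch below evaluates the categorical cross-entropy for one sample with four classes. The logits are made-up numbers and the helper names `softmax` and `categorical_cross_entropy` are illustrative assumptions, not part of the reported implementation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(t, s, eps=1e-12):
    """Eq. (1): -sum_i t_i * log(s_i), with t one-hot and s the class scores."""
    s = np.clip(s, eps, 1.0)  # avoid log(0)
    return -np.sum(t * np.log(s), axis=-1)

# Hypothetical network outputs for one image and C = 4 classes.
logits = np.array([2.0, 0.5, 0.1, -1.0])
s = softmax(logits)                 # CNN scores s_i
t = np.array([1.0, 0.0, 0.0, 0.0])  # one-hot label t_i (true class is 0)
loss = categorical_cross_entropy(t, s)
```

Because the label is one-hot, the sum reduces to the negative log of the score assigned to the true class, so the loss falls to zero as that score approaches 1.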

Table 2 Model description

Model training is performed with the selected hyperparameters given in Table 3.

Table 3 Hyperparameters for model building

Performance improvement via uncertainty-aware stomach infections classification

The uncertainty of the classification model is used for estimating the prediction in two ways: (i) probability estimation based on thresholding and (ii) probability estimation based on the Bayesian neural network.

In the thresholding method, the complete data is randomly split into training and testing parts. The threshold value is computed for each class label.

In the BNN, a dataset \(D={\{{\mathrm{x}}_{\mathrm{n}}\in {\mathcal{R}}^{\mathrm{D}},{\mathrm{y}}_{\mathrm{n}}\in {\mathcal{R}}^{\mathrm{C}}\}}_{\mathrm{n}=1}^{\mathrm{N}}\) is given, where \({\mathrm{x}}_{\mathrm{n}}\) represents the input feature vector and \({\mathrm{y}}_{\mathrm{n}}\) denotes the one-hot encoded label vector. The BNN predictive distribution for a new sample \(\{{x}^{*},{y}^{*}\}\) is

$$p\left({y}^{*}\left|{x}^{*},D\right.\right)=\int p\left({y}^{*}\left|{x}^{*},W\right.\right)p\left(W\left|D\right.\right)\mathrm{d}W$$
(2)

where \(W\) represents the weights, \(p\left({y}^{*}\left|{x}^{*},W\right.\right)\) denotes the softmax function applied to the network forward pass \({f}_{W}({x}^{*})\), and \(p\left(W\left|D\right.\right)\) is the posterior over the weights. The predictive distribution is approximated by Monte Carlo sampling as

$$p\left({y}^{*}\left|{x}^{*},D\right.\right)\approx \frac{1}{T}\sum_{t=1}^{T}\mathrm{softmax}\left({f}_{{W}_{t}}\left({x}^{*}\right)\right)$$
(3)

where the predictive distribution is computed through \(T\) forward passes of the model run with dropout enabled, which produces \(T\) predictions; the standard deviation over the \(T\) softmax outputs is then computed. The BNN thus uses dropout to sample from the posterior predictive distribution, which is referred to as Monte Carlo dropout.
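The Monte Carlo dropout estimate of Eq. (3) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: a toy two-layer network with random weights stands in for the trained ResNet-50, and the names `forward_with_dropout` and `mc_dropout_predict`, the dropout rate of 0.5, and T = 50 are all hypothetical choices that are not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward_with_dropout(x, W1, W2, drop_rate, rng):
    """One stochastic forward pass f_{W_t}(x*) with dropout kept active."""
    h = np.maximum(x @ W1, 0.0)               # hidden ReLU layer
    keep = rng.random(h.shape) >= drop_rate   # fresh Bernoulli mask per pass
    h = h * keep / (1.0 - drop_rate)          # inverted-dropout rescaling
    return softmax(h @ W2)

def mc_dropout_predict(x, W1, W2, T=50, drop_rate=0.5, seed=0):
    """Average T stochastic softmax outputs (Eq. 3) and report their spread."""
    rng = np.random.default_rng(seed)
    samples = np.stack(
        [forward_with_dropout(x, W1, W2, drop_rate, rng) for _ in range(T)]
    )
    # Mean approximates p(y*|x*, D); std is the per-class uncertainty.
    return samples.mean(axis=0), samples.std(axis=0)

# Toy stand-in weights and inputs (assumptions, not the trained model).
rng = np.random.default_rng(42)
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))
x_star = rng.normal(size=(3, 8))   # three test feature vectors
mean_probs, std_probs = mc_dropout_predict(x_star, W1, W2)
```

The mean over the T passes is the predicted class distribution, and the standard deviation gives one uncertainty value per class and sample; a larger spread indicates a less reliable prediction.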

Predictions are made for all stomach infection test images and sorted by their associated uncertainty. The predictions are then evaluated at different uncertainty levels, and the prediction accuracy is computed at the specified threshold for each class label.
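One way to read this procedure is sketched below: predictions are sorted by uncertainty, and accuracy is computed over the predictions whose uncertainty does not exceed each threshold. The function name `accuracy_by_uncertainty`, the labels, and the uncertainty values are illustrative assumptions; the paper does not specify how its thresholds are chosen.

```python
import numpy as np

def accuracy_by_uncertainty(y_true, y_pred, uncertainty, thresholds):
    """Return (threshold, n_retained, accuracy) for each uncertainty threshold."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    uncertainty = np.asarray(uncertainty, dtype=float)
    # Sort so that the most certain predictions come first.
    order = np.argsort(uncertainty, kind="stable")
    sorted_unc = uncertainty[order]
    correct = (y_true[order] == y_pred[order]).astype(float)
    # cum_correct[n] = number of correct predictions among the n most certain.
    cum_correct = np.concatenate([[0.0], np.cumsum(correct)])
    rows = []
    for thr in thresholds:
        n_kept = int(np.searchsorted(sorted_unc, thr, side="right"))
        acc = cum_correct[n_kept] / n_kept if n_kept else float("nan")
        rows.append((float(thr), n_kept, float(acc)))
    return rows

# Made-up labels, predictions, and uncertainties for six test images.
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 0]
unc = [0.02, 0.05, 0.40, 0.10, 0.03, 0.35]
rows = accuracy_by_uncertainty(y_true, y_pred, unc, thresholds=[0.05, 0.2, 0.5])
```

In this toy example the two wrong predictions are also the most uncertain ones, so accuracy stays at 1.0 until the threshold is loose enough to admit them.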

Dataset descriptions

The performance of the proposed method is computed on five benchmark datasets, which are described in Table 4. The first is a privately collected imaging dataset with 30 WCE videos, of which 10 are ulcer videos, 10 are bleeding videos, and the remaining 10 are healthy [41]; each video contains 500 frames. The CVC–Clinic DB database contains 612 WCE images with annotated ground truth [53]. The Nerthus dataset contains 21 WCE videos with 5525 frames [54]. The kvasir-segmentation dataset comprises 1000 WCE images with ground masks [55]. The kvasir-classification dataset contains 4000 images of 8 classes, where each class contains 500 images of different types of stomach infections [56].

Table 4 Datasets description

Experimental results and discussion

To evaluate the efficiency of the proposed system, two experiments were carried out. Experiment#1 computes the performance of the proposed segmentation model against ground annotated masks. Experiment#2 analyzes the classification results. All experiments are implemented in MATLAB R2020a on a Core i7 CPU with 32 GB RAM and an 8 GB Nvidia RTX 2070 graphics card.

Experiment#1 evaluation of semantic segmentation

The proposed semantic segmentation method performance is evaluated with ground annotated masks as given in Table 5.

Table 5 Segmentation results on three benchmark datasets

The segmented stomach lesions achieved global accuracy of 0.98, 1.00, 0.98 on CVC–Clinic DB, Kvasir-SEG, and Private collected images respectively. The segmentation results with ground masks on benchmark datasets are shown in Figs. 4, 5, 6.
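For readers unfamiliar with these segmentation measures, the sketch below computes global accuracy, mean accuracy, mean IoU, and weighted IoU from a pixel-level confusion matrix. It assumes the usual confusion-matrix definitions of these measures, which may differ in detail from the toolbox used in the experiments; the BF-score is omitted, and the function name and the tiny 4×4 masks are made up for illustration.

```python
import numpy as np

def segmentation_metrics(gt, pred, num_classes):
    """Confusion-matrix based metrics for integer label masks."""
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    # cm[i, j] counts pixels of true class i that were predicted as class j.
    cm = np.bincount(gt * num_classes + pred, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    gt_count = cm.sum(axis=1).astype(float)    # pixels per true class
    pred_count = cm.sum(axis=0).astype(float)  # pixels per predicted class
    union = gt_count + pred_count - tp
    # Classes with no pixels contribute 0 rather than dividing by zero.
    class_acc = tp / np.maximum(gt_count, 1)
    iou = tp / np.maximum(union, 1)
    return {
        "global_accuracy": tp.sum() / cm.sum(),
        "mean_accuracy": class_acc.mean(),
        "mean_iou": iou.mean(),
        "weighted_iou": (gt_count / gt_count.sum() * iou).sum(),
    }

# Toy masks: 0 = background, 1 = lesion.
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 0, 1],
                 [0, 0, 0, 0],
                 [0, 1, 0, 0]])
m = segmentation_metrics(gt, pred, num_classes=2)
```

Global accuracy is dominated by the large background class, which is why the per-class means and the IoU measures are reported alongside it.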

Fig. 4 Segmented stomach lesions outcomes on Kvasir dataset, a endoscopy frames, b segmented stomach lesions, c binary segmentation, d and e mapping on input images

Fig. 5 Segmentation results on private collected images, a endoscopy frames, b segmented stomach lesions, d binary segmentation, c and e mapping on input images

Fig. 6 Segmented stomach lesions on Kvasir dataset, a input image, b segmentation, d binary segmentation, c and e mapping on input images

The segmented stomach lesions are compared with the ground-truth annotation masks on three benchmark datasets: kvasir, private collected images, and CVC-CLINIC. Figures 4, 5, 6 show that the proposed method segments the stomach infections precisely. The proposed segmentation results are compared with existing methods on the same benchmark datasets, as given in Table 6.

Table 6 Comparison with latest existing methodologies

Table 6 shows the existing methodologies for segmentation of stomach infections, such as [9, 41, 52, 53, 59, 66]. In the comparison analysis, the FCN method has been employed in its FCN-8s, FCN-16s, and FCN-32s variants, of which FCN-32s achieved the highest mean accuracy of 0.83 [57]. Seg-network [58] and the dilation model [59] obtained 0.85 segmentation accuracy, while the U-net model [60] has been employed with different pre-trained networks such as VGG-16, VGG-19, and ResNet-34 [61]. The U-net model alone, without any combination, achieved 0.86 mean segmentation accuracy, which is the maximum compared with the pre-trained combinations. The MCNet [62] model has been employed for lesion segmentation with 0.84 mean accuracy. The comparison shows that the proposed model, in which ResNet-50 is used as the backbone of DeepLabv3, attained 0.98 mean accuracy, which is superior to all recently published work in this domain.

The proposed segmentation results are also compared visually with the U-net [60] model, as shown in Fig. 7.

Fig. 7 Segmented stomach lesions on CVC-Clinic-DB dataset, a endoscopy frames, b Unet segmentation, c segmented into binary regions, d proposed model segmentation, e proposed binary segmentation, f ground annotation

Figure 7 shows that the U-net segmentation model has an increased false positive rate because it segments non-lesion pixels, while the proposed segmentation model (DeepLabv3 with ResNet-50) segments the actual stomach lesions more precisely than the other models.

Experiment#2 classification of different types of stomach infections

The classification results on Nerthus-dataset-frames, private collected images, and the kvasir classification dataset are given in Tables 7, 8, 9, 10, 11, 12, 13.

Table 7 Prediction ACC using uncertainty based on thresholding
Table 8 Prediction ACC using uncertainty based on BNN
Table 9 Prediction ACC using uncertainty based on thresholding
Table 10 Prediction ACC using uncertainty based on BNN from kvasir dataset
Table 11 Prediction ACC using uncertainty based on thresholding from private collected images
Table 12 Prediction ACC using uncertainty based on BNN from private collected images
Table 13 Results comparison

Classification results on Nerthus-dataset-frames

The Nerthus frames are classified into four classes: Grade-1 bowel, Grade-2 bowel, Grade-3 bowel, and Grade-4 bowel. The classification results for the different bowel grades are analysed in terms of two uncertainty measures, thresholding and BNN. The proposed method achieved a prediction rate of 1.00 using uncertainty based on thresholding. The prediction results are shown in Table 7 and Figs. 8, 9.

Fig. 8 Confusion matrix on four bowel grades using uncertainty based on thresholding

Fig. 9 Prediction scores probability on Nerthus-dataset-frames of Grade 0–4 is shown in (a–d)

Table 7 shows the probability of the prediction scores based on thresholding, where the overall achieved accuracy is 1.00. The precision rates of the bowel grades are 1.00, 1.00, 0.99, and 0.99 on Grade 1, Grade 2, Grade 3, and Grade 4, respectively. Similarly, in the same experiment, the prediction rate has been computed using BNN, as presented in Table 8 and Figs. 10, 11.
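The per-class precision rates and the overall accuracy quoted here are the kind of quantities that can be read off a confusion matrix. The sketch below shows the arithmetic on a made-up four-class matrix; the counts are hypothetical and are not the values behind Table 7, and the function name `per_class_precision` is an illustrative assumption.

```python
import numpy as np

def per_class_precision(cm):
    """Precision per class; rows are true classes, columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # total predictions made for each class
    # Classes that were never predicted get precision 0 instead of NaN.
    return np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)

# Hypothetical confusion matrix for four grades (50 test frames per grade).
cm = np.array([[50,  0,  0,  0],
               [ 0, 48,  2,  0],
               [ 0,  1, 49,  0],
               [ 0,  0,  0, 50]])
precision = per_class_precision(cm)
accuracy = np.trace(cm) / cm.sum()
```

Precision for a class divides its correctly predicted frames by all frames assigned to that class, so a single confusion between two neighbouring grades lowers the precision of the grade that received the wrong frame.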

Fig. 10 Confusion matrix on four bowel grades using uncertainty based on BNN

Fig. 11 Prediction scores probability on Nerthus-dataset-frames of Grade Bowel-1 to 4 is shown in a–d

Using BNN, the proposed method achieved a maximum accuracy of 0.96, and the precision rates are 0.96, 0.97, 0.96, and 0.96 on Grade Bowel-0, Grade Bowel-1, Grade Bowel-2, and Grade Bowel-3, respectively.

Classification results on kvasir-classification dataset

The classification results are computed on eight different kinds of stomach infections. The classification results are also computed using uncertainty based on thresholding and BNN, as given in Tables 9, 10 and Figs. 12, 13.

Fig. 12 Confusion matrix on eight different classes of stomach infection using uncertainty based on thresholding

Fig. 13 Prediction scores probability on kvasir dataset grades of 0, 1, 2, 3, 4, 5, 6, 7 is shown in (a–h)

The classification results on the kvasir dataset using BNN are given in Table 10 and Figs. 14, 15.

Fig. 14 Confusion matrix on eight different classes of stomach infection using uncertainty based on BNN

Fig. 15 Probability scores based on BNN uncertainty are presented in terms of grades 0–7 (a–h)

Classification results on private collected images

The classification results on the private collected images are presented in Figs. 16, 17 and Table 11. The proposed method achieved a testing accuracy of 0.982 and a loss of 0.10917.

Fig. 16 Confusion matrix on three different classes of stomach infection using uncertainty based on thresholding

Fig. 17 Probability scores-based thresholding uncertainty on different grades of stomach infections, a healthy, b bleeding, c ulcers

On the private collected images, classification is performed into three classes: bleeding, healthy, and ulcers. The method achieved a cumulative accuracy of 1.00, and the precision rate of each class is 1.00 on healthy, bleeding, and ulcer. Classification results using uncertainty based on BNN are illustrated in Figs. 18, 19 and Table 12.

Fig. 18 Confusion matrix on three different classes of stomach infection using uncertainty based on BNN

Fig. 19 Probability scores-based BNN uncertainty on different grades of stomach infections, a healthy, b bleeding, c ulcers

On the private collected images, BNN achieved a prediction score of 0.98 on the three classes healthy, bleeding, and ulcer. Similarly, the precision rate is 0.98 on the healthy and bleeding classes and 0.99 on the ulcer class. The classification outcomes are compared with recent approaches in Table 13.

The proposed classification results are compared with eight existing methodologies. Feature extraction, selection, and fusion approaches have been used for the classification of normal and bleeding WCE images with 0.94 accuracy. The pre-trained AlexNet model has been utilized for the classification of ulcer and erosion [63]. Classical Gabor features with a pre-trained DenseNet have been employed for cancer/normal image classification [64]. Deep features are extracted from WCE images for normal/ulcer classification [65]. Classical and deep features are fused, and informative features are selected for classification of polyp, ulcer, esophagitis, bleeding, and normal images [22]. The duo-deep model has been employed for the classification of ulcer, polyp, and bleeding [66]. StomachNetwork has been utilized for the classification of polyp, ulcer, and normal images with 0.96 accuracy [67]. The proposed method classified the esophagitis, colon polyp, normal, bleeding, and UC (ulcer) images with the highest prediction scores compared to existing works.

From the comparison analysis, we conclude that no method in the literature performs segmentation and classification of the different types of stomach infections using all publicly available challenging datasets together with private collected images. This research work investigates a new approach for segmentation and classification of the different types of stomach infections, where kvasir-Seg, CVCClinicDB, and private collected images with ground masks are used to compute the segmentation model performance. For classification, four classes (bowel-grade1, bowel-grade2, bowel-grade3, and bowel-grade4) of the Nerthus dataset, eight classes of the kvasir dataset, and three classes (ulcer, bleeding, and healthy) of the private collected images are utilized. The classification outcomes indicate that the proposed methodology is superior to existing works, which authenticates the contribution of the proposed method.

Conclusion

In this research, a new approach is presented for the analysis of stomach infections, which is a difficult job using WCE because lesions have irregular shapes and sizes. Extracting informative features is still a challenge because poor features reduce the classification accuracy. Accurate segmentation is performed using a deep semantic segmentation model, in which ResNet-50 is employed as the backbone of DeepLabv3. The proposed modified model segments the stomach lesions with 0.97 meanIoU, 0.98 global accuracy, 0.96 weighted IoU, 0.98 mean accuracy, and 0.98 mean BF-score on CVC–Clinic DB; 1.00 meanIoU, 1.00 global accuracy, 1.00 weighted IoU, 1.00 mean accuracy, and 1.00 mean BF-score on Kvasir-SEG; and 0.99 meanIoU, 0.98 global accuracy, 0.99 weighted IoU, 0.99 mean accuracy, and 0.99 mean BF-score on the private collected images. The prediction scores of the classification models on each benchmark dataset are computed using uncertainty based on standard thresholding and a Bayesian neural network (BNN). Using uncertainty based on thresholding, the proposed approach attained an accuracy of 1.00 on the private collected images and the Nerthus dataset, while it attained 0.96 using BNN on the Nerthus frames. Similarly, in the same experiment, the kvasir dataset achieved an accuracy of 0.87 using uncertainty based on thresholding and 0.64 using uncertainty based on BNN. In the future, accuracy on the kvasir dataset might be further improved to enhance the prediction rate of stomach infection.