ViTCA-Net: a framework for disease detection in video capsule endoscopy images using a vision transformer and convolutional neural network with a specific attention mechanism

Oukdach, Yassine; Kerkaou, Zakaria; El Ansari, Mohamed; Koutti, Lahcen; Fouad El Ouafdi, Ahmed; De Lange, Thomas

doi:10.1007/s11042-023-18039-1

Published: 11 January 2024

Multimedia Tools and Applications

Abstract

Video capsule endoscopy (VCE) is a non-invasive procedure for examining the human bowel. VCE generates thousands of images from different parts of the gastrointestinal tract. Because reviewing these images is tedious and time-consuming for doctors, automated diagnosis of digestive diseases from VCE images is highly desirable. Most existing studies rely on convolutional neural networks (CNNs), which are not efficient at learning invariant global features in VCE images. This paper therefore presents a framework that learns both global and local features from VCE images. The proposed method uses a CNN with a specific attention mechanism to extract local features, while a vision transformer captures global features. The local and global features are then fused for the final classification. Extensive experiments on the public Kvasir Capsule Endoscopy dataset yielded an accuracy of 97%. These results highlight the model's capabilities and show that it compares favorably with state-of-the-art methods. The proposed system also generalized well to an unseen dataset, achieving a recall of 85%.
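To make the two-branch idea concrete, here is a minimal pure-Python sketch under stated assumptions; it is not the authors' ViTCA-Net implementation. The global branch applies single-head scaled dot-product self-attention to patch embeddings and mean-pools the result, standing in for the vision transformer. The local branch starts from a CNN feature map that is assumed to be already computed (the convolutional backbone is omitted) and re-weights its channels with squeeze-and-excitation style gating. This preview does not describe the paper's "specific attention mechanism", so SE gating is a hypothetical stand-in. Fusion is plain concatenation followed by a linear softmax classifier. All sizes, weights, and names (`global_branch`, `local_branch`, `classify`, the `params` keys) are hypothetical, and the weights are random.

```python
import math
import random

# Toy sizes chosen for illustration only; they are not taken from the paper.
D = 8          # embedding size of each patch token and channel count of the CNN map
N_PATCHES = 4  # number of patch tokens seen by the transformer-style branch
N_CLASSES = 3  # hypothetical number of VCE finding classes


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(v):
    """Numerically stable softmax."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a list of tokens."""
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    d = len(q[0])
    out = []
    for qi in q:
        # Each token attends to every token (itself included): this is the global context.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(d)])
    return out


def global_branch(patches, p):
    """Attend over patch tokens, then mean-pool them into one global descriptor."""
    attended = self_attention(patches, p["Wq"], p["Wk"], p["Wv"])
    return [sum(t[c] for t in attended) / len(attended) for c in range(len(attended[0]))]


def channel_attention(fmap, W1, W2):
    """Squeeze-and-excitation style gating (assumed stand-in for the paper's attention)."""
    squeezed = [sum(ch) / len(ch) for ch in fmap]                      # average-pool each channel
    hidden = [max(0.0, h) for h in matvec(W1, squeezed)]               # bottleneck + ReLU
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(W2, hidden)]   # sigmoid gate per channel
    return [[g * a for a in ch] for g, ch in zip(gates, fmap)]


def local_branch(fmap, p):
    """Re-weight the (already computed) CNN channels, then pool them into a local descriptor."""
    reweighted = channel_attention(fmap, p["W1"], p["W2"])
    return [sum(ch) / len(ch) for ch in reweighted]


def classify(patches, fmap, p):
    """Fuse global and local descriptors by concatenation and return class probabilities."""
    fused = global_branch(patches, p) + local_branch(fmap, p)
    return softmax(matvec(p["Wc"], fused))


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
params = {
    "Wq": rand_matrix(D, D, rng),
    "Wk": rand_matrix(D, D, rng),
    "Wv": rand_matrix(D, D, rng),
    "W1": rand_matrix(D // 2, D, rng),         # SE bottleneck: D -> D/2
    "W2": rand_matrix(D, D // 2, rng),         # SE expansion: D/2 -> D
    "Wc": rand_matrix(N_CLASSES, 2 * D, rng),  # classifier over the fused 2*D vector
}
patches = [[rng.random() for _ in range(D)] for _ in range(N_PATCHES)]
fmap = [[rng.random() for _ in range(16)] for _ in range(D)]  # D channels, 4x4 map flattened
probs = classify(patches, fmap, params)
print([round(x, 3) for x in probs])
```

Concatenation keeps the two descriptors side by side and lets the classifier learn how much to rely on each. Weighted sums and cross-attention are common alternatives, and this preview does not state which fusion ViTCA-Net actually uses.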

Data Availability

The data used in this study are included in the paper and are openly available at https://osf.io/dv2ag/.

Funding

This work was supported by the Ministry of National Education, Vocational Training, Higher Education and Scientific Research; in part by the Ministry of Industry, Trade, and Green and Digital Economy; in part by the Digital Development Agency (ADD); and in part by the National Center for Scientific and Technical Research (CNRST) under Project ALKHAWARIZMI/2020/20.

Author information

Authors and Affiliations

LabSIV, Department of Computer Science, Faculty of Sciences, Ibnou Zohr University, B.P 8106, Agadir, 80000, Morocco
Yassine Oukdach, Zakaria Kerkaou, Lahcen Koutti & Ahmed Fouad El Ouafdi
Informatics and Applications Laboratory, Department of Computer Sciences, Faculty of Sciences, Moulay Ismail University, B.P 11201, Meknès, 52000, Morocco
Mohamed El Ansari
Department of Molecular and Clinical Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
Thomas De Lange
Medical Department - Mölndal, Sahlgrenska University Hospital, Region Västra Götaland, Sweden
Thomas De Lange

Contributions

Y.O., Z.K., M.E., L.K., and A.F.E. wrote the main manuscript text. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yassine Oukdach.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Oukdach, Y., Kerkaou, Z., El Ansari, M. et al. ViTCA-Net: a framework for disease detection in video capsule endoscopy images using a vision transformer and convolutional neural network with a specific attention mechanism. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-023-18039-1

Received: 15 July 2023
Revised: 27 September 2023
Accepted: 26 December 2023
Published: 11 January 2024
DOI: https://doi.org/10.1007/s11042-023-18039-1
