ABSTRACT
Convolutional Neural Networks (CNNs) have been the state-of-the-art technique for numerous image processing tasks in medical imaging. Recently, vision transformers have emerged as a complementary technique, offering on-par performance along with a number of unique characteristics that may be useful for medical image processing. While CNNs have been widely applied to artefact detection and classification in endoscopic images, the Vision Transformer (ViT) has seen little use in this area, and neither architecture has been applied extensively to the classification of colour misalignment artefacts. In this work, we therefore explore the application of ViT to the classification of artefacts in endoscopic images of gastrointestinal tract organs, and compare its performance with that of CNNs on colour misalignment artefacts. Our customised ViT model, based on DeiT (Data-efficient image Transformers), achieves an accuracy of 96.33%, compared with 78.67% for the CNN-based Inception-v3 model and 76.67% for Inception-ResNet-v2. These results demonstrate that, when pretrained on ImageNet, ViT outperforms CNNs in colour misalignment artefact classification, owing to its ability to model the relationships between image patches through self-attention weights. Moreover, the built-in self-attention mechanism offers fresh insight into the model's decision-making process.
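The self-attention weights credited above with modelling patch relationships come from the standard scaled dot-product attention of Vaswani et al., which ViT and DeiT apply over a sequence of image-patch embeddings. The following is a minimal NumPy sketch of that mechanism, not the paper's implementation; the dimensions, random projection matrices, and function name are illustrative assumptions. Each row of the returned weight matrix is a distribution scoring how strongly one patch attends to every other patch.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention (Vaswani et al., 2017).

    x:          (n, d) sequence of n patch embeddings
    wq, wk, wv: (d, d) query/key/value projection matrices
    Returns the attended output and the (n, n) attention weight matrix,
    whose entry (i, j) scores how much patch i attends to patch j.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # softmax over the key axis: each row becomes a distribution over patches
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n, d = 16, 8                          # e.g. 16 image patches, 8-dim embeddings
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
out, attn = self_attention(x, wq, wk, wv)
print(out.shape, attn.shape)          # (16, 8) (16, 16)
```

In a full ViT this operation runs per head inside every transformer block; visualising the rows of `attn` for the classification token is what makes the model's decision process inspectable, as the abstract notes.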