2 May 2024 Vision transformer with pre-positional embedding
Takuro Eguchi, Yoshimitsu Kuroki
Proceedings Volume 13164, International Workshop on Advanced Imaging Technology (IWAIT) 2024; 131640C (2024) https://doi.org/10.1117/12.3018012
Event: International Workshop on Advanced Imaging Technology (IWAIT) 2024, 2024, Langkawi, Malaysia
Abstract
Vision Transformer (ViT) is a neural network architecture that applies the Transformer to image processing, and it has achieved state-of-the-art performance on various computer vision tasks. This study attempts to improve the input layer of ViT by changing the way positional information is embedded. We propose a ViT with pre-positional embedding, which adds constants to each pixel before the input image is divided into patches. The method assumes the following image properties: vertical asymmetry, horizontal symmetry, and a distribution of similar features extending concentrically from the center of the image. Experimental results demonstrate that the proposed method matches the image recognition accuracy of the conventional positional embedding while reducing the number of training parameters.
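The abstract does not specify how the per-pixel constants are parameterized, but the three stated assumptions suggest how parameter sharing could reduce the count relative to conventional per-token positional embeddings. The sketch below is a hypothetical illustration, not the paper's implementation: it builds a per-pixel bias map from one learnable scalar per image row (allowing vertical asymmetry) plus one per concentric ring of distance from the image center (concentric similarity); since neither component depends on left/right position relative to the center column, the map is horizontally symmetric by construction. The image size, patch size, and additive combination are all assumptions for illustration.

```python
import numpy as np

H = W = 32  # assumed CIFAR-like input size
P = 4       # assumed patch size
C = 3       # RGB channels

# Concentric rings: distance of each pixel from the image center,
# quantized to integer radii; one learnable scalar per ring.
ys, xs = np.mgrid[0:H, 0:W]
radius = np.sqrt((ys - (H - 1) / 2) ** 2 + (xs - (W - 1) / 2) ** 2)
rings = np.round(radius).astype(int)
n_rings = rings.max() + 1

rng = np.random.default_rng(0)
ring_params = rng.normal(scale=0.02, size=n_rings)  # learnable in practice
row_params = rng.normal(scale=0.02, size=H)         # learnable in practice

# Per-pixel bias map: vertically asymmetric (row term), concentric (ring
# term), and horizontally symmetric by construction.
bias_map = row_params[:, None] + ring_params[rings]  # shape (H, W)

def embed(image):
    """Add the pre-positional constants, then split into P x P patches."""
    x = image + bias_map[..., None]                       # broadcast over channels
    patches = x.reshape(H // P, P, W // P, P, C).swapaxes(1, 2)
    return patches.reshape(-1, P * P * C)                 # (num_patches, patch_dim)

tokens = embed(rng.normal(size=(H, W, C)))
print(tokens.shape)                    # (64, 48)
print(bias_map.size, H + n_rings)      # full map vs. shared parameters
```

Under these assumptions the map uses H + n_rings learnable scalars instead of one full embedding vector per patch token, which is one plausible route to the parameter reduction the abstract reports.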
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Takuro Eguchi and Yoshimitsu Kuroki "Vision transformer with pre-positional embedding", Proc. SPIE 13164, International Workshop on Advanced Imaging Technology (IWAIT) 2024, 131640C (2 May 2024); https://doi.org/10.1117/12.3018012
KEYWORDS: Transformers, Image classification, Neural networks, Computer vision technology, Image analysis, Image processing, Neurons