2 May 2024 Vision transformer with pre-positional embedding
Takuro Eguchi, Yoshimitsu Kuroki
Proceedings Volume 13164, International Workshop on Advanced Imaging Technology (IWAIT) 2024; 131640C (2024) https://doi.org/10.1117/12.3018012
Event: International Workshop on Advanced Imaging Technology (IWAIT) 2024, 2024, Langkawi, Malaysia
Abstract
Vision Transformer (ViT) is a neural network architecture that applies the Transformer to image processing, and it has achieved state-of-the-art performance on various computer vision tasks. This study attempts to improve the input layer of ViT by changing the way positional information is embedded. We propose a ViT with pre-positional embedding, which adds constants to each pixel before the input image is divided into patches. The method assumes the following image properties: vertical asymmetry, horizontal symmetry, and a distribution of similar features extending concentrically from the center of the image. Experimental results demonstrate that the proposed method matches the image recognition accuracy of the conventional positional embedding while reducing the number of training parameters.
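The abstract does not specify how the per-pixel constants are parameterized, but the three stated assumptions suggest how parameter sharing could reduce the count relative to conventional per-token positional embeddings. The sketch below is a hypothetical illustration, not the paper's implementation: it builds a per-pixel bias map from one learnable scalar per image row (allowing vertical asymmetry) plus one per concentric ring of distance from the image center (concentric similarity); since neither component depends on left/right position relative to the center column, the map is horizontally symmetric by construction. The image size, patch size, and additive combination are all assumptions for illustration.

```python
import numpy as np

H = W = 32  # assumed CIFAR-like input size
P = 4       # assumed patch size
C = 3       # RGB channels

# Concentric rings: distance of each pixel from the image center,
# quantized to integer radii; one learnable scalar per ring.
ys, xs = np.mgrid[0:H, 0:W]
radius = np.sqrt((ys - (H - 1) / 2) ** 2 + (xs - (W - 1) / 2) ** 2)
rings = np.round(radius).astype(int)
n_rings = rings.max() + 1

rng = np.random.default_rng(0)
ring_params = rng.normal(scale=0.02, size=n_rings)  # learnable in practice
row_params = rng.normal(scale=0.02, size=H)         # learnable in practice

# Per-pixel bias map: vertically asymmetric (row term), concentric (ring
# term), and horizontally symmetric by construction.
bias_map = row_params[:, None] + ring_params[rings]  # shape (H, W)

def embed(image):
    """Add the pre-positional constants, then split into P x P patches."""
    x = image + bias_map[..., None]                       # broadcast over channels
    patches = x.reshape(H // P, P, W // P, P, C).swapaxes(1, 2)
    return patches.reshape(-1, P * P * C)                 # (num_patches, patch_dim)

tokens = embed(rng.normal(size=(H, W, C)))
print(tokens.shape)                    # (64, 48)
print(bias_map.size, H + n_rings)      # full map vs. shared parameters
```

Under these assumptions the map uses H + n_rings learnable scalars instead of one full embedding vector per patch token, which is one plausible route to the parameter reduction the abstract reports.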
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Takuro Eguchi and Yoshimitsu Kuroki "Vision transformer with pre-positional embedding", Proc. SPIE 13164, International Workshop on Advanced Imaging Technology (IWAIT) 2024, 131640C (2 May 2024); https://doi.org/10.1117/12.3018012
KEYWORDS: Transformers, Image classification, Neural networks, Computer vision technology, Image analysis, Image processing, Neurons