Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | IEEE Conference Publication | IEEE Xplore