MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers | IEEE Conference Publication | IEEE Xplore