A Study on Joint Modeling and Data Augmentation of Multi-Modalities for Audio-Visual Scene Classification | IEEE Conference Publication | IEEE Xplore