DOI: 10.1145/3606039.3613102
Research Article · Open Access

Discovering Relevant Sub-spaces of BERT, Wav2Vec 2.0, ELECTRA and ViT Embeddings for Humor and Mimicked Emotion Recognition with Integrated Gradients

Published: 29 October 2023

ABSTRACT

Large-scale, pre-trained models have revolutionized the field of sentiment analysis and enabled multimodal systems to be developed quickly. In this paper, we address two challenges posed by the Multimodal Sentiment Analysis (MuSe) 2023 competition: automatically detecting cross-cultural humor and predicting three continuous emotion targets from user-generated videos. Numerous methods in the literature have already demonstrated the importance of embedded features generated by popular pre-trained neural solutions. Based on their success, we can assume that the embedding space consists of several sub-spaces relevant to different tasks. Our aim is to automatically identify the task-specific sub-spaces of various embeddings by interpreting the baseline neural models. Once the relevant dimensions are located, we train a new model using only those features, which leads to similar or slightly better results with a considerably smaller and faster model. Our best humor detection model, using only the relevant sub-space of the audio embeddings, contained approximately 54% fewer parameters than the one processing the whole encoded vector, required 48% less training time, and even outperformed the larger model. Our empirical results validate that only a portion of the embedding space is needed to achieve good performance. Our solution can be considered a novel form of knowledge distillation, enabling new ways of transferring knowledge from one model to another.
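The abstract outlines the core pipeline: attribute a baseline model's predictions back to individual embedding dimensions with Integrated Gradients, keep the highest-scoring dimensions, and retrain a smaller model on that sub-space alone. The sketch below illustrates this idea with the Captum library's IntegratedGradients implementation; the toy classifier, the dummy data, and the sub-space size (top_k = 512) are illustrative assumptions, not the paper's actual models or settings.

    # Minimal sketch (not the paper's code): score embedding dimensions with
    # Integrated Gradients, then keep only the most relevant sub-space.
    import torch
    import torch.nn as nn
    from captum.attr import IntegratedGradients

    class Classifier(nn.Module):
        """Toy stand-in for a baseline model over pooled embeddings."""
        def __init__(self, dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1)
            )

        def forward(self, x):
            return self.net(x)

    dim = 1024                          # e.g., a Wav2Vec 2.0 embedding size
    model = Classifier(dim).eval()
    embeddings = torch.randn(32, dim)   # dummy pooled utterance embeddings

    # Attribute the output logit back to each input dimension; a zero
    # vector serves as the baseline input.
    ig = IntegratedGradients(model)
    attributions = ig.attribute(
        embeddings, baselines=torch.zeros_like(embeddings), target=0
    )

    # Aggregate |attribution| over the samples into one relevance score per
    # dimension, then keep the top-k dimensions as the task-specific sub-space.
    scores = attributions.abs().mean(dim=0)
    top_k = 512                         # illustrative sub-space size
    relevant_dims = scores.topk(top_k).indices

    # A smaller, faster model is then trained on the reduced features only.
    reduced_embeddings = embeddings[:, relevant_dims]
    small_model = Classifier(top_k)     # retrain on reduced_embeddings

In this reading, the attribution scores play the role of a feature-selection signal, which is what allows the reduced model to match or beat the full-width baseline while being substantially smaller.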


Supplemental Material

muse010-video.mp4 (MP4, 16.6 MB)

