Multimodal Emotion Recognition Using Attention-Based Model with Language, Audio, and Video Modalities

  • Conference paper
  • First Online:
Data Science and Emerging Technologies (DaSET 2023)

Abstract

Multimodal emotion recognition is becoming increasingly important in human–computer interaction because human communication conveys emotional information through several channels at once. By considering multiple modalities simultaneously, multimodal systems improve both the accuracy and the robustness of emotion recognition, and as such research becomes more central to human–computer interaction, automatic emotion detection systems grow increasingly necessary. However, the scarcity of labelled data remains a challenge for multimodal emotion recognition. To address this issue, we propose transfer learning: pretrained models such as RoBERTa are combined with attention-based mechanisms, using self-attention to extract relevant features from each modality and multi-head attention to fuse information across modalities. The aim of this paper is to provide a strategy for reliably predicting emotions from audio, visual, and textual input by merging and complementing features traditionally engineered by humans with those learned by deep models. Three popular multimodal emotion recognition datasets, IEMOCAP, CMU-MOSI, and CMU-MOSEI, are analysed and ranked by quality. The proposed architecture efficiently combines textual features extracted by RoBERTa with features from the other modalities, placing an appropriate amount of attention on each feature modality. As part of this work, a model that outperforms BERT is introduced.
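The fusion strategy the abstract describes, text-side features attending over audio and video features via multi-head attention, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, number of heads, six-class output, and the choice of text features as attention queries are all assumptions for the example, and random tensors stand in for the RoBERTa, audio, and video encoder outputs.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical sketch of multi-head attention fusion across modalities."""
    def __init__(self, dim=256, heads=4, num_classes=6):
        super().__init__()
        # Text queries attend over the concatenated audio+video sequence.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text, audio, video):
        context = torch.cat([audio, video], dim=1)    # (B, Ta+Tv, D)
        fused, _ = self.attn(text, context, context)  # (B, Tt, D)
        return self.classifier(fused.mean(dim=1))     # (B, num_classes)

# Dummy features standing in for RoBERTa / audio / video encoder outputs.
B, D = 2, 256
logits = CrossModalFusion()(torch.randn(B, 8, D),
                            torch.randn(B, 12, D),
                            torch.randn(B, 10, D))
print(logits.shape)  # torch.Size([2, 6])
```

Using the text features as queries reflects the paper's emphasis on RoBERTa-derived textual characteristics; swapping which modality drives the queries is an equally plausible design choice.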



Author information

Correspondence to Disha Sharma.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sharma, D., Jayabalan, M., Sultanova, N., Mustafina, J., Yao, D.N.L. (2024). Multimodal Emotion Recognition Using Attention-Based Model with Language, Audio, and Video Modalities. In: Bee Wah, Y., Al-Jumeily OBE, D., Berry, M.W. (eds) Data Science and Emerging Technologies. DaSET 2023. Lecture Notes on Data Engineering and Communications Technologies, vol 191. Springer, Singapore. https://doi.org/10.1007/978-981-97-0293-0_15
