MTGR: Improving Emotion and Sentiment Analysis with Gated Residual Networks

  • Conference paper
Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges (ICPR 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13643)

Abstract

In this paper, we address the problem of emotion recognition and sentiment analysis. End-to-end deep learning models that exploit several data modalities for emotion recognition or sentiment analysis have become an active research area. Numerous studies have shown that multimodal transformers can efficiently combine heterogeneous modalities of data and improve the accuracy of emotion/sentiment prediction. We therefore propose a new multimodal transformer for sentiment analysis and emotion recognition. In contrast to previous work, we integrate a gated residual network (GRN) into the multimodal transformer to better capitalize on the various signal modalities. Our method improves F1 score and accuracy on the CMU-MOSI and IEMOCAP datasets compared to state-of-the-art results.
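The abstract does not give the exact GRN formulation used in MTGR. As a point of reference, the sketch below implements a gated residual network in the style of the temporal fusion transformer (Lim et al.): a two-layer ELU feed-forward path whose output is passed through a gated linear unit (GLU) and added back to the input through a layer-normalized skip connection. All weight shapes and the parameter initialization here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def elu(x):
    """Exponential linear unit."""
    return np.where(x > 0, x, np.exp(np.minimum(x, 0)) - 1.0)

def glu(x):
    """Gated linear unit: split the last axis into a value half and a sigmoid gate half."""
    value, gate = np.split(x, 2, axis=-1)
    return value * (1.0 / (1.0 + np.exp(-gate)))

def layer_norm(x, eps=1e-5):
    """Normalize each feature vector to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gated_residual_network(x, W1, b1, W2, b2, Wg, bg):
    """GRN(x) = LayerNorm(x + GLU(eta @ Wg + bg)),
    with eta = ELU(x @ W1 + b1) @ W2 + b2."""
    eta = elu(x @ W1 + b1) @ W2 + b2           # two-layer feed-forward path
    return layer_norm(x + glu(eta @ Wg + bg))  # gated skip connection

# Demo: gate a (batch, features) block of fused multimodal features.
rng = np.random.default_rng(0)
d, h = 8, 16  # feature and hidden sizes (illustrative)
params = dict(
    W1=rng.normal(size=(d, h)) * 0.1, b1=np.zeros(h),
    W2=rng.normal(size=(h, d)) * 0.1, b2=np.zeros(d),
    Wg=rng.normal(size=(d, 2 * d)) * 0.1, bg=np.zeros(2 * d),
)
x = rng.normal(size=(4, d))
y = gated_residual_network(x, **params)
print(y.shape)  # (4, 8)
```

Because the GLU gate can suppress the non-linear path toward zero, the block can fall back to (a normalized) identity mapping for modalities or time steps where the extra transformation does not help, which is the usual motivation for gated residual connections.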



Author information

Corresponding author

Correspondence to Rihab Hajlaoui.


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Hajlaoui, R., Bilodeau, G.-A., Rockemann, J. (2023). MTGR: Improving Emotion and Sentiment Analysis with Gated Residual Networks. In: Rousseau, J.J., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13643. Springer, Cham. https://doi.org/10.1007/978-3-031-37660-3_11

  • DOI: https://doi.org/10.1007/978-3-031-37660-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37659-7

  • Online ISBN: 978-3-031-37660-3

  • eBook Packages: Computer Science, Computer Science (R0)
