Abstract
In this paper, we address the problem of emotion recognition and sentiment analysis. End-to-end deep learning models that exploit several modalities of data for emotion recognition or sentiment analysis have become an active research area. Numerous studies have shown that multimodal transformers can efficiently integrate heterogeneous modalities and improve the accuracy of emotion/sentiment prediction. We therefore propose a new multimodal transformer for sentiment analysis and emotion recognition. Compared to previous work, we integrate a gated residual network (GRN) into the multimodal transformer to better capitalize on the various signal modalities. Our method improves F1 score and accuracy over state-of-the-art results on the CMU-MOSI and IEMOCAP datasets.
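The gated residual network mentioned in the abstract is not specified here; a common formulation is the one from Lim et al.'s temporal fusion transformer, where a nonlinear transformation is gated by a GLU and added back to the input through a residual connection followed by layer normalization. The NumPy sketch below illustrates that formulation only; all parameter names and shapes are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gated_residual_network(x, p):
    # Two-layer transformation: eta2 = ELU(x W2 + b2), eta1 = eta2 W1 + b1
    eta2 = elu(x @ p["W2"] + p["b2"])
    eta1 = eta2 @ p["W1"] + p["b1"]
    # GLU gate: a sigmoid gate controls how much transformed signal passes
    gate = sigmoid(eta1 @ p["W4"] + p["b4"])
    value = eta1 @ p["W5"] + p["b5"]
    # Residual connection + layer normalization
    return layer_norm(x + gate * value)

# Toy forward pass with illustrative dimensions
d = 8
rng = np.random.default_rng(0)
p = {k: rng.normal(scale=0.1, size=(d, d)) for k in ("W1", "W2", "W4", "W5")}
p.update({k: np.zeros(d) for k in ("b1", "b2", "b4", "b5")})
x = rng.normal(size=(4, d))       # batch of 4 feature vectors
out = gated_residual_network(x, p)
```

Because the gate can close (sigmoid near zero), the block can fall back to an almost-identity mapping of its input, which is the property that motivates using GRNs to weigh the contribution of each signal modality.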
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Hajlaoui, R., Bilodeau, GA., Rockemann, J. (2023). MTGR: Improving Emotion and Sentiment Analysis with Gated Residual Networks. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13643. Springer, Cham. https://doi.org/10.1007/978-3-031-37660-3_11
DOI: https://doi.org/10.1007/978-3-031-37660-3_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37659-7
Online ISBN: 978-3-031-37660-3