Abstract
Multimodal sentiment analysis combines information from multiple modalities to make joint task decisions. In our experiments, however, we find that when the modalities of a sample carry conflicting sentiment information, the sample degrades the accuracy of the overall analysis task. We attribute this problem to multimodal information imbalance. To address it, we propose a multimodal interaction model (MIM). In this paper, we use cross-attention to let the information in different modalities interact fully and demonstrate the role of cross-attention in unimodal representation learning. In addition, we use a subspace to learn modality-specific features, with the aim of reducing the redundancy of modal information and improving the effectiveness of the information interaction process. The proposed model is compared with baselines on the MOSI and MOSEI multimodal sentiment analysis datasets. The experimental results show that it achieves superior performance, which demonstrates the effectiveness of our model on multimodal sentiment analysis tasks.
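The cross-modal interaction described above can be illustrated with a minimal scaled dot-product cross-attention sketch. This is not the authors' implementation: the function and variable names (`cross_attention`, `text`, `audio`) and the projection-free formulation are illustrative assumptions; queries come from one modality while keys and values come from another, so each position in the first modality attends over the second modality's sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_mod, kv_mod, d_k):
    """Scaled dot-product cross-attention (sketch, no learned projections).

    query_mod: (Tq, d_k) features of the attending modality
    kv_mod:    (Tkv, d_k) features of the attended modality
    """
    scores = query_mod @ kv_mod.T / np.sqrt(d_k)   # (Tq, Tkv) similarity
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ kv_mod                        # (Tq, d_k) fused output

# Toy example: 3 text steps attending over 4 audio steps, feature dim 8.
rng = np.random.default_rng(0)
text = rng.standard_normal((3, 8))
audio = rng.standard_normal((4, 8))
fused = cross_attention(text, audio, d_k=8)
print(fused.shape)  # (3, 8)
```

In a full model, `query_mod` and `kv_mod` would first pass through learned query/key/value projections, and the attention would typically be applied symmetrically in both directions so that each modality's unimodal representation is refined by the others.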
Availability of data and materials
The datasets analyzed during the current study are available in the MMSA repository (http://immortal.multicomp.cs.cmu.edu/raw_datasets/processed_data/ or https://github.com/thuiar/Self-MM).
Funding
This research was supported by the National Natural Science Foundation of China (No. 61672190).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by X. Li.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Luo, Y., Wu, R., Liu, J. et al. Balanced sentimental information via multimodal interaction model. Multimedia Systems 30, 10 (2024). https://doi.org/10.1007/s00530-023-01208-5