
MACSA: A multimodal aspect-category sentiment analysis dataset with multimodal fine-grained aligned annotations

Published in Multimedia Tools and Applications

Abstract

As a fundamental task in fine-grained sentiment analysis, Aspect-Category Sentiment Analysis (ACSA) aims to predict the sentiment polarities of sentences with respect to given aspect categories. Previous work on ACSA has been text-based, but with the growth of multimodal user-generated content (e.g., text and images), multimodal fine-grained sentiment analysis has attracted increasing attention in recent years. However, most existing multimodal fine-grained sentiment analysis work focuses on aspects that appear explicitly in the textual content, and there has been little work on multimodal sentiment analysis of implicit categories, largely due to the lack of a sufficient dataset. In this paper, we introduce a new task, Multimodal Aspect-Category Sentiment Analysis (MACSA), whose goal is to predict the sentiment polarities of image-text pairs with respect to given aspect categories, and we propose a novel Multimodal Graph-based Aligned Network (MGAM) model for this task. Our model constructs heterogeneous graphs from multimodal fine-grained information and uses a graph convolutional network to learn cross-modal fine-grained interactions. To evaluate the model, we provide a new multimodal aspect-category sentiment dataset, Hotel-MACSA, which contains multimodal fine-grained aligned annotations. The experimental results demonstrate the effectiveness of the proposed MGAM model on this new task: it achieves an accuracy of 86.06% on the Hotel-MACSA dataset and 75.25% on the hard version of the test set, outperforming all baseline models.
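The abstract describes the model only at a high level. As a rough illustration of the heterogeneous-graph idea (a minimal sketch under our own assumptions, not the authors' released code; all class names, dimensions, and edges below are illustrative), text tokens and image regions can be treated as nodes of one joint graph, with a graph convolutional layer propagating features along cross-modal alignment edges:

# Minimal sketch (assumption, not the MGAM implementation): a single GCN layer
# over a joint text-image node graph, where cross-modal alignment edges let
# token and region features exchange information.
import torch
import torch.nn as nn


class CrossModalGCNLayer(nn.Module):
    """One graph convolutional layer over a heterogeneous text-image graph."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, num_nodes, dim) -- concatenated token and region features
        # adj:   (batch, num_nodes, num_nodes) -- 1 where two nodes are connected
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)   # node degrees for mean aggregation
        return torch.relu(self.linear((adj @ nodes) / deg))


# Toy usage: 6 text-token nodes and 4 image-region nodes with 128-d features.
batch, n_text, n_img, dim = 2, 6, 4, 128
num_nodes = n_text + n_img
nodes = torch.randn(batch, num_nodes, dim)
adj = torch.eye(num_nodes).expand(batch, -1, -1).clone()  # self-loops for every node
adj[:, 0, n_text] = 1.0                                   # one hypothetical token-region alignment edge
adj[:, n_text, 0] = 1.0                                   # keep the graph undirected
out = CrossModalGCNLayer(dim)(nodes, adj)                 # (2, 10, 128) fused node features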


Data Availability

The datasets generated in our experiments are available from https://www.qunr.com (accessed 12 February 2020). The datasets used or analyzed during the current study are publicly available at https://github.com/yhit98/Hotel-MACSA.



Author information


Contributions

Conceived and designed the experiments: Hao Yang. Performed the experiments: Hao Yang. Analyzed the data: Hao Yang. Wrote and reviewed the paper: Hao Yang. Approved the final version of the paper: Hao Yang, Zhengming Si, Yanyan Zhao, Jianwei Liu, Yang Wu, Bing Qin.

Corresponding author

Correspondence to Hao Yang.

Ethics declarations

Conflicts of interest

We declare that we have no conflict of interest.

Competing interests

We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, H., Si, Z., Zhao, Y. et al. MACSA: A multimodal aspect-category sentiment analysis dataset with multimodal fine-grained aligned annotations. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18796-7
