Abstract
Multimodal sentiment analysis has attracted increasing attention and has broad application prospects. Most existing methods focus on a single modality and therefore cannot handle social media data, which typically combines several modalities. Moreover, most multimodal approaches simply combine the two modalities without exploring the complicated correlations between them, which leads to unsatisfactory performance on multimodal sentiment classification. Motivated by these limitations, we propose a Deep Multi-Level Attentive network (DMLANet), which exploits the correlation between the image and text modalities to improve multimodal learning. Specifically, we generate a bi-attentive visual map along the spatial and channel dimensions to strengthen the representational power of the convolutional neural network. We then model the correlation between image regions and word semantics by applying semantic attention to extract the textual features most relevant to the bi-attentive visual features. Finally, self-attention automatically selects the sentiment-rich multimodal features used for classification. Extensive evaluations on four real-world datasets, namely MVSA-Single, MVSA-Multiple, Flickr, and Getty Images, verify our method's superiority.
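The abstract outlines a three-stage attention pipeline: a CBAM-style channel-plus-spatial bi-attention over CNN feature maps, semantic attention that conditions word features on the attended visual summary, and self-attention over the fused multimodal features. The sketch below is a minimal PyTorch rendering of that pipeline under stated assumptions; every module name, dimension, pooling choice, and the use of `nn.MultiheadAttention` for the final stage are illustrative guesses, not the paper's exact architecture.

```python
# Hedged sketch of a DMLANet-style pipeline; shapes and modules are assumptions.
import torch
import torch.nn as nn

class BiAttentiveVisual(nn.Module):
    """Channel + spatial attention over CNN feature maps (CBAM-style)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                                  # x: (B, C, H, W)
        # Channel attention from average- and max-pooled descriptors.
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[:, :, None, None]
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))     # bi-attentive map

class SemanticAttention(nn.Module):
    """Weight word features by their relevance to the visual summary."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, words, visual):          # words: (B, T, D), visual: (B, D)
        q = visual[:, None, :].expand_as(words)
        alpha = torch.softmax(self.score(torch.cat([words, q], -1)), dim=1)
        return (alpha * words).sum(dim=1)       # visually grounded text feature

class DMLANetSketch(nn.Module):
    """End-to-end sketch: bi-attention -> semantic attention -> self-attention."""
    def __init__(self, channels=2048, dim=256, num_classes=3):
        super().__init__()
        self.bi_att = BiAttentiveVisual(channels)
        self.vis_proj = nn.Linear(channels, dim)
        self.sem_att = SemanticAttention(dim)
        self.self_att = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, feat_map, words):        # feat_map: (B, C, H, W)
        v = self.vis_proj(self.bi_att(feat_map).mean(dim=(2, 3)))
        t = self.sem_att(words, v)
        m = torch.stack([v, t], dim=1)          # (B, 2, D) modality tokens
        fused, _ = self.self_att(m, m, m)       # self-attention across modalities
        return self.cls(fused.mean(dim=1))      # sentiment logits

if __name__ == "__main__":
    model = DMLANetSketch()
    logits = model(torch.randn(2, 2048, 7, 7), torch.randn(2, 12, 256))
    print(logits.shape)  # torch.Size([2, 3])
```

In this reading, the bi-attentive map refines *where* and *which* visual features matter before the text ever sees them, semantic attention then filters the words against that refined summary, and the final self-attention lets the two modality tokens re-weight each other before classification.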