Abstract
The ability to analyse online user-generated content that expresses sentiment (e.g., thoughts and opinions) about products or policies has become a de facto skill set for many companies and organisations. Beyond the challenge of understanding formal textual content, it is also necessary to account for the informal and mixed linguistic nature of social media language, which is often coupled with localised slang as a way of expressing ‘true’ feelings. Because social media data is multilingual, analysis based on a single official language risks failing to capture the overall sentiment of online content. While efforts have been made to understand multilingual sentiment analysis across a range of informal languages, no significant electronic resource has been built for these localised languages. This paper reviews the current approaches and tools used for multilingual sentiment analysis, identifies challenges along this line of research, and provides several recommendations, including a framework that is particularly applicable to scarce-resource languages.
Cite this article
Lo, S.L., Cambria, E., Chiong, R. et al. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artif Intell Rev 48, 499–527 (2017). https://doi.org/10.1007/s10462-016-9508-4