Multilingual sentiment analysis: from formal to informal and scarce resource languages

Artificial Intelligence Review

Abstract

The ability to analyse online user-generated content that expresses sentiments (e.g., thoughts and opinions) about products or policies has become a de facto skillset for many companies and organisations. Beyond the challenge of understanding formal textual content, it is also necessary to account for the informal and mixed linguistic nature of online social media languages, which are often coupled with localised slang as a way to express ‘true’ feelings. Because social media data is multilingual, analysis based on a single official language risks missing the overall sentiment of online content. While efforts have been made to understand multilingual sentiment analysis based on a range of informal languages, no significant electronic resource has been built for these localised languages. This paper reviews the current approaches and tools used for multilingual sentiment analysis, identifies challenges along this line of research, and provides several recommendations, including a framework that is particularly applicable to scarce resource languages.

Notes

  1. http://www.internetworldstats.com/stats7.htm.

  2. http://www.internetworldstats.com/stats.htm.

  3. http://alias-i.com/lingpipe/index.html.

  4. http://www.promt.com/.

  5. http://www.babelfish.com/.

  6. http://en.wikipedia.org/wiki/List_of_dialects_of_the_English_language.

Author information

Correspondence to Erik Cambria.

Cite this article

Lo, S.L., Cambria, E., Chiong, R. et al. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artif Intell Rev 48, 499–527 (2017). https://doi.org/10.1007/s10462-016-9508-4

  • DOI: https://doi.org/10.1007/s10462-016-9508-4
