research-article

Words Are Important: Improving Sentiment Analysis in the Persian Language by Lexicon Refining

Published: 29 May 2018

Editorial Notes

A corrigendum was issued for this article on November 29, 2018. You can download the corrigendum from the source materials section of this citation page.


Abstract

Lexicon-based sentiment analysis (SA) extracts people’s opinions from their comments on the Web using a predefined lexicon of opinionated words. In contrast to machine learning (ML) approaches, lexicon-based methods are domain independent and do not need a large annotated training corpus, and hence are faster. This makes the lexicon-based approach prevalent in the SA community. However, the story is different for the Persian language. In contrast to English, lexicon-based SA in Persian is a new discipline. The resources available for SA in Persian are rather limited, which makes existing lexicon-based methods less accurate than in other languages. In the current study, an exhaustive investigation of the lexicon-based method is performed first. Then two new resources are introduced to address the problem of resource scarcity for SA in Persian: a carefully labeled lexicon of sentiment words, PerLex, and a new handmade dataset of about 16,000 rated documents, PerView. Moreover, a new hybrid method using both ML and the lexicon-based approach is presented, in which PerLex words are used to train the ML algorithm. Experiments are carried out on our new PerView dataset. Results indicate that PerLex yields higher accuracy than the existing CNRC, Adjectives, SentiStrength, PerSent, and LexiPers lexicons. In addition, using PerLex significantly decreases the execution time of the proposed system in comparison to these lexicons. Finally, the results demonstrate the effectiveness of using opinionated lexicon terms followed by bigrams as the features employed in the ML method.
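
To illustrate the general idea of lexicon-based scoring described in the abstract, here is a minimal sketch in Python. It is not the authors' method: the toy lexicon, its scores, the whitespace tokenizer, and the function names `lexicon_score` and `classify` are all assumptions made for illustration, and the entries are English placeholders rather than PerLex words.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> polarity score. These entries are
# placeholders for illustration and are NOT taken from PerLex.
TOY_LEXICON: Dict[str, int] = {
    "good": 1,
    "great": 2,
    "bad": -1,
    "terrible": -2,
}


def lexicon_score(tokens: List[str], lexicon: Dict[str, int]) -> int:
    """Sum the polarity scores of all tokens found in the lexicon."""
    return sum(lexicon.get(tok, 0) for tok in tokens)


def classify(text: str, lexicon: Dict[str, int]) -> str:
    """Label a text by the sign of its summed lexicon score."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("great phone but bad battery", TOY_LEXICON))
```

No training corpus is involved, which is why such methods are cheap to run. A real system for Persian would presumably also need language-specific normalization and tokenization, which this sketch leaves out.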




        • Published in

          ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 4
          December 2018
          193 pages
          ISSN: 2375-4699
          EISSN: 2375-4702
          DOI: 10.1145/3229525

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 29 May 2018
          • Accepted: 1 March 2018
          • Revised: 1 January 2018
          • Received: 1 October 2017
          Published in tallip Volume 17, Issue 4


          Qualifiers

          • research-article
          • Research
          • Refereed
