Editorial Notes
A corrigendum was issued for this article on November 29, 2018. You can download the corrigendum from the source materials section of this citation page.
Abstract
Lexicon-based sentiment analysis (SA) aims to extract people’s opinions from their comments on the Web using a predefined lexicon of opinionated words. In contrast to the machine learning (ML) approach, lexicon-based methods are domain independent, do not need a large annotated training corpus, and hence are faster. This makes the lexicon-based approach prevalent in the SA community. However, the story is different for the Persian language. Unlike in English, lexicon-based SA in Persian is a new area of study. The resources available for SA in Persian are rather limited, which makes the accuracy of the existing lexicon-based methods lower than in other languages. In the current study, an exhaustive investigation of the lexicon-based method is performed first. Then two new resources are introduced to address the problem of resource scarcity for SA in Persian: a carefully labeled lexicon of sentiment words, PerLex, and a new handmade dataset of about 16,000 rated documents, PerView. Moreover, a new hybrid method using both ML and the lexicon-based approach is presented, in which PerLex words are used to train the ML algorithm. Experiments are carried out on the new PerView dataset. Results indicate that the accuracy of PerLex is higher than that of the existing CNRC, Adjectives, SentiStrength, PerSent, and LexiPers lexicons. In addition, the results show that using PerLex significantly decreases the execution time of the proposed system in comparison to the above-mentioned lexicons. Finally, the results demonstrate the effectiveness of using opinionated lexicon terms, followed by bigrams, as the features employed in the ML method.
Supplemental Material
Available for Download
Corrigendum to "Words Are Important: Improving Sentiment Analysis in the Persian Language by Lexicon Refining," by Basiri et al., ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) Volume 17, Issue 4, Article No. 26.