Words Are Important: Improving Sentiment Analysis in the Persian Language by Lexicon Refining

Authors:
Mohammad Ehsan Basiri, Shahrekord University, Iran (ORCID: 0000-0003-2893-3892)
Arman Kabiri, Shahrekord University, Iran

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 4, Article No. 26, pp. 1–18
https://doi.org/10.1145/3195633

Published: 29 May 2018

Editorial Notes

A corrigendum was issued for this article on November 29, 2018. You can download the corrigendum from the source materials section of this citation page.

Abstract

Lexicon-based sentiment analysis (SA) extracts people’s opinions from their comments on the Web using a predefined lexicon of opinionated words. In contrast to the machine learning (ML) approach, lexicon-based methods are domain independent and do not need a large annotated training corpus, and hence are faster. This makes the lexicon-based approach prevalent in the SA community. The situation is different for the Persian language, however: unlike in English, lexicon-based SA in Persian is a new area of research. The resources available for SA in Persian are rather limited, so the accuracy of existing lexicon-based methods is lower than in other languages. In the current study, an exhaustive investigation of the lexicon-based method is performed first. Then two new resources are introduced to address the scarcity of resources for SA in Persian: PerLex, a carefully labeled lexicon of sentiment words, and PerView, a new handmade dataset of about 16,000 rated documents. Moreover, a new hybrid method combining ML and the lexicon-based approach is presented, in which PerLex words are used to train the ML algorithm. Experiments are carried out on our new PerView dataset. The results indicate that PerLex yields higher accuracy than the existing CNRC, Adjectives, SentiStrength, PerSent, and LexiPers lexicons. In addition, using PerLex significantly decreases the execution time of the proposed system compared with these lexicons. Finally, the results show that opinionated lexicon terms, followed by bigrams, are the most effective features for the ML method.
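
To make the two ideas in the abstract concrete, the following is a minimal sketch of (a) lexicon-based scoring and (b) restricting ML features to lexicon terms. It is not the authors' implementation: the toy lexicon, its scores, the English stand-in words, and the function names `lexicon_score`, `classify`, and `lexicon_feature_vector` are all hypothetical and chosen only for illustration. PerLex entries and the paper's actual scoring rules are not reproduced here.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> polarity score). These entries are
# illustrative placeholders and are NOT taken from PerLex.
TOY_LEXICON = {"good": 1.0, "excellent": 2.0, "bad": -1.0, "terrible": -2.0}


def lexicon_score(tokens, lexicon):
    """Sum the polarity of every token that appears in the lexicon."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def classify(tokens, lexicon):
    """Map the summed score to a coarse polarity label."""
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def lexicon_feature_vector(tokens, vocabulary):
    """Count only lexicon terms, in a fixed vocabulary order.

    A vector like this could be passed to any ML classifier; which
    classifier the paper uses is not stated in the abstract.
    """
    allowed = set(vocabulary)
    counts = Counter(tok for tok in tokens if tok in allowed)
    return [counts[word] for word in vocabulary]


# Assumed pre-tokenized review text; real Persian text would need a
# language-specific tokenizer and normalizer.
tokens = "the screen is excellent but the battery is bad".split()
vocabulary = sorted(TOY_LEXICON)  # ['bad', 'excellent', 'good', 'terrible']

label = classify(tokens, TOY_LEXICON)
features = lexicon_feature_vector(tokens, vocabulary)
```

Restricting the feature space to lexicon terms keeps the vector short, which is one plausible reason a compact lexicon would also reduce execution time, although the abstract does not give the mechanism.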

Supplemental Material

a26-basiri-corrigendum.pdf (33.9 KB)

Corrigendum to "Words Are Important: Improving Sentiment Analysis in the Persian Language by Lexicon Refining," by Basiri et al., ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) Volume 17, Issue 4, Article No. 26.

Index Terms

Information systems

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 4
December 2018
193 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3229525
Editor: Nianwen Xue, Brandeis University, Waltham, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 May 2018
- Accepted: 1 March 2018
- Revised: 1 January 2018
- Received: 1 October 2017
Author Tags
PerView dataset
Persian language
Sentiment analysis
lexicon-based approach
machine learning
opinion mining
Qualifiers
- research-article
- Research
- Refereed