Abstract
Sentiment analysis aims to extract emotions from a broad set of data. This paper studies the impact of lexical resource enrichment on Arabic sentiment analysis. Because Arabic lexical resources for sentiment analysis are scarce, we first build new resources using several lexicon construction methods. The first method is manual and consists of extracting sentiment-bearing words from a selected dataset. The second is semi-automatic: an English lexicon is translated into Arabic and the result is then checked manually. Both methods generate terms in word form. In addition to these resources, the paper enriches an existing resource, which contains terms related to four specific domains, by creating its lemmatized equivalent. These methods yield lexicons with different morphologies that enrich the existing Arabic resources. The resources are then used to develop a polarity classifier. The paper explains the steps followed to construct the different lexical resources, defines the pre-processing levels, and gives statistics for each lexicon. We then present the classification approaches used to determine the polarity of new data. To analyze the results in depth with respect to the extracted features, we opt for unsupervised and supervised approaches that give a clear view of their internal architecture and processing. The experiments vary the feature set and apply a feature selection approach to keep the most pertinent features and reduce the size of the feature vector. Moreover, we perform an in-depth analysis of the feature vectors and the nature of the corpus, and we explain the main causes of improvement and degradation in the results. The results of the tests show the relevance of each component of the system.
Cite this article
Touahri, I., Mazroui, A. Deep analysis of an Arabic sentiment classification system based on lexical resource expansion and custom approaches building. Int J Speech Technol 24, 109–126 (2021). https://doi.org/10.1007/s10772-020-09758-z
DOI: https://doi.org/10.1007/s10772-020-09758-z