Negation handling for Amharic sentiment classification

User-generated content is changing how data on the web is processed. With the advancement of World Wide Web technology, users are no longer only consumers of web content but also producers of content in the form of text, audio, video and pictures. This study focuses on the analysis of textual content that carries subjective information, that is, sentiment analysis. Most conventional approaches to sentiment analysis do not capture negation effectively in languages with limited computational linguistic resources (e.g. Amharic). In this research, we propose a negation handling framework for Amharic sentiment classification. The proposed framework combines a lexicon-based sentiment classification approach with character n-gram based machine learning algorithms, namely logistic regression (LR) and Naïve Bayes (NB). The performance of the framework is evaluated on annotated Amharic news comments and compared with two baselines: a model without negation handling and a word-level n-gram model. The proposed system performs best among all models and baselines, with an accuracy of 98.0.


Introduction
Users usually express their feelings, emotions and opinions as comments in response to posted news, photos, audio and video. Opinionated sources are currently increasing in languages other than English. However, research on Amharic sentiment analysis is scarce because the language lacks sufficient resources for linguistic preprocessing and sentiment analysis. Lexicon-based sentiment analysis faces several challenges, one of which is handling negation in the text. The most common approach to negation handling relies on negation keywords. However, it is difficult to identify the scope of negation, that is, to correctly identify the part of the text affected by the presence of a negation word. Negation handling in Amharic has not been studied previously. Thus, this research develops an automatic method for handling negation and combines it with character n-gram features for Amharic sentiment classification. The research questions addressed in this work are as follows: (a) how can we automatically detect negation words in Amharic texts? (b) how can we design a framework for handling negation in Amharic sentiment analysis? (c) how can we capture character-level n-gram features to improve Amharic sentiment analysis in social media (e.g. Facebook)? and (d) how can we measure the performance of the proposed framework?

Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Related works
In this section, we briefly present the key related works. Asmi & Ishaya (2012) proposed an approach that detects negation and considers its scope, relying on syntactic dependencies, for sentiment analysis. Amalia et al. (2018) proposed rule-based handling of negation and its scope, based on syntactic parsing of Indonesian, for machine learning based sentiment analysis on Twitter. In their experiments the support vector machine performed well compared with the other settings, and the F-score on two Twitter data sets improved by 1.79% and 2.69% over the existing baseline without negation handling. Farooq et al. (2017) developed negation scope handling strategies that rely on the linguistic features of the language; the accuracy of negation scope identification is 83.3%, well above the baseline. Diamantini et al. (2016) developed negation scope detection based on dependency parse trees and a semantic disambiguation technique, evaluated on integrated social networks in real time; with negation handling, accuracy is 6% higher than the baseline. Heerschop et al. (2011) developed negation handling that relies on wordbank creation and document sentiment scoring for sentiment analysis, which outperforms the human rating with a 1.17% increase in precision. Enger et al. (2017) developed an open-source tool for negation cue and scope handling that uses a dependency parser and negation cues (negation lists, prefixes, suffixes) as input to machine learning (e.g. SVM); its performance is nearly unchanged from the baseline.

Proposed approach
The proposed framework includes preprocessing, sentiment score calculation using negation detection, and machine learning using character-level n-gram features. The framework is shown in Fig. 1.

Fig. 1 Proposed Amharic negation handling framework
To compute the sentiment score using negation detection, each stemmed word wij of an Amharic news comment Ci is looked up in the Amharic sentiment lexicons (Manual, SOCAL, SWN) [Neshir et al., 2019]. If the word is found, its sentiment score sij is retrieved and stored together with its position index in the comment. To compute the sentiment of the comment, we apply positional weighting inversion if the comment contains any negation clue. If no negation clue is found, the word scores are simply added. The negation handling algorithm is presented in Listing 1.
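The scoring steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the text does not define "positional weighting inversion", so the sketch assumes it means flipping the sign of any lexicon score whose position lies within a fixed window of a negation clue. The lexicon entries, the negation clues, the `window` parameter and the function name `score_comment` are hypothetical placeholders, and English stand-ins are used in place of stemmed Amharic words.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon (stemmed word -> polarity score); placeholder values only.
LEXICON: Dict[str, float] = {"good": 1.0, "great": 2.0, "bad": -1.0}
# Hypothetical negation clues; real Amharic clues would replace these.
NEGATION_CLUES: Set[str] = {"not", "never"}


def score_comment(
    stemmed_words: List[str],
    lexicon: Dict[str, float],
    negation_clues: Set[str],
    window: int = 3,  # assumed scope size, not taken from the paper
) -> float:
    """Sum lexicon scores, inverting those that fall near a negation clue."""
    # Store each retrieved score together with its position index in the comment.
    hits = [(j, lexicon[w]) for j, w in enumerate(stemmed_words) if w in lexicon]
    clue_positions = [j for j, w in enumerate(stemmed_words) if w in negation_clues]

    # No negation clue: the word scores are simply added.
    if not clue_positions:
        return float(sum(score for _, score in hits))

    total = 0.0
    for j, score in hits:
        # Assumed inversion rule: flip the sign if any clue is within `window` positions.
        negated = any(abs(j - c) <= window for c in clue_positions)
        total += -score if negated else score
    return total
```

With the toy data, `score_comment(["good"], LEXICON, NEGATION_CLUES)` gives 1.0 and `score_comment(["not", "good"], LEXICON, NEGATION_CLUES)` gives -1.0. The symmetric window is a design choice of this sketch so that a clue can affect a sentiment word on either side of it.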

Results and discussions
To evaluate the proposed framework, we used a dataset of 2705 Facebook news comments annotated for sentiment at the sentence/phrase level, collected from the Government Office Affairs Communication (GOAC) between 2008 and 2010. We also used the Amharic sentiment lexicons, namely Manual (1000), SWN (13679) and SOCAL (5683) [Neshir et al., 2019]. The proposed negation handling approach for Amharic sentiment classification clearly outperforms the character n-gram based machine learning classifiers. Character n-gram based machine learning is promising because it reduces the demand for linguistic resources in less dominant languages. The results are presented in Table 1. The hybrid approach (NH + LR + NB) performs best compared with the individual approaches for sentiment classification of Amharic Facebook news comments, with an accuracy of 98.0.
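To illustrate why character n-gram features need no language-specific resources, the following Python sketch extracts character n-grams from raw text and weights them with TF-IDF. It is an illustration only: the n-gram range (`n_min`, `n_max`), the smoothed IDF formula and the function names are assumptions of this sketch, since the text does not state the exact settings used in the experiments.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> List[str]:
    """Return all character n-grams of length n_min..n_max, shortest first."""
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def tfidf(docs: List[str], n_min: int = 1, n_max: int = 3) -> List[Dict[str, float]]:
    """Map each document to a sparse {n-gram: TF-IDF weight} dictionary."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: number of documents that contain each n-gram.
    df = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)

    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            # Relative term frequency times a smoothed IDF (one common variant, assumed here).
            gram: (tf / total) * (math.log((1 + n_docs) / (1 + df[gram])) + 1.0)
            for gram, tf in c.items()
        })
    return vectors
```

Because the features are built directly from characters, the same code applies unchanged to Amharic script; no stemmer or tokenizer is involved. The resulting sparse vectors could then be fed to classifiers such as logistic regression or Naïve Bayes.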

Conclusions and recommendations
In general, the extensive linguistic resources needed to build sentiment classification for less dominant languages (e.g. Amharic) are expensive. To reduce this problem, we proposed a negation handling approach and a character n-gram approach for sentiment analysis of Amharic Facebook news comments. We evaluated the usefulness of combining negation handling (NH) with character-level n-gram based machine learning models for sentiment classification of these comments. We call the combination of rule-based NH and machine learning algorithms (logistic regression and Naïve Bayes) using character n-gram based TF-IDF features the hybrid approach. The proposed approaches are evaluated by measuring the accuracy of the individual approaches and of their combinations for Amharic text sentiment classification. Amharic negation scope identification and handling is recommended for further research. We also suggest considering character n-gram embedding features learned from a corpus of the same domain (e.g. Facebook news comments).