Sentiment Analysis for Arabic Social Media News Polarity

In recent years, the use of social media has increased rapidly and developed significant influence on its users. In studying the behavior, reactions, approval, and interactions of social media users, detecting the polarity (positive, negative, or neutral) of news posts is of considerable importance. This research aims to collect data from Arabic social media pages, with posts as the main unit in the dataset, and to build a corpus of manually annotated data for training and testing. Applying Natural Language Processing to the data is crucial for the computer to understand and manipulate it easily; therefore, Stop-Word Removal, Stemming, and Normalization are applied. Several classifiers, namely Support Vector Machine, Naïve Bayes, K-Nearest Neighbor, Random Forest, and Decision Tree, are trained on the dataset, and their accuracy is evaluated on the test data. These two steps are carried out using the open-source WEKA tool. As a result, each post is categorized into one of three classes: positive, negative, or neutral. This research concludes that, among the classifiers, SVM reaches the highest accuracy with an F1-measure of 83%.


Introduction
This section is categorized into five subsections: Arabic language, Arabic Sentiment Analysis, Classification Techniques, Arabic Natural Language Processing (NLP), and Evaluation.

Arabic Language
The most common tool that allows people to communicate, talk, discuss, write, and express feelings and ideas is language. The world has thousands of languages, and undoubtedly, each country has its own official language. One of these is Arabic, a Semitic language, in the same family as Hebrew and Aramaic. Approximately 260 million people use Arabic as their first language, and more understand it as a second language. Arabic has its own alphabet, which is written from right to left, similar to Hebrew. Given its wide usage throughout the world, Arabic is one of the six official languages of the United Nations, along with English, Spanish, French, Russian, and Chinese. Many countries have Arabic as an official language, although it is not spoken the same way. This language has many dialects, or varieties, such as Modern Standard, Egyptian, Gulf, Moroccan, and Levantine.
Several of these dialects differ widely, which makes it difficult for their speakers to understand each other. The Arabic alphabet contains 28 letters, each with a form and spelling that vary depending on its position in the word. For instance, the letter (ض), pronounced (Dhad), is written as (ضـ) at the beginning of a word, as (ـضـ) between two letters, and as (ـض) at the end of a word [1].
The Arabic language has a classical form, that is, Modern Standard Arabic (MSA). In the Arab world, MSA is the language of the Holy Quran, books, news, official publications, and journals. However, in daily life, a slang form of the language, known as "street language", is used when communicating with each other. For example, Jordanian people have their dialect and specific verbal communication. In the Arabian Gulf, citizens use the Khaliji dialect. People in Lebanon and Syria speak the Levantine dialect, also called (Shamii), whereas those in Libya, Tunisia, Algeria, and Morocco speak the Maghrebi dialect. Similarly, Sudanese people have a unique dialect [1].

Arabic Sentiment Analysis
The goal of sentiment analysis is to determine the attitudes of a group of people using one or more platforms. In recent years, social media posts have increased rapidly, and Arabic social media news posts in particular have developed considerable influence on social media users. On this basis, the behavior, reactions, acceptance, and interactions of social media users can be observed and analyzed. The proposed system uses this analysis to give users and relevant organizations a clearer picture of public opinion. In the remainder of this research, these behaviors, reactions, approvals, and interactions are referred to collectively as user opinion [1].
Sentiment analysis is one of many NLP tasks; it studies human language through computational and statistical methods. Its main goal is to classify documents and determine their polarity: positive, negative, or neutral.
One of the greatest challenges of sentiment analysis is data gathering, because a huge dataset is required to obtain reliable results [2]. For this reason, many application programming interfaces (APIs) have emerged; their central purpose is to easily collect datasets from different sources. An API is defined as a group of applications with a communication protocol that plays the main role in collecting data for sentiment analysis or any other software that requires a huge amount of data [3]. Sentiment analysis has many benefits, especially in business; it mostly helps in business intelligence applications and recommender systems [4]. One of these benefits is monitoring the social media pages of brands and companies through analysis of the reactions, comments, feedback, and contributions of social media users.
The goal of sentiment analysis is to extract or predict the polarity of people's behavior, reactions, approval, and interactions.
Considerable interest has recently been paid to sentiment analysis to help predict the polarity of Internet content in many areas. The majority of systems are built for English and European languages rather than Arabic. Therefore, in this research, sentiment analysis is applied to social media posts, and the news posts are classified into positive, negative, and neutral by building several classifiers, such as Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbor (K-NN), Random Forest (RF), and Decision Tree (DT). The classification is carried out using the WEKA Tool, an open-source machine learning software.

Classification Techniques
Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each item in the data. In machine learning, the three types of classification techniques are supervised, unsupervised, and semi-supervised. The first type is used in this research.

Arabic Natural Language Processing
Computer science combines several fields and has numerous subfields, such as Natural Language Processing and Artificial Intelligence. These two significant subfields are concerned with the interactions between computers and human (natural) languages, particularly programming computers to process and analyze large amounts of natural language data. Common challenges in NLP include speech recognition, natural language understanding, and natural language generation.
Algorithms enable computers to identify and extract information from natural human language. These algorithms convert natural language, which is unstructured, into a specific form that the computer can understand.

Evaluation
Considerable research has focused on Sentiment Analysis and NLP. However, the most suitable and most accurate technique is yet to be identified. Therefore, more than nine evaluation measures have been developed. The present research discusses the three most accredited and well-known measures, namely, Recall, Precision, and F-Score. Each measure has a formula originating in Information Retrieval. When calculated, these formulas yield the accuracy percentage, which is the key determinant of the suitability of a given algorithm.
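These three measures can be computed directly from raw counts. The following pure-Python sketch derives per-class Precision, Recall, and F1 from true-positive, false-positive, and false-negative counts; the example labels are illustrative, not taken from the study's dataset.

```python
# Precision, Recall, and F1 for a single target class, from raw counts.
def precision_recall_f1(gold, predicted, target):
    tp = sum(1 for g, p in zip(gold, predicted) if g == target and p == target)
    fp = sum(1 for g, p in zip(gold, predicted) if g != target and p == target)
    fn = sum(1 for g, p in zip(gold, predicted) if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative gold and predicted polarity labels for six posts.
gold      = ["pos", "neg", "neu", "pos", "pos", "neg"]
predicted = ["pos", "neg", "pos", "pos", "neu", "neg"]
p, r, f = precision_recall_f1(gold, predicted, "pos")
```

For the "pos" class above, two posts are correctly labelled positive, one is wrongly labelled positive, and one positive post is missed, giving Precision, Recall, and F1 of 2/3 each.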

Literature Review
English is considered the dominant language of science in the world, and most studies and research are written as such. Recently, sentiment analysis has been applied on other languages, such as Arabic.
Sentiment analysis research on English and Arabic shows considerable differences. Fig. 1 shows that more research has been done on sentiment analysis of English than of Arabic [5]; Arabic still does not have a sufficient number of corpora. Fig. 2 shows the 10 most dominant languages on the Internet.
A machine learning tool called RapidMiner [6] supports processing statements in the Arabic language. Using RapidMiner, SVM and NB classification techniques are applied. SVM classifiers perform better than NB when combined with stemming and the TF-IDF scheme using bigrams.
From Twitter, 10,006 tweets from Egyptian pages are collected and divided into four parts: positive tweets (799), negative tweets (1,684), neutral tweets (832), and objective tweets (6,691). SVM and NB are used as classification techniques. NLP is applied, and a deep learning model, the Recursive Neural Tensor Network (RNTN), is used for opinion mining; although trained on a non-Twitter corpus, it achieves better performance. Thus, the importance of lemmatization for handling the complexity and lexical sparsity of Arabic is confirmed [7].
An API is used to collect data from Twitter [8], given that real-time collection of each tweet is required. Subsequently, pre-processing is applied to remove emails and hyperlinks. Collected from January 20 to February 21, 2014, the dataset contains 7,503 tweets for training; tweets with English words written in the Arabic alphabet posed a challenge for this study. A total of 1,365 data points are collected for testing. A subjectivity and sentiment analysis (SSA) system is then built for Arabic Twitter feeds, the first for the Arabic language.
Pre-processing of the Arabic language using supervised machine learning has been applied to assess polarity, specifically for the Saudi dialect. Duplicate tweets and redundant letters are first removed to exclude unnecessary data. A total of 4,000 tweets are collected, and five human annotators build the corpus: the polarity of each tweet is labelled, and a Bag of Words (BOW) is established. The classification techniques used are SVM and NB. The use of BOW is shown to enhance the accuracy of the analysis [9].
The corpus is built from 2,000 Arabic statements, including 1,000 MSA statements from Twitter, www.booking.com, and www.ejabat.com. For data collection, the Twitter API is used starting June 2012; in addition, 10k tweets and 10k comments and reviews in Arabic are collected. SVM is applied as the classification technique, and the data are divided into 80% for training, 10% for development, and 10% for testing. Furthermore, the results are compared before and after lexicon expansion, which shows a positive effect on the classification [10].
A dataset that contains 4000 tweets is collected and shows that the accuracy increased by adding a BOW that consists of names of popular people in the area to all applied techniques; results show that the SVM achieves 98% accuracy and is therefore considered to be the best classifier [11].
Identification of irony in Arabic statements is attempted. A total of 2,000 tweets are collected and labelled as positive, and 4,783 as negative. Two classification techniques are used, namely, SVM and NB, in the WEKA tool. The data, collected using a Twitter API, contain tweets mentioning various famous people, such as H. Clinton, M. Morsi, D. Trump, and A. Alsissi. Several feature groups, namely surface, sentiment, shifter, and contextual features, are then applied to help the classifiers in detection and increase their accuracy, which is successfully achieved: each technique shows a 72.36% accuracy [12].
Two classification techniques, SVM and NB, are used. The data are collected from Twitter using API. After data collection, normalization is applied to reduce the data size. If a short text is available, then the unigram is used. Unigram can help the machine-learning algorithm to detect data patterns, and thus, is more effective [13].
Another study uses five steps (data collection, pre-processing, classification, clustering, and summarization), collecting data via a Twitter API, and concludes that further features need to be introduced to enhance detection [14]. NB is used as the classification technique, trained and tested using the WEKA toolkit.
The procedure is also carried out in two phases [15]. The first is pre-processing the data and deleting unnecessary entities, such as mentions and hyperlinks. The second is constructing a feature set from the data for classification with SVM, NB, and K-NN; the F-Measure is used to provide a score for each word. The data are collected from Twitter using an API between April 26 and June 1, 2014, and the classifiers are trained and tested using the WEKA tool. Results show that the contribution of Part-of-Speech (PoS) tagging is not significant. Nevertheless, Twitter use has become widespread and now spans various Arabic dialects.

Procedures and Methodology
Sentiment analysis is classified into three central fields, namely, lexicon, tools, and lexicon tools. Lexicon is defined as the words, phrases, meanings, and patterns that can be used to express subjectivity. Tools contain different types of classifiers that use text classification algorithms. NLP tools include Stemmer and Tagger, among others [16]. The fundamental part of tools is the corpora, which includes the annotated data with its polarity. The classification algorithm uses these corpora to analyze new content.
In this study, BOW that can be considered as a dictionary is built for sentiment analysis, specifying the words, phrases, and patterns used in the language. Thus, anything related to sentiment analysis must have a corpus [7].
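As an illustration of how such a dictionary-style representation works, the following sketch builds a minimal Bag of Words in plain Python. The example posts and vocabulary are placeholders, not the study's corpus.

```python
from collections import Counter

# Each post becomes a vector of word counts over a shared vocabulary.
posts = ["good news today", "bad news", "good good day"]

# Build the vocabulary from every token that appears in the corpus.
vocab = sorted({word for post in posts for word in post.split()})

def bow_vector(post, vocab):
    """Count the tokens of one post against the shared vocabulary."""
    counts = Counter(post.split())
    return [counts.get(word, 0) for word in vocab]

vectors = [bow_vector(post, vocab) for post in posts]
```

Each resulting vector has one slot per vocabulary word, so classifiers such as SVM or NB can consume posts of different lengths as fixed-size inputs.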
Unlike Arabic, English is relatively consistent, whether slang or standard. Arabic has numerous dialects and is not as comprehensively studied. Unfortunately, Arabic sentiment corpora are limited, so relevant data that can help in this study of sentiment analysis are insufficient. Most available research on Arabic corpora involves only one topic: movie reviews. Furthermore, these studies include inadequate data and mostly require purchase. As a result, a new Arabic Sentiment Corpus had to be developed to complete this study. Notably, the data of the produced corpus are collected from Arabic social media news pages, such as Facebook and Twitter, which might be local or global.
Contrary to possible initial assumptions, building a corpus for Arabic is challenging and complicated. Overall, building a corpus is carried out in several steps, starting with the data gathering and ending with pre-processing.

Data Preparation
The dataset in this corpus is gathered from 15 different news pages. The dataset currently contains 6,138 posts and tweets from Facebook and Twitter of two different domains, local and international. Worthy of mention is that the comments and retweets are collected from the same social media.
The dataset is annotated and categorized into three polarity categories, namely, positive, negative, and neutral. Hence, this dataset is gathered to build the corpus.

Data Annotation
Three student groups from the computer science division at Amman Arab University are chosen to classify the items in the dataset into positive, negative, or neutral. After the annotation procedure is completed, the dataset is sorted into folders and text files. The corpus is openly and freely accessible to researchers and analysts.

Cleaning Dataset
The raw dataset is not 100% clean and contains unwanted additions, such as affixes. The Arabic dataset collected from online sources may contain special characters, non-Arabic words, non-Arabic letters, numbers, symbols, or elongated words. Accordingly, all non-Arabic letters and HTML links are removed from the dataset.
The second process focused on special characters. Every so often, people use these special characters to draw emoticons, such as sad ":(" or smiley faces ":)", which are considered a short sentiment. However, people occasionally use them for no reason or by mistake [17].
Any special character used to represent any other emoticon should not be removed from the text because of its powerful meaning that affects the polarity of the text. Nevertheless, special characters that do not have any importance or meaning are removed.
In this step, a good practice is to build a list that contains emoticons, represented by special characters, to be used as a guide [18]. Moreover, at times numbers have the power to express a feeling or sentiment. Accordingly, these numbers are included as parts of the text [19][20][21][22][23][24].
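The cleaning steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the emoticon whitelist, the placeholder tokens, and the regular expressions are assumptions made for the example.

```python
import re

# Whitelisted emoticons are replaced with placeholder tokens so their
# sentiment signal survives the removal of other special characters.
EMOTICONS = {":)": " EMO_POS ", ":(": " EMO_NEG "}

def clean(text):
    # 1. Remove hyperlinks.
    text = re.sub(r"https?://\S+", " ", text)
    # 2. Protect meaningful emoticons before stripping special characters.
    for emo, token in EMOTICONS.items():
        text = text.replace(emo, token)
    # 3. Drop everything outside the Arabic Unicode block, except whitespace
    #    and the uppercase placeholder tokens inserted above.
    text = re.sub(r"[^\u0600-\u06FF\sA-Z_]", " ", text)
    # 4. Collapse the leftover whitespace.
    return re.sub(r"\s+", " ", text).strip()
```

Applied to an Arabic post containing a smiley and a link, only the Arabic words and the emoticon placeholder remain.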
Word elongation refers to the addition of extra letters in a word for emphasis. An example is "I looooove Jordan", in which the user emphasizes their feelings by repeating the letter "o" in the word "love". Word elongation is also used in Arabic. Such repetition affects the processing steps; therefore, all redundant letters in a word should be removed. See Tab. 1.
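A hedged sketch of this elongation-removal step: runs of three or more identical letters are collapsed to a single letter, so the legitimate doubled letters that occur in real Arabic words survive. The threshold of three is an assumption, not taken from the paper.

```python
import re

def collapse_elongation(text):
    # "(.)\1{2,}" matches any character repeated three or more times in a
    # row and replaces the whole run with a single copy of that character.
    return re.sub(r"(.)\1{2,}", r"\1", text)
```

For example, "I looooove Jordan" becomes "I love Jordan", and the elongated Arabic "جمييييل" becomes "جميل", while a word with an ordinary doubled letter such as "الله" is left unchanged.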

Results and Analysis
This study aims to extract the polarity from Arabic posts on social media, such as Facebook and Twitter, by applying classification algorithms. The accuracy of this dataset classification is measured by Precision, Recall, and F1-Measure. Five classification algorithms are used, namely, SVM, NB, K-NN, DT, and RF.
The classifiers are trained on the collected dataset. Three groups of people divided the dataset into three classifications, namely, positive, negative, and neutral.
The classification algorithm is trained using a Cross-Validation tool. Notably, the dataset is divided into 80% for training and 20% for testing.
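The split and cross-validation setup described above can be sketched in plain Python. WEKA performs the equivalent steps internally; the dataset here is a placeholder, and the fold-assignment scheme is a simple illustrative choice.

```python
import random

def train_test_split(items, train_frac=0.8, seed=42):
    """Shuffle the items deterministically and cut off a training portion."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def kfold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

posts = list(range(100))  # stand-in for the 6,138 annotated posts
train, test = train_test_split(posts)
```

With a 0.8 fraction, 100 items split into 80 training and 20 test items, and each cross-validation fold holds out a disjoint portion of the indices exactly once.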

K-NN
Tab. 2 shows the accuracy measures of K-NN and Fig. 3 presents the results of K-NN classifier by Recall, Precision, and F1-Measure. The classifier categorizes the posts as Positive, Negative, and Neutral. The dataset was pre-processed by applying Arabic NLP tools, such as Normalization, Stop Word Removal, and Stemming.

DT
Tab. 3 presents the accuracy measures of DT, and Fig. 4 shows the DT classifier accuracy, measured by Recall, Precision, and F1-Measure. The classifier categorizes the posts as Positive, Negative, and Neutral.

Figure 4: DT classifier results
Figure 5: SVM classifier results

RF
Tab. 5 displays the accuracy measures of RF and Fig. 6 shows the accuracy of the RF classifier, measured by Recall, Precision, and F1-Measure. The classifier categorizes the posts as Positive, Negative, and Neutral.

NB
Tab. 6 shows the accuracy measures of NB and Fig. 7 displays the NB classifier accuracy, measured by Recall, Precision, and F1-Measure. The classifier categorizes the dataset posts into three classes: Positive, Negative, and Neutral.

Figure 6: RF classifier results
Figure 7: NB classifier results

Discussion
Tab. 7 shows the Recall, Precision, and F1-Measures of the five classifiers: KNN, DT, SVM, RF, and NB. Fig. 8 shows the Recall results of the five used classifiers. By comparison, the RF classifier shows the highest Recall accuracy while K-NN has the lowest Recall accuracy. These results are compatible with previous research. Fig. 9 illustrates the precision results of the five classifiers. By comparison, SVM shows the highest Precision accuracy while K-NN has the lowest precision accuracy. These results are compatible with previous research.

Conclusion
This study proposes sentiment analysis and polarity extraction from news pages by applying classification. The dataset is collected from social media news pages, such as Facebook and Twitter. Social media is an open environment that allows people to express their opinions, which spread through different behaviors and beliefs. The dataset is categorized into three classes, namely, Positive, Negative, and Neutral.
This study uses news pages on social media as the source for extracting polarity. The dataset is compiled from Arabic social media, such as Facebook and Twitter, where 6,138 posts and tweets are collected. Three Arabic NLP tools are implemented, namely, Normalization, Stop-Word Removal, and Stemming. Five classification algorithms, SVM, NB, RF, K-NN, and DT, are applied to extract the polarity from the datasets. The performance of the algorithms is calculated using the Recall, Precision, and F1-measure standards. Several experiments are carried out on all datasets and classification algorithms, as follows: with all NLP tools applied; with none of the NLP tools; and separately on the Facebook posts and then the Twitter tweets using the NLP tools.
When applying the classification algorithms to all datasets with all NLP tools, the results show that the SVM algorithm gives the highest accuracy for the F1-measure, followed by NB, DT, K-NN, and RF. When applying the classification algorithms to all datasets without Stemming, the results show that the RF algorithm provides the highest accuracy for the F1-measure, followed by SVM, NB, DT and K-NN. When applying the classification algorithms to all datasets without Stop-Word Removal, the results show that the RF provides the highest accuracy for the F1-measure, followed by SVM, NB, K-NN, and DT.
When applying the classification algorithm to the Facebook and Twitter datasets with all NLP tools, the results show that the SVM algorithm provides the highest accuracy for the F1-Measure, followed by RF, NB, DT, and K-NN.
Funding Statement: The author(s) received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.