Tourism Companies Assessment via Social Media Using Sentiment Analysis

Abstract: In recent years, social media has grown rapidly as a medium through which users express their emotions and feelings in thousands of posts and comments related to tourism companies. As a consequence, it has become difficult for tourists to read all the comments and determine whether the opinions are positive or negative in order to assess the success of a tourism company. In this paper, a modest model is proposed to assess e-tourism companies using Iraqi dialect reviews collected from Facebook. The reviews are analyzed using text mining techniques for sentiment classification. The generated sentiment words are classified into positive, negative and neutral comments using Rough Set Theory, Naïve Bayes and K-Nearest Neighbor methods. The experimental results show that, of the 71 tested Iraqi tourism companies, 28% have a very good assessment, 26% have a good assessment, 31% have a medium assessment, 4% have an acceptable assessment and 11% have a bad assessment. These results help the companies improve their work and programs so that they respond sufficiently and quickly to customer demands.


Introduction:
Nowadays, many online businesses lag in marketing their products because they lack effective systems that analyze and trace customer assessments of their services, so some companies remain unknown despite the good quality of their services (1). Social media platforms such as Facebook, Instagram and Twitter have taken up a substantial part of public life and activity owing to the huge and rapid advances in information technology. Many users use these sites not only to find new contacts and share material, but also to express their moods in a variety of ways, such as comments on an electronic posting wall. Arab countries have a huge number of social media users, and social media has become an appropriate medium for free expression; it has therefore become a rich resource for text mining and sentiment analysis methods (2). Text mining is used to uncover previously undiscovered, useful information in social media comments and reviews. Furthermore, web text mining uses data mining techniques to automatically extract and analyze information for knowledge discovery. Web data is typically unlabeled, distributed, heterogeneous, semi-structured, time varying, and high dimensional (3,4). Facebook is one of the most important advertising and promotion platforms for products and businesses, and tourism and travel is among the most important of these business sectors. Almost all tourism companies have a Facebook page through which they publish advertisements and promotions for their travel packages, tourist trips and other services (5). Millions of people travel around the world every day for business, vacations and sightseeing, and a large amount of money is spent on tickets, rooms, food, transport and entertainment (6).
Two types of information flow exist in the tourism business: the first is from the company to tourists, such as tickets and different types of reservations, and the second is the aggregate information from tourists to the company. Both flows can be analyzed by suitable text mining algorithms to support meaningful and vital decisions for the tourism provider. An essential part of information gathering is finding out what other users feel and think. With the growing popularity of online review sites, new challenges have arisen in using information extraction techniques to detect and understand the sentiments of others (7). Ultimately, a sentiment analysis strategy checks the assessments of users, for example an individual tourist in correspondence with a tourist support agent. It observes conversations and assesses them to evaluate moods and emotional states through a scoring mechanism, especially those associated with business operations (8). In this paper, a sentiment analysis model for tourism companies is proposed that works on Iraqi dialect Facebook posts; it assesses tourism and travel companies by extracting sentiments from customers' comments on the companies' social networking pages in order to support useful decisions. The rest of this paper is organized as follows. Section 2 reviews related work in this field. Section 3 presents the general concepts of sentiment analysis and summarizes its processes. Section 4 presents the proposed model. Section 5 reports the experiments conducted on the proposed model. The last section concludes the work.

Related Works:
Some previous studies related to the techniques used for sentiment analysis are presented below.
1-In (9), the authors incorporated sentiment analysis methods to analyze reviews from travel blogs. Naïve Bayes, SVM and a character-based N-gram model were used to classify reviews from travel blogs covering 7 popular destinations in Europe and the US. The experimental results indicated that, when a large number of reviews was analyzed, all three methods reached accuracies of approximately 80% or more.

2-In (7), sentiment analysis is performed on Arabic Facebook news pages. The proposed system consists of preprocessing, feature vector selection and classification using support vector machines (SVM), Naïve Bayes (NB) and decision trees. A total of 2400 comments were collected, each represented as a distinct record, and then grouped into 3 groups (supportive, attacking and neutral). Comparative experiments are performed between two machine learning algorithms, SVM and Naïve Bayes, through a training model for sentiment classification. It was concluded that Naïve Bayes (NB) achieved high accuracy when the bigram feature was used. In contrast, the support vector machine (SVM) outperformed Naïve Bayes (NB) when the unigram feature was used.
4-In (10), a comparative study is carried out between Support Vector Machine (SVM), Naïve Bayes (NB) and Multilayer Perceptron Neural Network (MLP-NN) classification methods on Arabic data sets drawn from the Aljazeera news website, the Saudi Press Agency (SPA) and Alhayat. The experiments, conducted on 1400 Arabic documents belonging to different categories, yielded precisions of about 0.778, 0.754 and 0.717 for SVM, NB and MLP-NN respectively, using 600 input layers.
5-In (11), an Arabic corpus is built consisting of Facebook (FB) posts written in Dialectal Arabic (DA), which follows no formal grammar. The collection is labeled with five labels (positive, negative, dual, neutral, and spam). The words that express opinions are used in a lexicon-based classifier.
6-In (12), an extraction model is proposed for a set of reviews from the public Facebook page 'Opposing Views' using the software QSR NVivo 11, which is used to analyze unstructured data. The auto-code feature in QSR NVivo 11 was used to analyze the comments and tag them with positive or negative sentiment. Different preprocessing techniques were used, such as tokenization, stemming and query augmentation with synonyms. After a specified number of comments had been analyzed, positive sentiments accounted for about 29.6% and negative sentiments for about 62.0%. The experimental results thus showed that negative reviews were about twice as frequent as positive reviews.

7-In (3), a conceptual model based on text mining is proposed to classify texts from an emotion dataset into 4 emotional groups: anger, fear, joy, and sadness. Naïve Bayes optimized by particle swarm optimization (PSO) is used as the classifier. The preprocessing stages in model development were document collection, conversion to lower case, tokenization, token filtering (by length), stop-word removal, stemming and vector creation. The experiments yielded an accuracy of about 65.93% using Naïve Bayes without optimization and 66.54% with PSO, a small increase in accuracy.
8-In (5), a lexicon-based sentiment analysis for the common Iraqi vernacular is presented. Three machine learning methods (KNN, Naïve Bayes and Rough Set Theory) are used to classify Iraqi sentiment on Facebook. A dictionary of Iraqi keywords is built that includes single-word and double-word entries for both positive and negative sentiment. Rough Set Theory gives the best classification ratio of the three methods, because the upper and lower boundaries in rough sets provide a good tool for overcoming the conflict problem in sentiment analysis.
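The role of the lower and upper boundaries mentioned in item 8 can be illustrated with a minimal sketch of rough-set approximations. It assumes that each comment has already been reduced to a tuple of discrete features (here, hypothetical counts of positive and negative lexicon hits), so that comments with identical features are indiscernible. The function name and the toy data are illustrative assumptions, not the implementation used in (5).

```python
from collections import defaultdict

def rough_approximations(objects, labels, target):
    """Return the (lower, upper) approximations of the objects labeled `target`.

    objects: dict mapping object id -> hashable feature tuple
    labels:  dict mapping object id -> class label
    """
    # Group objects that are indiscernible, i.e. share the same feature tuple.
    classes = defaultdict(set)
    for obj_id, features in objects.items():
        classes[features].add(obj_id)

    target_set = {obj_id for obj_id, label in labels.items() if label == target}
    lower, upper = set(), set()
    for members in classes.values():
        if members <= target_set:   # every member carries the target label
            lower |= members
        if members & target_set:    # at least one member carries the target label
            upper |= members
    return lower, upper

# Hypothetical toy data: features are (positive_hits, negative_hits).
comments = {"c1": (2, 0), "c2": (2, 0), "c3": (1, 1), "c4": (1, 1), "c5": (0, 2)}
polarity = {"c1": "pos", "c2": "pos", "c3": "pos", "c4": "neg", "c5": "neg"}

lower, upper = rough_approximations(comments, polarity, "pos")
boundary = upper - lower  # comments with the same features but conflicting labels
```

In this toy example, `c3` and `c4` have identical features but different labels, so they fall in the boundary region rather than being forced into one class.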
Table 1 summarizes the previous studies, including the classifiers used, the language on which sentiment analysis was performed, and their gaps. From these studies, it was identified that sentiment analysis of the Arabic language and the Iraqi dialect for tourism companies has not been considered, and the present work is proposed to address this research problem.

Sentiment Analysis:
Sentiment analysis (SA) (13) is also known as opinion mining, which is the process of determining the polarity of opinions or reviews written by humans to rate products or services. SA can be done at the document level, where the entire text is assessed to determine the opinion polarity by extracting features. It can also be done at the sentence level, where the text is partitioned into sentences that are evaluated separately to determine the document polarity. Text polarity can be positive, negative or neutral. In tourism, the information provided in a comment is either subjective (opinionated), which is based on personal feelings and decisions about events, or objective (factual), which is based on information, evidence and opinions. Sentiment analysis is used to evaluate written or spoken language to recognize whether it is positive, negative or neutral, and to what degree. Currently, many analysis tools are able to deal consistently with remarkable volumes of customer quibbles (8).
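The difference between sentence-level and document-level analysis described above can be sketched as follows. The sketch scores each sentence against a small lexicon of single-word and double-word entries and then takes a majority vote over the sentences. The English lexicon entries, the function names and the majority-vote rule are all assumptions made for illustration; they stand in for a dialect lexicon and are not taken from the paper.

```python
import re

# Hypothetical English stand-ins for lexicon entries (single and double words).
POSITIVE = {"excellent", "wonderful", "well organized"}
NEGATIVE = {"bad", "late", "not recommended"}

def sentence_polarity(sentence):
    """Classify one sentence by counting lexicon hits among its words and word pairs."""
    tokens = re.findall(r"\w+", sentence.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    candidates = tokens + bigrams
    score = sum(term in POSITIVE for term in candidates)
    score -= sum(term in NEGATIVE for term in candidates)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def document_polarity(text):
    """Split the text into sentences and take a majority vote (an assumed rule)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    votes = [sentence_polarity(s) for s in sentences]
    positives, negatives = votes.count("positive"), votes.count("negative")
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"

review = "The trip was excellent. The hotel was well organized. The bus was late."
overall = document_polarity(review)
```

Here two sentences vote positive and one votes negative, so the document as a whole is labeled positive even though one sentence is not.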
The need for Facebook has increased with the spread of COVID-19, which forced people to work from home rather than at the workplace. The strength of this research lies in identifying the tendencies of the customers of travel and tourism companies; through these trends, the companies can be evaluated. Tourists' comments and posts often reveal pleasure, prevention, displeasure or enjoyment. This sentiment information has a significant impact on tourism companies seeking to enhance customer management and commercial productivity (14). Facebook in particular has driven long-term customer customization and rapid growth in terms of page size, comments and posts (5).
Sentiment analysis involves a multi-step operation: a) data retrieval, b) extraction and selection of the required data, c) preprocessing, d) feature extraction, e) subject detection, and f) application of data mining methods (6). There are two fundamental methods for extracting sentiments. The first is the lexicon-based method, which computes a document's polarity from the polarity of the terms in the document. The second uses machine learning techniques, which are classified into unsupervised and supervised methods. The supervised technique, which is used in the proposed model, constructs a classifier from labeled examples of texts through a supervised classification process. The advantage of machine learning techniques is that models trained on labeled content can then classify new data. The key machine learning methods used for sentiment analysis are Support Vector Machine (SVM) and Naïve Bayes, as they are usually designed for binary classification tasks (6-15). Rough set theory plays a big role in text classification and categorization in different areas (16-19); therefore, it is used in this paper as a classifier. Furthermore, the Naïve Bayes Classifier (NBC) (20-25) is also used in this paper as a probabilistic classifier, which uses Bayes' theorem as a decision rule under the assumption of independent features.
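The decision rule described above, Bayes' theorem with independent features, can be made concrete with a minimal multinomial Naïve Bayes sketch. The class name, the add-one (Laplace) smoothing and the toy English training data are assumptions for illustration only; the paper does not specify these details.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes over tokenized documents (illustrative only)."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(class) + sum of log P(word | class), assuming independent words
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][token]
                # add-one smoothing (an assumption) avoids zero probabilities
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy training set.
train_docs = [["great", "trip"], ["excellent", "service"], ["bad", "service"],
              ["late", "bus"], ["office", "address"]]
train_labels = ["positive", "positive", "negative", "negative", "neutral"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["great", "service"])
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once comments contain more than a handful of words.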

The Proposed System:
In this section, the fundamental stages of the proposed model are presented. Sentiment analysis of the texts in the comments is the fundamental operation, because the main objective of the proposed model is to assess tourism and travel companies by extracting sentiments from customers' comments on the companies' social networking pages in order to support useful decisions. The main stages of the proposed model are shown in Fig. 1. Preprocessing includes, among other processes, the removal of stop-words, and light stemming is used in the proposed model. Finally, the assessment is decided according to the sentiment analysis results, which are either positive or negative.
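The stop-word removal and light stemming steps mentioned above can be sketched as follows. Light stemming is taken here to mean stripping at most one common prefix and one common suffix without reducing a word to its root. The stop-word list, the affix lists and the minimum stem length are small hypothetical examples chosen for illustration; the paper's actual lists are not given in this text.

```python
import re

# Hypothetical, deliberately tiny resources; a real system needs curated dialect lists.
STOP_WORDS = {"في", "من", "على", "و", "هذا"}
PREFIXES = ("وال", "بال", "ال", "و")          # longest prefixes are tried first
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")

def light_stem(word, min_length=3):
    """Strip at most one prefix and one suffix, keeping at least `min_length` letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_length:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_length:
            word = word[:-len(suffix)]
            break
    return word

def preprocess(comment):
    """Tokenize on Arabic letters, drop stop-words, then light-stem the rest."""
    tokens = re.findall(r"[\u0621-\u064A]+", comment)
    return [light_stem(token) for token in tokens if token not in STOP_WORDS]

stems = preprocess("السفر من بغداد")
```

The minimum-length guard is a common safeguard in light stemmers: it prevents short words from being stripped down to fragments that no longer match any lexicon entry.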

Results:
In this section, the proposed model is tested on the information collected from Facebook. The dataset was collected from the Facebook pages of 71 Iraqi tourism companies; Table 2 shows the details of the dataset. As given in Table 2, there are 71 company pages on Facebook, 8 posts were selected from each company, 25 comments were selected from each post, and the comments contain a total of 33875 sentences. The sentiment analysis took into account both classical Arabic and Iraqi dialect words.
Three classification methods were implemented for the 14200 comments (71 companies × 8 posts × 25 comments): Rough Set Theory, Naïve Bayes and K-Nearest Neighbor. Figure 2 illustrates the classification accuracy of these methods for sentiment analysis on the above dataset.

Figure 2. Sentiment Analysis Accuracy
The best method is Rough Set Theory (RST); Table 3 shows the confusion matrix of the RST classification method. In more detail, Tables 4, 5 and 6 show the confusion matrices for the positive, negative and neutral sentiment of the comments, respectively. The accuracy of a binary classifier is calculated using Eq. (1) (13). From Eq. (1) and the data in Tables 3, 4 and 5, the accuracy reached about 94.62% on positive comments, and about 94.42% and 98.9% on negative and neutral comments, respectively. Furthermore, the average accuracy (13), which is the average per-class effectiveness of the classifier, is calculated using Eq. (2), where l is the total number of classes. The resulting average accuracy is about 95.98%. The average error rate (13), which is the average per-class classification error, is computed using Eq. (3) and is about 3.9%.
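The equations themselves are not reproduced in this text. As a reference, the standard forms of these measures are sketched below; it is an assumption that Eqs. (1) to (3) follow these usual definitions, with tp, tn, fp and fn denoting true positives, true negatives, false positives and false negatives, and the subscript i indexing the l classes.

```latex
\begin{align}
\text{Accuracy} &= \frac{tp + tn}{tp + fn + fp + tn} \tag{1} \\
\text{Average Accuracy} &= \frac{1}{l} \sum_{i=1}^{l}
    \frac{tp_i + tn_i}{tp_i + fn_i + fp_i + tn_i} \tag{2} \\
\text{Error Rate} &= \frac{1}{l} \sum_{i=1}^{l}
    \frac{fp_i + fn_i}{tp_i + fn_i + fp_i + tn_i} \tag{3}
\end{align}

% Worked check of the reported average accuracy with l = 3 classes:
\[
\frac{94.62 + 94.42 + 98.9}{3} = \frac{287.94}{3} = 95.98
\]
```

The worked check reproduces the reported average accuracy of about 95.98% from the three per-class accuracies. Under these definitions the error rate is the complement of the average accuracy, which would give roughly 4.0%, close to the reported figure of about 3.9%.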
Eq. (4) gives the precision computation (13). The assessment of tourism companies depends on the sentiment of customers' comments on the Facebook social network. A set of rules maps the sentiment analysis results to an assessment of each tourism company, with "Bad" as the assessment when no other rule applies. Applying these rules to the 71 Iraqi tourism companies shows that 28% have a very good assessment, 26% have a good assessment, 31% have a medium assessment, 4% have an acceptable assessment and 11% have a bad assessment. Fig. 3 illustrates these results.
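The general shape of such a rule set can be sketched as an ordered list of cut-offs with "Bad" as the fallback. The rule conditions used in the paper are not available in this text, so the cut-off values below, the use of the share of positive comments as the criterion, and all names are purely hypothetical placeholders; only the five assessment levels and the "otherwise Bad" fallback come from the source.

```python
from collections import Counter

# HYPOTHETICAL cut-offs on the share of positive comments. These are placeholders,
# not the thresholds used in the paper.
HYPOTHETICAL_CUTOFFS = [
    (0.80, "Very Good"),
    (0.65, "Good"),
    (0.50, "Medium"),
    (0.35, "Acceptable"),
]

def assess_company(positive, negative):
    """Map counts of positive and negative comments to an assessment label."""
    total = positive + negative
    if total == 0:
        raise ValueError("no polar comments to assess")
    positive_share = positive / total
    for cutoff, label in HYPOTHETICAL_CUTOFFS:
        if positive_share >= cutoff:
            return label
    return "Bad"  # fallback when no other rule applies

def assessment_distribution(companies):
    """Return the percentage of companies that receive each assessment label."""
    labels = Counter(assess_company(pos, neg) for pos, neg in companies.values())
    return {label: 100 * count / len(companies) for label, count in labels.items()}

# Hypothetical toy input: company -> (positive comments, negative comments).
toy_companies = {"A": (90, 10), "B": (70, 30), "C": (50, 50), "D": (10, 90)}
distribution = assessment_distribution(toy_companies)
```

Keeping the cut-offs in a single ordered table means the rules can be adjusted without touching the classification logic.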

Conclusion:
Sentiment analysis of Iraqi dialect reviews on Facebook has been proposed for the assessment of Iraqi tourism companies. Positive and negative Iraqi dialect sentiments are represented as single and double words (Bag-of-Words) to classify the sentiments. The experiments indicate that the percentage results of the proposed model reflect the real opinions of tourists about the tested companies. Using both standard Arabic and the Iraqi dialect in the sentiment analysis played a big role in the model. Rough set theory gave the best classification accuracy compared with Naïve Bayes and K-Nearest Neighbors, because this theory is of great benefit for categorized data. It is also concluded that, instead of relying on costly survey-based studies, sentiment analysis of customer opinions makes it easier for tourism companies to recognize their economic value and their customers' opinions about their services, which in turn provides insight for better decision-making policies. A limitation of the proposed work is the absence of some semantic features; these features should be considered in future work.