Aspect-Based Sentiment Analysis of Hotels in Bali on Tripadvisor Using BERT Algorithm

ABSTRACT


INTRODUCTION
The COVID-19 pandemic, which began in 2020, has changed human activities around the world, and Indonesia is no exception. To get through the pandemic, people have had to change their lifestyle by maintaining cleanliness, wearing masks, and regularly washing their hands or using hand sanitizer. Awareness of environmental hygiene is needed as a first step from the community. Crowded places carry a high risk of spreading COVID-19, and one such place in Indonesia is Bali, which is known to foreign and local visitors for its rich natural and cultural tourism [1]. Lodging is central to tourism and carries a high risk of COVID-19 transmission, because hospitality activities require direct contact from check-in through check-out. Following the appeal of the Indonesian President, Joko Widodo [2], many new policies require companies to shift to working remotely, so hotels in Bali must design new business processes that follow these policies.
Changes to standard operating procedures (SOPs) will inevitably affect the performance of business processes. To maintain Bali's status as the most popular tourist destination in Indonesia, the government must evaluate hotel performance in order to keep these changes under control. Several methods, such as interviews with relevant sources, can be used to assess hotel performance. During the COVID-19 pandemic, however, the assessment can instead be conducted with a data mining approach, using sentiment analysis of data from websites or social media about customer impressions after using hotel services in Bali [3].
Sentiment analysis alone is insufficient to distinguish hotel aspects; therefore, Aspect-Based Sentiment Analysis (ABSA) is used to obtain more specific results. The selected aspects are those that a hotel offers its customers to build its competitiveness, broadly hotel safety, service, comfort, the attitude of hotel staff, and the cleanliness of the hotel environment. The results of this research take the form of information that other parties, such as the government or companies, can use to improve the quality of business operations after the company has transformed. Previous research [4] measured customer satisfaction in terms of a hotel's service, comfort, cleanliness, and location, which is beneficial for maintaining company stability. ABSA requires an optimal text processing and analysis algorithm. According to [4], algorithms with the Bidirectional Encoder Representations from Transformers (BERT) architecture achieve the highest accuracy compared with other models such as SentiWordNet, Logistic Regression, and Long Short-Term Memory; therefore, BERT is well suited for use as this study's ABSA architecture. The data required for this research are customer reviews of Bali hotel services. Figure 1, from Statista.com, shows that Tripadvisor is a rich source of data because the website adopts Web 2.0, which enables users to contribute content and other information, as evidenced by the number of reviews published each year, which rose between 2014 and 2020.
The government needs to evaluate hotel performance after the new health protocols have been implemented. In this study, data collected from the Tripadvisor website are analyzed with the sentiment analysis method, segmented by service, comfort, safety, and cleanliness. This investigation yields an evaluation of each of these hotel aspects.

RESEARCH METHOD
Aspect-based Sentiment Analysis (ABSA)
Aspect-Based Sentiment Analysis (ABSA) is a sentiment analysis method that operates at the level of aspects. Its purpose is to detect aspect terms that appear explicitly in the text, together with the sentiment attached to each, since a word can take on a different meaning when combined with other words. For example, the text "phone is out of date but the camera is still good" contains two aspects: "camera", which is judged good, and "phone", which is judged old. To recognize these two meanings, a machine learning method for sentiment analysis needs to learn recurring terms and language structures (grammar). The purpose of grouping aspects in this way is to produce recommendations from the customer's point of view [5]. Neural network architectures are very popular for building ABSA models like the example above because of their encoding and decoding capability and their use of vector representations in data processing [6].

Neural Networks
A Neural Network (NN) is a Machine Learning (ML) technique that underlies deep learning architectures. An NN is represented as a network of layered, interconnected nodes that resembles human neurons, and its operation is modeled on that of the human brain. An NN has three primary kinds of layers: the input layer, the hidden layer, and the output layer, and each neuron or node is connected to those in the next layer, as shown in Figure 2. Data enters through the input layer and is forwarded to the hidden layer; the number of hidden layers is defined in advance, so the data may pass through further hidden layers before reaching the output layer, which gives the result of the NN process [7]. An NN whose outputs are looped back into the hidden layer is referred to as a Recurrent Neural Network (RNN). The process can be repeated to approach an optimal value, which gives NN an advantage over other ML techniques, and the cycle is repeated for a predetermined number of epochs [8].
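The flow from input layer to hidden layer to output layer can be illustrated with a minimal sketch. The layer sizes, the weight values, and the sigmoid activation below are assumptions chosen only for illustration; they are not taken from Figure 2 or from the models trained in this study.

```python
import math

def sigmoid(x):
    """Squash a raw node value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer: every input node feeds every output node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the data through each layer in order, input to output."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# Hypothetical network: 2 input nodes -> 3 hidden nodes -> 1 output node.
hidden_layer = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])
output_layer = ([[0.7, -0.5, 0.2]], [0.05])

prediction = forward([1.0, 0.5], [hidden_layer, output_layer])
print(prediction)  # a single value between 0 and 1
```

Training would adjust the weights and biases over several epochs; this sketch covers only the forward pass.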

Bidirectional Encoder Representations from Transformers (BERT)
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a deep learning architecture for natural language processing (NLP) that was published in 2018. BERT uses the Transformer technique to analyze the contextual relationship between one word and another in a sentence [9], [10]. The Transformer relies on the self-attention mechanism, which adjusts the representation of each word according to the other words it is associated with. The Transformer has two mechanisms, called the Encoder and the Decoder, as shown in Figure 3. Each mechanism is explained below:
a. Encoder. The Encoder reads the entire text input. It is a stack of six layers, each of which has a self-attention layer and a feed-forward neural network layer. Both layers help each node focus on the overall semantic context of the word.
b. Decoder. The Decoder generates a sequence of prediction outputs. It is a stack of six layers that mirror those of the Encoder, but each Decoder layer adds an attention layer that helps nodes retrieve key content from other nodes. Attention is computed in parallel, so more than one attention layer is processed and combined into a single output; this is referred to as multi-head attention [11].
Figure 3 shows the encoder and decoder process that the Transformer performs when parsing text data. Each word of the input text entered into the Encoder is converted into a vector using embeddings. Positional encoding is added to each word to indicate its position. The input vector then passes through two layers, namely the self-attention layer and the feed-forward neural network. In the self-attention layer, three vectors are created: Query, Key, and Value. The self-attention score is divided by 8, which is the square root of the vector dimension, 64. Softmax is then applied to the scores, and each Value vector is multiplied by its softmax value, as in Figure 4. Finally, the weighted vectors are summed to form the output of the self-attention layer, which is passed on to the feed-forward neural network layer. The outputs of the self-attention and feed-forward layers are each processed by an add-and-normalize layer. Once the Encoder process is complete, the key and value vectors enter the Decoder, where the add-and-normalize layer likewise helps analyze relevant words. The output of each Encoder step enters the Decoder and produces a vector value [12]. During the training process, BERT converts each word into a token id and adds the special tokens CLS and SEP, which mark the start and the end of the input, before proceeding to the Transformer process [13].
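The self-attention steps described above (score Query against Key, divide by the square root of the key dimension, apply softmax, and sum the weighted Value vectors) can be sketched as follows. This is a single-head illustration under simplifying assumptions: the Query, Key, and Value vectors are supplied directly instead of being produced by learned projection matrices, and the toy vectors are invented for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention for one head."""
    scale = math.sqrt(len(keys[0]))  # sqrt(64) = 8 for 64-dimensional keys
    outputs = []
    for q in queries:
        scores = [dot(q, k) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the Value vectors, one output component at a time.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Toy input: two tokens with 64-dimensional keys (hypothetical values).
d_k = 64
keys = [[1.0] * d_k, [0.0] * d_k]
queries = keys
values = [[1.0, 0.0], [0.0, 1.0]]

attended = self_attention(queries, keys, values)
print(attended)
```

In this toy input the first token attends almost entirely to itself, while the second token, whose query is all zeros, spreads its attention evenly over both tokens.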

RESEARCH METHODOLOGY
The research begins by identifying the problem, namely the need for hotel performance assessment, and then aligning the objectives and the solution to be built. The next stage is data processing, which includes collecting data from the Tripadvisor website using a web scraping technique and labeling the data according to the selected aspects. The data are then cleaned at the data preprocessing stage, sampled using random oversampling, and separated into two parts: 90% for training and 10% for testing. The next stage is building the BERT architecture from this data, followed by evaluating the architecture using accuracy, precision, recall, f1-score, the confusion matrix, and the ROC curve. The final stage applies the resulting architecture to all data and analyzes its predictions. All of these stages are illustrated in Figure 5.
The data used in the research were obtained from the Tripadvisor website. Data were selected from five hotels that have implemented health protocols, with the hotels sorted by the number of customer reviews. Data retrieval was carried out with the help of the Prowebscrapper tool [14], and its flow is described in Figure 6. The first step is identifying data needs and determining the attributes to be collected; Prowebscrapper is then configured accordingly, the configuration is run to start data collection, and finally the data stored on Prowebscrapper are downloaded. The resulting data are shown in Table 1. A total of 3,419 records were retrieved, and the selected hotels were Double-Six Luxury Hotel Seminyak, Le Meriden Bali Jimbaran, Mason Elephant Lodge, Padma Resort Ubud, and Viceroy Bali. The five hotels were selected based on user ratings and the number of reviews given by hotel customers on Tripadvisor.
From the collected data, only the required attribute, namely the review text, is kept, and additional labels are added. The first is the sentiment label (review), which takes a positive or negative value. The remaining labels cover the safety, cleanliness, comfort, and service aspects, where a value of 1 indicates that the text belongs to the category and a value of 0 indicates that it does not. After the labels are added, the data take the form shown in Table 2. The labeled data are then resampled because they are unbalanced, as shown in Table 3. This table shows the number of labels for each aspect; the counts differ, which results in imbalanced data. Imbalanced data can produce an architecture that is biased towards the majority class, because the minority class is treated as noise: the architecture predicts the class with more examples well but struggles to categorize the class with fewer examples [15]. Sampling is therefore needed to overcome the imbalance in this study. One suitable method is random oversampling, which equalizes the data by duplicating examples from the smallest class until its size equals that of the largest class. Oversampling therefore increases the total amount of data. This technique is applied to every hotel aspect and to the review label, so the total data used in this study are as shown in Table 3. Before the data are processed, they must first pass through the data preprocessing stage, which is described in Figure 7. The data that have been selected and sampled are cleaned at this text preprocessing stage before entering the training process. Stopwords are removed using the NLTK library, which already provides a stopword list for English.
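Random oversampling, followed by the 90%/10% split used in this study, can be sketched as below. The function names, the toy reviews, and the fixed random seed are assumptions for illustration; the sketch handles one label column at a time, matching the description that the technique is applied to each aspect and to the review label.

```python
import random
from collections import Counter

def random_oversample(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced

def train_test_split(rows, test_fraction=0.1, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy data (hypothetical): 8 texts labeled 1 and 2 texts labeled 0.
reviews = [("great stay %d" % i, 1) for i in range(8)]
reviews += [("dirty room %d" % i, 0) for i in range(2)]

balanced = random_oversample(reviews, label_of=lambda row: row[1])
train, test = train_test_split(balanced, test_fraction=0.1)
counts = Counter(label for _, label in balanced)
print(counts, len(train), len(test))
```

After oversampling, both classes in the toy data have 8 rows, so the 16 balanced rows are split into 14 for training and 2 for testing.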
The prepared data are then divided into two parts: training data, which are used to build the architecture, and testing data, which serve to validate the predictions of the resulting architecture. Training data make up 90% and testing data 10% of the total data. Text cleaning proceeds by converting words to lowercase and expanding English contractions such as "don't" to "do not", along with other words that use the contracted form of "not". Tags such as "@" and the like are then removed, as are punctuation and special characters. Finally, frequently occurring stopwords are removed, together with the white space left over from the previous steps. The results of text cleaning can be seen in Table 4. Each cleaned text is then given the SEP and CLS tokens and converted into token ids.
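The cleaning steps listed above can be sketched with the standard library alone. The contraction table and the stopword set below are small hypothetical stand-ins: the study uses the English stopword list from NLTK, which is much longer, so this sketch will not reproduce the study's output exactly.

```python
import re

# Hypothetical stand-ins for illustration only.
IRREGULAR_CONTRACTIONS = {"won't": "will not", "can't": "can not"}
STOPWORDS = {"i", "we", "the", "a", "to", "and", "is", "were",
             "did", "do", "will", "not", "for", "be", "of"}

def clean_text(text):
    """Lowercase, expand "n't" contractions, drop @tags, punctuation,
    stopwords, and surplus white space."""
    text = text.lower()
    for short, full in IRREGULAR_CONTRACTIONS.items():
        text = text.replace(short, full)
    text = re.sub(r"n't\b", " not", text)   # "didn't" -> "did not"
    text = re.sub(r"@\w+", " ", text)       # remove @tags
    text = re.sub(r"[^a-z\s]", " ", text)   # remove punctuation and digits
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)                 # split/join collapses white space

print(clean_text("We didn't want to leave! @hotel Thanks."))
print(clean_text("Won't stay anywhere else"))
```

Because "not" is part of the stopword set, the negation introduced by expanding a contraction is removed again in the last step, which is consistent with how "Could not fault" becomes "could fault" in Table 4.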

Before: I couldn't imagine a more beautiful place to stay. The staff were amazingly friendly. Could not fault the facilities and service. We didn't want to leave! We will be back for sure and won't stay anywhere else!
After: imagine beautiful place stay staff amazingly friendly could fault facilities service want leave back sure stay anywhere else
The BERT architecture is built by putting the cleaned data into the data loader, which splits the data into batches. The training process comes next, and the architecture is then evaluated (as shown in Figure 8). The bert-base-uncased model is used to build the architecture; it has 12 layers, a hidden size of 768, 12 attention heads, and 110M parameters. BERT uses "fine-tuning", which requires some additional setup in the form of an optimizer that updates the parameters during training. The batch size is 16, and training is repeated for two epochs. Table 5 shows the BERT configuration used in this research. Five kinds of labels are used in this study, so five architectures are produced: the review (sentiment) architecture, the cleanliness architecture, the service architecture, the comfort architecture, and the safety architecture. All of them use the same configuration, and the process is repeated once for each architecture. The training process reports the training loss (train loss), the validation loss (val loss), the resulting accuracy (val accuracy), and the time required (elapsed) for each batch of data entered into the process and for each repetition (epoch), as in Table 6.
Each architecture is evaluated using three methods: the confusion matrix, the classification report, and the ROC curve [16]. The first method is the confusion matrix, which has the form shown in Table 7.
If the actual value is positive and the prediction is positive, the result is classified as True Positive (TP); if the actual value is negative but the prediction is positive, it is classified as False Positive (FP); if the actual value is positive but the prediction is negative, it is classified as False Negative (FN); and if the actual value is negative and the prediction is negative, it is classified as True Negative (TN) [17], [18].
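The four cells of the confusion matrix can be counted directly from paired actual and predicted labels. The sketch below assumes binary labels with 1 as the positive class; the label lists are invented for illustration and are not results from this study.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN, and TN for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# Hypothetical labels for six texts.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```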
The next evaluation uses the classification report, which includes precision, recall, f1-score, and accuracy to measure the performance of the architecture [13]. Each of these values is calculated from the counts in the confusion matrix.
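In terms of the confusion matrix counts TP, TN, FP, and FN, these four metrics have standard definitions. They are given here in their conventional form as a reference:

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{F1-score}  &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```

As a worked check, the comfort aspect reported in the results has TP = 335, TN = 303, and FP = FN = 0, so accuracy is (335 + 303) / 638 = 1, precision and recall are both 335 / 335 = 1, and the f1-score is therefore also 1.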

RESULTS AND DISCUSSION
For example, in the comfort aspect, the TP value is 335 and the TN value is 303, while the FP and FN values are both 0. Based on the confusion matrices of the five architectures, the architectures predict well: the TP and TN values are higher than the FP and FN values, so the accuracy obtained is high, around 90% and above for all aspects. The ROC curve results are depicted in Figure 10: if the blue line approaches the value of 1 on the TPR axis, giving an AUC score close to 1, the algorithm makes a higher proportion of correct predictions (TPR) than wrong predictions (FPR). On average, the five architectures have an AUC value close to 1, with the blue line close to the value of 1 on the TPR axis, as shown in Figure 9. Across all evaluation methods, all of the architectures predict correctly with an average accuracy of 97% and above, so they can be applied to all data.
The architectures are then used on all of the data to predict each aspect. As in training, the text is first cleaned by preprocessing, then put into the dataloader, and then the model is applied to the text. The outputs for all the data are shown in the following figure. The study, which started from defining the problem, designing the architecture, and applying the architecture to the 3,419 records, produced the graphs in Figure 10. The four figures show the number of positive and negative reviews for the hotel aspects. At a glance, across all the numbers and graphs, the average hotel in Bali has more positive impressions than negative reviews on each aspect. Figure 10.a displays the total number of predicted review labels across all data.
This figure shows the five hotels with the number of positive reviews in orange and negative reviews in maroon. All hotels have more positive impressions than negative impressions, and the hotel with the most positive impressions is Le Meriden Bali Jimbaran. The total amount of data labeled with the service aspect is shown in Figure 10.b, which displays the positive and negative impressions of hotel services. All hotels have more positive than negative impressions of the services provided, which cover everything from customer interactions with hotel employees to room booking services and so on. Figure 10.c illustrates the total amount of data labeled with the comfort aspect, showing the number of positive and negative impressions of hotel comfort. On average, the five hotels tend to leave a positive impression of their rooms, cafes, swimming pools, and other facilities that contribute to customer comfort during a stay.
The amount of data labeled with the safety aspect is shown in Figure 10.d, which graphs the number of positive and negative impressions of the safety provided by the hotel, including privacy, confidentiality, safety officers, and the like. Across the five hotels, few texts mention safety: the counts range from around 0 to 35. Of the five hotels, Viceroy Bali appears to have good safety relative to the other hotels.
The number of positive and negative reviews on the hotel cleanliness aspect is shown in Figure 10.e. Hotel cleanliness is assessed by how clean the hotel facilities are, such as swimming pools, rooms, restaurants, and others. On average, the five hotels have more positive than negative impressions. For the comfort aspect, the amount of data available ranges from around 0 to 800, and positive impressions outnumber negative ones at all five hotels. For the service aspect, the amount of data available ranges from 0 to 700, which shows that the five hotels in Bali provide satisfactory service. For the safety aspect, the amount of data available ranges from 0 to 35; there is less data for safety than for the other aspects, but positive reviews still outnumber negative ones. For the cleanliness aspect, the amount of data available ranges from around 0 to 200, and the graph shows more positive reviews than negative ones, indicating that the five hotels are kept clean.
Based on the results of this research, it can be concluded that the BERT architecture produces good accuracy: the average value is more than 90% for each aspect, using a configuration with a batch size of 16, the Adam optimizer, and 2 epochs.
Of 570 predictions, 562 were correct. The precision, recall, f1-score, and accuracy values of the five models have an average value of 98%. When applied to all data from the Double-Six Luxury Hotel Seminyak, Le Meriden Bali Jimbaran, Mason Elephant Lodge, Padma Resort Ubud, and Viceroy Bali hotels, the models show that, on average, positive impressions outnumber negative reviews after the new SOPs were implemented, across the aspects of cleanliness, comfort, service, and safety provided by hotel companies in Bali. It can be concluded that the average hotel in Bali performs well on these aspects.
For future work, it is recommended that the data be balanced both across aspects and across the selected hotels, and that data be drawn from other sources as well, so that they are more varied and not sourced only from the Tripadvisor website. The aspects studied in future research can also be expanded so that hotels are assessed from a wider variety of perspectives, including those of customers, employees, and others.