Sentiment Analysis of Indonesian Citizen Tweets Using Support Vector Machine on the Rebranding of Twitter to X



INTRODUCTION
Nearly all human activities revolve around opinions, which also heavily influence how we behave. Our opinions, our understanding of reality, and the decisions we make are all influenced by how other people perceive and evaluate the world. As a result, when we need to make a decision, we often ask for others' perspectives. This applies to both individuals and organizations [1]. Increased competition among market participants typically shapes consumer sentiment toward goods and services. However, gauging public sentiment is costly and laborious. Fortunately, businesses and individuals can now look for answers to these questions online. What remains is to compile the available online opinions and transform them into an answer [2]. In other words, businesses and individuals can conduct sentiment analysis, which is also known as opinion mining.
Python is now commonly used for sentiment analysis because of its flexibility in processing data, building models, and evaluating them with dedicated libraries [3]. In this research, the pandas, NumPy, and NLTK libraries are used for text preprocessing (tokenizing, stemming, and removing stop words), and scikit-learn is used to build the model.
One phenomenon that has attracted researchers' attention is Facebook's rebranding as Meta in October 2021. The decision was made while Facebook was caught in controversy over claims that it had secretly tracked real-world harms made worse by its platforms. It disregarded its employees' warnings about the dangers of its design choices, exposing vulnerable communities worldwide to a cocktail of dangerous content [3]. The move unquestionably caught people's attention, and several sentiment analysis studies were carried out when Facebook changed its name to Meta [4], [5], [6], [7]. The first study examined English-language tweets containing the word "metaverse" and classified them as positive, negative, or neutral. The data were drawn from two date ranges: the first covered the week before Zuckerberg's announcement, and the second covered the week after it. The study found that tweets were generally positive before Zuckerberg's speech. After the speech, the rates of positive and neutral tweets decreased, while the rate of negative tweets increased [5].
Similarly, another study examined Indonesian public sentiment toward metaverse technology by classifying it into two opinions, positive and negative. The data were collected from Twitter with the keyword "Metaverse Indonesia" through the Twitter API and processed in RStudio, covering 5 to 13 January 2022. Of the 7,755 records that went through the cleansing process, negative sentiment toward the development of the metaverse in Indonesia dominated, with 1,377 negative sentiments compared with only 1,251 positive sentiments [6]. Reinforcing this result, a further study focused on the first 24 hours after the launch of Meta. To collect the data, the keyword 'Meta' was entered in the 'All of these words' field of the search tool, 'Facebook' was entered in the 'Any of these words' field, and the search targeted 10,000 English-language tweets. The authors concluded that negative tweets also received more likes and retweets than positive ones, and that 10% of the topics discussed in negative tweets expressed the same concern about data privacy [7]. Lastly, although one study reported different results [4], it also highlighted how the impact of Facebook's rebranding on the metaverse caused a backlash and a PR crisis. Tunca et al. collected articles about the metaverse from The Guardian (theguardian.com) published between 24 April 2019 and 25 April 2022. Their sentiment analysis showed that sentiment about the metaverse was predominantly positive. They argued that despite claims of monopoly, the metaverse continues to lead the industry in all categories in which Facebook competes. However, even though use of the metaverse has increased twelvefold since 2020, Tunca et al. also noted that Meta has lost young users because it is no longer the center of attraction for young people.
Building on the studies above, the writers are intrigued by a recent phenomenon: the rebranding of Twitter and the replacement of its iconic bird logo with an X. After purchasing Twitter the previous year, tech billionaire Elon Musk decided to rename the social media platform X, with its website at X.com. Research has shown that the logo is one of the crucial aspects in establishing a company's brand value. A logo with visual cues (such as a color scheme, a graphic icon, and font size) has been shown to help customers remember brands more effectively than a logo without images [8]. Clearly, the name and logo changes cannot be ignored. For more than ten years, Twitter's blue and white bird logo has symbolized the social network's distinctive culture and lexicon. Moreover, "tweet" became a verb as well as the name for a post [9]. As a result, businesses that change their logos may encounter varied reactions from their customers, just as the studies cited above found for Facebook.
However, the rebranding from Twitter to X has not yet been explored by researchers. This gap matters because, as of April 2023, there were 14.75 million Twitter users in Indonesia, ranking the country sixth in the world [10]. Twitter has grown in popularity as a fast-paced source of information [11], and Twitter data has been used for analytics such as sentiment analysis [6], [7], topic modeling, and information extraction [11]. Twitter is also considered essential for journalists: anyone can easily see what is currently happening and trending on the platform, which helps journalists find out what they want or need to write about [12]. Given Twitter's popularity among Indonesian citizens, the writers are motivated to investigate how Indonesian citizens feel about its rebranding. A strong logo can stick in our minds and become recognizable to customers for unknown reasons; as emotions and experiences flood the design, the logo fights for a fragment of our memories [13]. Put plainly, people will reject a new logo once an established one has been built. The purpose of this research is therefore to learn how Indonesian citizens perceive the newly introduced X logo.
This research uses a Support Vector Machine (SVM) model for sentiment analysis. SVM is a machine learning method that separates data with a hyperplane in a space of more than two dimensions, which suits tweet data because the many distinct tokens create a high-dimensional feature space. The separation assigns each tweet one of three polarity classes (positive, negative, or neutral), from which the research can draw conclusions about sentiment toward Twitter's rebranding of its name and logo to X.

RESEARCH METHODS
To begin with, we conducted a literature study, which included reading news articles and examining both international and national journals and books as initial references. We found the topic of sentiment analysis interesting and decided to approach it with the Support Vector Machine (SVM) method, so this research follows a supervised machine learning methodology. The methodology has three phases: a preparation phase, a model training and testing phase, and an evaluation phase, after which the results are discussed, as shown in the figure below.

Sentiment Analysis
Sentiment Analysis (SA) is a computational examination aimed at categorizing the opinions, attitudes, and emotions expressed by individuals toward a particular entity, which could be an individual, event, or topic. This categorization involves determining whether a given sentence within a specific document conveys a positive or negative sentiment [14].

Natural Language Processing
Natural Language Processing (NLP) is a computational approach to analyzing text. NLP aims to understand how humans comprehend and use language in order to create tools and methods that enable computer systems to comprehend and manipulate natural languages for various purposes [15]. NLP can be described as a collection of computational techniques driven by theoretical principles. These techniques are used to examine and represent naturally occurring texts, typically at multiple levels of linguistic analysis, to achieve language processing that closely resembles human capabilities [8].
NLP serves a wide range of tasks and applications. The term NLP is commonly used to describe how software or hardware components within a computer system are involved in the analysis or generation of spoken or written language. The term "natural" distinguishes human communication, both spoken and written, from more formal languages such as mathematical notation or programming languages, which tend to have a limited vocabulary and syntax [14].
Language serves a purpose beyond the mere conveyance of information. It represents a collection of tools that facilitate the sharing of meanings, and it should not be seen primarily as a method for encoding meanings. The foundations of Natural Language Processing (NLP) are rooted in various fields, including computer and information sciences, linguistics, mathematics, electrical and electronic engineering, psychology, artificial intelligence, robotics, and more. NLP applications span a range of areas, including the processing and summarization of natural language text, machine translation, user interfaces, multilingual and cross-language information retrieval, speech recognition, artificial intelligence, and expert systems [16].

Data Preprocessing
In data mining, the data preprocessing stage is crucial. However, preprocessing text data is still considered quite challenging [17]. This is because text data is very raw, can carry meanings different from what the author intended, and may not follow grammar rules due to cultural shifts.
Text preprocessing includes several treatments, such as removing conjunctions and other connecting words (stop words) such as "or", "and", and "if", normalizing each word to its base form, and removing unnecessary punctuation, such as the apostrophe in "isn't". These treatments can alter the original sentence structure, so research has been conducted to determine whether preprocessing text affects the accuracy of the resulting data mining outcomes [18].

Support Vector Machine
A Support Vector Machine (SVM) is a supervised learning technique used to examine data and identify patterns, and it is applicable to both classification and regression analysis. SVM operates on the principle of structural risk minimization (SRM) to find the optimal hyperplane that separates two classes in the input space. The hyperplane is defined by a weight vector w and a bias b, and these parameters are obtained by minimizing the loss function [15]. In the standard soft-margin formulation, for training pairs (x_i, y_i) with labels y_i ∈ {-1, +1}, this loss is:

$$\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i(\mathbf{w}^{\top}\mathbf{x}_i + b)\bigr) \qquad (1)$$

Somantri and Dairoh analyzed sentiment to gauge customer satisfaction with eateries and culinary restaurants in the city of Tegal [19]. They sorted the data into positive and negative sentiments and then relabeled them as "Good" and "Average". The Support Vector Machine was chosen for classification, and feature selection was carried out with both the Information Gain and chi-square techniques. The text was prepared as input for the model through case transformation, tokenization, token filtering, stopword removal, and TF-IDF weighting. The dataset consisted of 80 text documents, distributed as 40 documents categorized as "Average" and 39 documents categorized as "Good". The best results were obtained with Information Gain feature selection in the SVM model, which reached an accuracy of 72.45%.
Another study conducted sentiment analysis of the "GOJEK" brand. Twitter data associated with "GOJEK" was examined and categorized as either positive or negative. The classification method was the Support Vector Machine (SVM), and features were extracted with the TF-IDF method. The study included several text preprocessing steps, such as emoticon conversion, data cleansing, stemming, and negation handling. Training used 1,000 positive tweets and 1,000 negative tweets, and testing used 100 data points. The SVM reached an accuracy of 86%, as measured with a confusion matrix [20].
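To make the soft-margin idea concrete, the sketch below evaluates the standard primal SVM objective (half the squared norm of w plus C times the summed hinge losses) on a tiny hand-made dataset. This is an illustrative assumption: the paper does not state the exact loss its implementation minimizes, and scikit-learn's SVC solves the equivalent dual problem rather than calling a function like this. The function name `soft_margin_objective` and the toy points are hypothetical.

```python
import numpy as np


def soft_margin_objective(w, b, X, y, C=1.0):
    """Primal soft-margin SVM objective: 0.5 * ||w||^2 + C * sum of hinge losses.

    Labels in y are expected to be -1 or +1 (binary case).
    """
    margins = y * (X @ w + b)                 # functional margin of each point
    hinge = np.maximum(0.0, 1.0 - margins)    # zero for points on or beyond the margin
    return 0.5 * float(w @ w) + C * float(hinge.sum())


# Hypothetical toy data: two positive and two negative points.
X = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]])
y = np.array([1, 1, -1, -1])

# Every point lies exactly on the margin, so the hinge loss is zero.
on_margin = soft_margin_objective(np.array([0.5, 0.5]), 0.0, X, y)     # 0.25
# A smaller w widens the geometric margin, but now every point violates it.
wider_margin = soft_margin_objective(np.array([0.1, 0.1]), 0.0, X, y)  # about 3.21
print(on_margin, wider_margin)
```

Shrinking w widens the geometric margin (1/||w||) at the price of hinge loss on points that fall inside it; the cost parameter C sets how heavily those violations are penalized, which is the trade-off discussed for the RBF kernel below.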

Radial Basis Function (RBF) Kernel
The Radial Basis Function (RBF) kernel is used for data that are not linearly separable and requires two parameters, gamma (γ) and cost (C). The parameter C controls the trade-off between a wide margin and the number of misclassified training points, while γ affects the complexity and shape of the decision boundary by controlling how far the influence of each training point reaches, as seen in Equation (2) [21].

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\bigl(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\bigr) \qquad (2)$$

where ||x_i - x_j|| is the Euclidean distance between x_i and x_j and γ is the gamma parameter; the cost parameter C acts on the SVM's loss function rather than on the kernel itself.

K-Fold Cross Validation
K-fold cross-validation is a testing procedure that assesses algorithm performance by randomly partitioning the data samples into K folds of roughly equal size. Each of the K folds is then used once as testing data, while the remaining K − 1 folds are used as training data [22].
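The RBF kernel in Equation (2) and the k-fold split described above can both be checked in a few lines. The sketch below, which assumes NumPy and scikit-learn are installed, computes the kernel by hand, compares it with scikit-learn's `rbf_kernel`, and then confirms that with `KFold` every sample lands in exactly one test fold. The helper `rbf` and the data are illustrative only.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import KFold


def rbf(xi, xj, gamma):
    """Gaussian RBF kernel: K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * (diff @ diff)))


xi = np.array([1.0, 2.0])
xj = np.array([2.0, 0.0])  # squared Euclidean distance to xi is 1 + 4 = 5

manual = rbf(xi, xj, gamma=0.1)  # exp(-0.5)
library = rbf_kernel(xi.reshape(1, -1), xj.reshape(1, -1), gamma=0.1)[0, 0]

# A larger gamma makes similarity decay faster with distance.
sharper = rbf(xi, xj, gamma=1.0)  # exp(-5), much smaller than exp(-0.5)

# K-fold: split 20 samples into K = 10 folds; each fold serves as the test set once.
samples = np.arange(20).reshape(-1, 1)
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
test_indices = []
for train_idx, test_idx in kfold.split(samples):
    assert len(np.intersect1d(train_idx, test_idx)) == 0  # no overlap within a split
    test_indices.extend(test_idx.tolist())
```

The gamma comparison illustrates why γ shapes the decision boundary: with a large γ, only very close points count as similar, which yields a more complex, tightly fitted boundary.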

Evaluation and Validation (Confusion Matrix)
A confusion matrix provides comprehensive data on both predicted and actual values and serves as a tool for assessing the performance of classification outcomes. This matrix typically arranges predicted labels along the X-axis and true labels along the Y-axis [17].
The confusion matrix comprises four categories: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). In the context of sentiment analysis, a True Positive (TP) is a sentence with positive sentiment that is correctly predicted as positive, and a True Negative (TN) is a sentence with negative sentiment that is correctly predicted as negative. Conversely, a False Positive (FP) is a sentence with negative sentiment that is incorrectly predicted as positive, while a False Negative (FN) is a sentence with positive sentiment that is incorrectly predicted as negative [18].
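Following the definitions above, the four counts can be tallied directly and cross-checked against scikit-learn's `confusion_matrix`. The sketch assumes a binary positive/negative task with made-up labels; for the paper's three classes, each class would be treated in turn as the positive class.

```python
from sklearn.metrics import confusion_matrix


def count_outcomes(y_true, y_pred, positive="pos"):
    """Return (TP, FP, TN, FN), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # positive sentence predicted positive
            else:
                fp += 1  # negative sentence predicted positive
        else:
            if actual == positive:
                fn += 1  # positive sentence predicted negative
            else:
                tn += 1  # negative sentence predicted negative
    return tp, fp, tn, fn


# Hypothetical gold labels and model predictions for eight sentences.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

tp, fp, tn, fn = count_outcomes(y_true, y_pred)  # (3, 1, 3, 1)

# scikit-learn puts true labels on rows and predicted labels on columns.
matrix = confusion_matrix(y_true, y_pred, labels=["neg", "pos"])
sk_tn, sk_fp, sk_fn, sk_tp = matrix.ravel()
```

Passing `labels=["neg", "pos"]` fixes the row and column order, so `ravel()` reliably yields TN, FP, FN, TP in that sequence.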

Preparation Phase
In this phase, we scraped tweets about the rebranding of Twitter into X. The tweets were collected on two dates: July 23, 2023, when news of the rebranding appeared, and July 31, 2023, the day the rebranding took effect. Next, we labeled each tweet with its sentiment: 1 for positive, 0 for neutral, and -1 for negative. We then preprocessed the data, which included removing special characters, tags, and stop words, handling synonyms, lowercasing, stemming, and tokenization. The preparation phase ends when the clean tweet data is stored in a data frame that is ready to be used by the model.

Model Training and Testing Phase
The first step in this phase is feature extraction, which computes a weight for every word in every tweet using the Term Frequency-Inverse Document Frequency (TF-IDF) method. The TF-IDF features are then fed into the Support Vector Machine algorithm, which separates the data into classes (labels) with a hyperplane. This research uses the Gaussian Radial Basis Function (RBF) kernel, which can handle widely spread data by implicitly mapping it into a feature space of unlimited dimension. The SVM is trained with 10-fold cross-validation, and the F1-score is used to identify the best-performing configuration. The best model is then applied to the test data using the same method as in training.
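A minimal sketch of this training phase in scikit-learn, assuming a pipeline of `TfidfVectorizer` and an RBF-kernel `SVC` evaluated with stratified 10-fold cross-validation on macro-averaged F1. The synthetic tweets, the word lists, and the choice of `C=1.0` and `gamma="scale"` are placeholders, not the paper's data or tuned values.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus: 20 tweets per class (1 = positive, 0 = neutral, -1 = negative).
CONTEXTS = ["twitter rebranding", "logo baru", "elon musk", "nama platform"]
WORDS = {
    1: ["keren", "bagus", "mantap", "suka", "setuju"],
    0: ["berita", "resmi", "hari", "pengumuman", "info"],
    -1: ["jelek", "aneh", "benci", "malas", "kecewa"],
}
texts, labels = [], []
for label, words in WORDS.items():
    for word in words:
        for context in CONTEXTS:
            texts.append(f"{context} {word}")
            labels.append(label)

# TF-IDF features feed an SVM with a Gaussian RBF kernel.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf", C=1.0, gamma="scale"))

# Stratified 10-fold CV keeps the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
f1_scores = cross_val_score(model, texts, labels, cv=cv, scoring="f1_macro")
print(f"mean macro F1 over 10 folds: {f1_scores.mean():.3f}")

# After cross-validation, refit on all training data before predicting new tweets.
model.fit(texts, labels)
prediction = model.predict(["logo baru jelek"])
```

Putting the vectorizer inside the pipeline means TF-IDF is refit on each training fold, so the held-out fold never leaks into the vocabulary or IDF weights. To tune C and γ as well, the same pipeline could be wrapped in `GridSearchCV` with parameter names such as `svc__C` and `svc__gamma`.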

Evaluation Phase
To evaluate the model, we use the confusion matrix as the standard performance evaluation in supervised learning. We count the true positives, true negatives, false positives, and false negatives in our results to calculate recall, precision, F1-score, and accuracy. These scores are the basis for selecting the best model and for validating that its results can be used to describe people's sentiments toward the rebranding of Twitter into X.
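For reference, the standard definitions of these metrics in terms of the confusion-matrix counts are given below. For the three sentiment classes, each class is scored one-vs-rest, and a macro average (assumed here, since the paper does not name its averaging scheme) takes the unweighted mean over the classes.

$$
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \\
\text{Macro-}F_1 &= \frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} F_{1,c},
\qquad \mathcal{C} = \{\text{positive}, \text{neutral}, \text{negative}\}.
\end{aligned}
$$

Because F1 is the harmonic mean of precision and recall, it always lies between the two; for example, a hypothetical precision of 0.80 and recall of 0.60 give F1 = 0.96 / 1.40 ≈ 0.69.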

Rebranding Analysis
Once the sentiment has been analyzed and the keywords have been scraped, the writers will also study Indonesian citizens' perception using the rebranding theory proposed by Dominique-Ferreira and Roque [23]. Rebranding typically accompanies significant changes, such as adjustments to an organization's internal structure or modifications to its external environment, which businesses cannot avoid. One of the main reasons for a product to rebrand is to keep up with trends and maintain a modern image and identity. Rebranding, however, is not always a wise move, especially if the product has been on the market for some time. A rebranding may occasionally be necessary because the digital world is constantly evolving and forcing brands to adapt and reinvent themselves. However, it is advised that changes in communication and advertising be made only after the entire rebranding process is complete, and that they then be introduced gradually [23]. For instance, Gojek still uses its green and black color scheme despite changing its logo in 2019. As a result, the new Gojek logo is easy to understand because customers still recall the old one, while the target market of regular consumers still views the new logo as very sophisticated and modern [24]. Logos remain crucial even after a brand has become common knowledge, because a logo is a symbol that instantly and effectively helps customers recognize a company, its products, and the services it offers [25]. Rebranding does not always involve changing the brand's visual identity; changing the logo is just one of many ways to rebrand. From a broad perspective, businesses are driven to alter their logos by a variety of external factors, which can be distilled into three categories: shifts in consumer demand, shifts in economic market conditions, and changes in competition models [26].

Findings
First, the writers scraped tweets about the rebranding of Twitter into X. The table below shows the data collected on July 23, 2023, when news of the rebranding appeared, and on July 31, 2023, the day the rebranding took effect: 272 tweets were gathered on July 23, 2023, and 350 tweets were gathered on July 31, once Twitter had changed into X. Table 1 shows an example of the detailed tweet data scraped in this research. The scraped tweets are still raw: they contain tags, links, and the usernames of accounts mentioned in replies, and their capitalization varies with each user's typing preferences. All of the raw data were parsed document by document. The parsed data were then cleaned through normalization, which deletes links, tags such as <br>, hashtags, mentions, and punctuation such as exclamation and question marks, and through case folding, which handles the capitalization of each word. Table 2 below shows the preprocessed data after removing emoticons and web tags and case folding everything to lowercase.
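A minimal sketch of this normalization and case-folding step using Python's standard `re` module. The regular expressions are assumptions about what counts as a link, tag, mention, or hashtag; the paper does not give its exact patterns. The final character filter also drops emoji and any non-ASCII letters, which is acceptable for mostly ASCII Indonesian text but would need adjusting for other languages.

```python
import re

# Hypothetical patterns for the elements removed during normalization.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")           # web tags such as <br>
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NON_ALNUM_RE = re.compile(r"[^0-9a-zA-Z\s]")  # punctuation, emoji, other symbols
SPACE_RE = re.compile(r"\s+")


def normalize_tweet(text):
    """Remove links, tags, mentions, hashtags, and punctuation, then lowercase."""
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    text = text.lower()                        # case folding
    return SPACE_RE.sub(" ", text).strip()     # collapse leftover whitespace


print(normalize_tweet("@elonmusk Logo BARU jelek!!! <br> https://t.co/abc #TwitterX 😡"))
# logo baru jelek
```

Links are removed before the punctuation filter on purpose: stripping punctuation first would break a URL into fragments such as "https t co abc" that the URL pattern could no longer match.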

Table 2. Result of Normalization and Case Folding Data
No. | Username | Tweet
1. | @BaxterJunus | hahhaha kena rebranding sdh kali ini brand besar dan yang sudah melekat udeh bener bagus bagus logo burung biru ini diganti jadi kayak logo pn …
… | … | …
269 | @JerrynxChndr | twitter udh gw pk tahunan, skrg mw di rebranding jd malas pknya lg dh
270 | @alexasisii | kok ga pada heboh twitter mau rebranding gegara …

Next, stop words were eliminated from each tweet document. In this research, the stop word list includes words such as and, but, then, also, and RT. Stemming was then applied to each tweet document, reducing affixed words to their base forms; for example, "employment" becomes "employ" and "increasing" becomes "increase". Finally, tokenization broke each tweet down into individual words, producing an array of tokens for each record. Table 3 shows examples of the resulting data.
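The remaining steps can be sketched as follows. The stop word set is a hypothetical Indonesian rendering of the list above (dan, tapi, lalu, juga, RT), and `naive_stem` is a deliberately crude suffix- and prefix-stripper used only for illustration; a real pipeline would use a proper stemmer, such as NLTK's PorterStemmer for English text or a dedicated Indonesian stemmer such as Sastrawi. Tokenization is done first here, by splitting on whitespace, because the other two steps operate on individual words.

```python
# Hypothetical Indonesian equivalents of "and, but, then, also, RT".
STOP_WORDS = {"dan", "tapi", "lalu", "juga", "rt"}


def naive_stem(token):
    """Strip one common Indonesian suffix and the prefix 'di-' (illustrative only)."""
    for suffix in ("nya", "kan"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[: -len(suffix)]
            break
    if token.startswith("di") and len(token) - 2 >= 4:
        token = token[2:]
    return token


def preprocess(normalized_text):
    """Tokenize a normalized tweet, drop stop words, and stem each token."""
    tokens = normalized_text.split()                      # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop word removal
    return [naive_stem(t) for t in tokens]                # stemming


print(preprocess("rt logonya diganti dan jadi jelek"))
# ['logo', 'ganti', 'jadi', 'jelek']
```

The length checks keep short words such as "dia" intact, but a rule-based stripper this simple will still over-stem some words, which is why a dictionary-backed stemmer is preferable in practice.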
The sentiment analysis described above achieved an accuracy of 90.827% with the Support Vector Machine model. Figure 2 shows that 87.0% of Indonesian netizens' tweets were neutral when Twitter first announced its rebranding plan. The outcome is nearly identical to Figure 3, which shows that 87.8% of the tweets were neutral after the rebranding was officially completed. Positive sentiment came a distant second in Figure 2, at only 8.1%, with 4.8% negative sentiment. Similarly, Figure 3 shows only 7.8% positive and 4.4% negative sentiment. Interestingly, these results are consistent with a report from MarketingCharts [27], which indicates that most US consumers are not bothered by Twitter's rebranding to X, as illustrated in Figure 4.
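The per-date percentages reported above can be computed from the labeled data frame with pandas. The sketch uses eight hypothetical labeled tweets, not the study's 622, and assumes the 1/0/-1 label encoding from the Preparation Phase.

```python
import pandas as pd

LABEL_NAMES = {1: "positive", 0: "neutral", -1: "negative"}

# Hypothetical labeled tweets (1 = positive, 0 = neutral, -1 = negative).
df = pd.DataFrame({
    "date": ["2023-07-23"] * 4 + ["2023-07-31"] * 4,
    "label": [0, 0, 1, -1, 0, 0, 0, 1],
})

# Share of each sentiment per collection date, in percent.
shares = (
    df.assign(sentiment=df["label"].map(LABEL_NAMES))
      .groupby("date")["sentiment"]
      .value_counts(normalize=True)
      .mul(100)
      .round(1)
)
print(shares)
```

Normalizing within each date, rather than over the whole data frame, keeps the two collection days comparable even though they contain different numbers of tweets.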
Third, the writers also want to draw attention to the outcome displayed by the word clouds. A word cloud is an illustrative depiction of text created by counting how frequently each word appears in the data set: the more frequent a word or word phrase, the larger it appears in the word cloud. In this research, the writers group the tweets by sentiment and build a word cloud for each group alongside the sentiment analysis. This allows us to identify the most relevant topics behind the positive, negative, and neutral sentiments of Indonesian netizens who discuss or engage with content related to Twitter's rebranding as X.
For this purpose, our Python code uses several existing libraries: pandas, NumPy, and NLTK to preprocess the text data, scikit-learn to build the SVM model and compute the evaluation metrics, and wordcloud to render the word clouds. We configured the word clouds to show the 15 most frequent words and word phrases in each of the three clusters of positive, negative, and neutral tweets, obtaining three word clouds. Figure 5 shows the word cloud for neutral sentiment, in which the fifteen most frequent words are Twitter, rebranding, x, jadi, logo, Elon, di, ini, musk, menjadi, ada, dari, mau, itu, and nama. The writers attribute the prominence of these terms to the widespread media coverage of the rebranding. For this reason, the neutral word cloud does not include adjectives that depict netizens' opinions. For positive sentiment, the fifteen most frequent words are twitter, rebranding, jadi, elon, ada, x, ga, ini, logo, main, mau, akan, bisa, cuma, and di, as shown in Figure 6. Lastly, Figure 7 shows the word cloud for negative sentiment on July 23 and 31, 2023, in which the fifteen most frequent words are Twitter, rebranding, jadi, mau, bisa, dia, gue, ini, bikin, gak, ganti, jelek, karena, kalau, and logonya. Notably, in the negative word cloud the terms "logo" and "jelek" ("ugly" or "bad") emerged as the most frequently mentioned by Indonesian netizens.
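The frequency counts behind each word cloud can be sketched with `collections.Counter`, grouping token lists by sentiment and keeping the 15 most common tokens per group, matching the configuration described above. The token lists here are hypothetical. The resulting counts could then be passed to the `wordcloud` package, for example via `WordCloud(max_words=15).generate_from_frequencies(...)`, to render the image.

```python
from collections import Counter


def top_words(token_lists, n=15):
    """Return the n most frequent tokens across a list of tokenized tweets."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    return counts.most_common(n)


# Hypothetical preprocessed tweets grouped by sentiment label.
tweets_by_sentiment = {
    "negative": [["logo", "jelek"], ["twitter", "logo", "jelek"], ["ganti", "logo"]],
    "neutral": [["twitter", "rebranding", "x"], ["twitter", "jadi", "x"]],
}

clouds = {label: top_words(tokens) for label, tokens in tweets_by_sentiment.items()}
print(clouds["negative"][:2])
# [('logo', 3), ('jelek', 2)]
```

Counting per sentiment group, rather than over all tweets, is what lets an opinion word such as "jelek" surface in the negative cloud instead of being drowned out by topic words common to every group.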

TF-IDF
In this research, TF-IDF is used during the NLP process to vectorize the set of texts so that they can be processed numerically for sentiment analysis. Term Frequency-Inverse Document Frequency is a weighting technique for text documents: a word's weight grows with how often it appears in a tweet and shrinks with how many tweets contain it, so words that distinguish a tweet weigh more than words common to all tweets. Table 4 below shows the TF-IDF results for the top 5 lexicon tokens in the tweet data. For the "Positive" class, the evaluation metrics are a Precision of 0.87, a Recall of 0.82, and an F-Measure of 0.92. For the "Negative" class, the model obtained a Precision of 0.86, a Recall of 0.82, and an F-Measure of 0.85. For the "Neutral" class, it obtained a Precision of 0.83, a Recall of 0.75, and an F-Measure of 0.83. The overall accuracy is 0.875, with a recall of 0.9 and a precision of 0.86, which is good for representing the positive, negative, and neutral polarities in this dataset. Judging from these evaluation results, the SVM sentiment analysis model is good enough to be applied.

Discussion
Drawing on these results, this study concludes that the majority of Indonesian internet users have a neutral sentiment toward Twitter's rebranding to X, with 87.0% of tweets classified as neutral on July 23 and 87.8% on July 31, 2023.
This result aligns with previous research on netizens' sentiments toward the rebranding of Facebook to Meta, which stated that users are also more interactive toward neutral sentiment [7]. In addition, the small percentage difference between the two dates suggests that Indonesian internet users are not particularly concerned about Twitter's rebranding. This result is in line with a survey of over 11,000 consumers in the US, UK, and Australia, which revealed that the change has not been met with universally positive feedback, as the largest share of respondents (39%) said it made no difference to them [27]. The lack of positive feedback, particularly from Indonesian citizens, could be a consequence of Twitter's rebranding decision, which Weinberg claimed has been met with a collective shrug [28]. In the word clouds, each word that appears for a given polarity helps characterize tweets with that sentiment as positive, negative, or neutral, and the bigger a word is, the more frequently it is used in tweets expressing that sentiment. For example, the neutral words Twitter, rebranding, x, jadi, logo, Elon, di, ini, musk, menjadi, ada, dari, mau, itu, and nama are frequently used in tweets that sound neutral, neither praising nor complaining about the rebranding of Twitter into X. All of these words were obtained from the tokens produced by preprocessing the tweet dataset scraped from the web.
One issue that stands out in the findings above is that "logo" and "jelek" were the most frequently mentioned terms for negative sentiment in the word cloud of Indonesian netizens' tweets. Words are presumed to take on emotional meanings, so the term "bad" or "jelek" comes to be associated with a variety of unfavorable meanings [29]. A well-designed logo can become embedded in our minds, making it memorable to consumers for unknown reasons. The logo fights for a piece of our memories as the design fills with feelings and experiences, and this pattern of neuronal activity may block other designs from sticking in our minds by preventing them from leaving the same lasting impression. Put simply, once a brand has been established, we will reject new ones [13]. Thus, a logo plays an important role in branding and rebranding processes [24].
According to Dr. Wright [30], a senior professional lecturer of marketing at American University's Kogod School of Business, Twitter has a very loyal group of users, and it is generally hard to change consumers' minds about anything related to a brand, including its name and logo. Rebranding is therefore usually most effective when a business enters a new market or turns to a new line of products. Unfortunately, despite the rebranding attempt, most Twitter users' experiences have not changed much. X still looks a lot like Twitter, and users and consumers in general have no reason to think otherwise, even though it might offer some additional features. Consequently, most users might not adopt the name X, as shown in Figures 9 and 10 below.
The writers captured Figures 9 and 10 as screenshots on their mobile phones on January 24, 2024. Figure 9 shows that the local Indonesian media outlet TirtoID still used the words "LIVE Tweet" and "Twitter" in its official poster and post. In addition, an account named Logos ID and another personal X user still used the word "ngetweet" ("to tweet") instead of "nge-X".

CONCLUSION
Rebranding makes it challenging to achieve the positive public opinion a company desires. In this research, 272 tweets were gathered on July 23, 2023, and 350 tweets were gathered on July 31, once Twitter had changed into X. After preprocessing, the tweets were classified with an SVM, which achieved an overall accuracy of 0.875, a recall of 0.9, and a precision of 0.86, which is good for representing the positive, negative, and neutral polarities in this dataset. This research concludes that the responses of Indonesian Twitter users on July 23 and July 31, 2023 show neutral sentiment. This result is also supported by the word clouds, which mostly show general words such as rebranding, twitter, https, jadi, yang, and X. However, the writers emphasize that this research only aimed to describe the sentiment of Indonesian Twitter users on the day the rebranding was announced and on the day it officially took effect.

Figure 2. The sentiment analysis of Indonesian netizens on 23rd July 2023

Figure 4. The US consumer view of Twitter's rebranding to X (2023)

Figure 5. Word cloud for neutral sentiment from 23rd and 31st July 2023

Figure 6. Word cloud for positive sentiment from 23rd and 31st July 2023

Figure 7. Word cloud for negative sentiment from 23rd and 31st July 2023

Figure 9. A screenshot of a post from Tirtoid
Figure 10. A screenshot of a post from a random tweet

Table 4. TF-IDF Result

Evaluation
The data validation method used in this research is the confusion matrix, from which the evaluation metrics (Precision, Recall, and F-Measure) and the accuracy of the model on the prepared dataset are computed. Using the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) counts obtained from the model, which was trained and tested on separate training and testing sets for each sentiment class, we obtained the Precision, Recall, and F-Measure results depicted in the graph in Figure 8 below.