Sentiment Analysis of User Reviews on Cryptocurrency Application: Evaluating the Impact of Dataset Split Scenarios Using Multinomial Naive Bayes

Abstract


A. Introduction
The number of cryptocurrency investors has grown significantly, indicating conditions that support the growth of this industry. The Commodity Futures Trading Regulatory Agency (Bappebti) reported that the total number of crypto investors in Indonesia reached 18.83 million as of January 2024, a sharp increase from 16.86 million in January 2023. This growth is driven by the positive trend in the cryptocurrency market and the acceptance of cryptocurrencies such as Bitcoin for trading on stock exchanges in many countries [1]. Additionally, this growth is supported by the development of investment service applications that make it easier for users to transact crypto assets. In Indonesia, Indodax and Tokocrypto have emerged as the main platforms for cryptocurrency trading. Indodax, founded in 2014, is one of the largest cryptocurrency exchanges in Indonesia. Tokocrypto, launched in 2017, has gained significant attention since being acquired by Binance, one of the largest cryptocurrency exchanges in the world.
According to a report from CoinGecko [2], Indodax holds a market share of 42%, while Tokocrypto holds a market share of 43%. Both applications provide various features to facilitate cryptocurrency trading, including the buying, selling, and storing of digital assets. Despite the growing popularity of cryptocurrency investment applications, several issues persist and may continue to develop. One is slow transaction speed, which can cause losses and inconvenience for users because cryptocurrency prices are highly volatile. Additionally, risks of hacking or loss of access can lead to bankruptcy for cryptocurrency investment applications [3]. These conditions can cause uncertainty, anxiety, and doubt among users, which can negatively impact the image of the cryptocurrency industry and reduce public trust and interest in cryptocurrency investment [4].
Therefore, sentiment analysis of these platforms is necessary. User sentiment towards these platforms plays a crucial role in their success and market acceptance [5]. With the increasing number of users and transactions, it is important to understand how users respond to and interact with these platforms. Sentiment analysis offers an effective way to capture and analyze user feelings and opinions broadly and systematically [6]. In this context, user reviews of the applications become a valuable source of information [7]. Moreover, since the reviews are written in Indonesian, this study also applies Indonesian text mining, which has its own characteristics [8].
This study aims to conduct sentiment analysis on user reviews of Indodax and Tokocrypto using the Multinomial Naive Bayes method. This method was chosen for its good performance in text classification [9] [10]. Additionally, this study explores the impact of various dataset splitting scenarios and random states on the model's performance.
Thus, this research is expected to make a significant contribution to understanding user perceptions of the Indodax and Tokocrypto cryptocurrency investment applications. The results are also expected to help application developers improve their service quality and to help potential investors make better decisions when choosing a cryptocurrency investment application.

B. Research Method
This research uses sentiment analysis to examine user reviews of two cryptocurrency platforms, Indodax and Tokocrypto. Data were collected using web scraping methods to gather reviews from the Indodax and Tokocrypto applications on the Google Play Store. The research stages are outlined as follows and depicted in Figure 1.

Data Collection
Reviews were scraped from the Google Play Store using the google-play-scraper library on January 31, 2024. Web scraping is a technique for automatically collecting data from websites using a program or bot. A total of 73,060 reviews were collected for the Indodax application and 33,440 reviews for the Tokocrypto application. After filtering to keep only reviews from the period January 1, 2022, to January 31, 2024, the dataset was reduced to 36,205 reviews for Indodax and 13,915 reviews for Tokocrypto.
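The date filtering step can be illustrated with a minimal sketch. It assumes each scraped review is a dictionary with a text field named "content" and a timestamp field named "at"; these field names, the sample records, and the helper name filter_by_period are assumptions made for illustration, not details taken from the study.

```python
from datetime import datetime

# Hypothetical sample records; "content" and "at" are assumed field names.
reviews = [
    {"content": "aplikasi bagus", "at": datetime(2021, 12, 31, 23, 59)},
    {"content": "withdraw lama", "at": datetime(2022, 1, 1, 0, 0)},
    {"content": "mudah dipakai", "at": datetime(2024, 1, 31, 18, 30)},
]

START = datetime(2022, 1, 1)
END = datetime(2024, 2, 1)  # exclusive bound, so all of January 31, 2024 is kept


def filter_by_period(records, start=START, end=END):
    """Keep reviews whose timestamp falls in the half-open range [start, end)."""
    return [r for r in records if start <= r["at"] < end]


filtered = filter_by_period(reviews)
print(len(filtered))
```

Using an exclusive upper bound of February 1 avoids dropping reviews posted late on the last day of the period.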

Data Preprocessing
The collected reviews were preprocessed to remove noise and irrelevant information. The text preprocessing process consists of the following steps:
a. Case Folding: This step converts all review text to lowercase. Case folding is performed using the lower() function in Python.
b. Cleansing: This step removes emojis, symbols, URL links, @username mentions, hashtags (#), punctuation marks, and non-alphabetic characters.
c. Tokenization: This step splits the sentences in the text into individual words or tokens, which helps in identifying individual words in the text.
d. Normalization: This step corrects abbreviated and slang words to their normal forms so that words with the same meaning are consistently represented. In this study, the researcher used the slang dictionary available at https://github.com/nasalsabila/kamus-alay and added several custom dictionaries collected from the reviews.
e. Stopword Removal: This step removes words that do not carry significant meaning for classification, such as "dan", "atau", "tetapi", etc. The researcher used several stopword dictionaries, including the NLTK stopword library, the Sastrawi stopword library, and custom stopwords collected from the reviews.
f. Stemming: This step converts Indonesian words with affixes into their root words. Stemming was performed using the Sastrawi library.
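Steps a to f can be sketched as one small function. This is an illustrative sketch only: the SLANG and STOPWORDS dictionaries are tiny made-up stand-ins for the dictionaries named above, the regular expressions are one possible cleansing rule set, and stemming is left as a pluggable callable (an identity function by default) rather than a call into Sastrawi.

```python
import re

# Toy stand-ins for the slang dictionary and stopword lists described above.
SLANG = {"gk": "tidak", "bgt": "banget", "apk": "aplikasi"}
STOPWORDS = {"dan", "atau", "tetapi", "yang", "di"}


def preprocess(text, stem=lambda word: word):
    """Apply steps a-f to one review and return the cleaned tokens."""
    text = text.lower()                                  # a. case folding
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # b. remove URL links
    text = re.sub(r"[@#]\w+", " ", text)                 # b. remove @username and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)                # b. remove non-alphabetic characters
    tokens = text.split()                                # c. tokenization
    tokens = [SLANG.get(t, t) for t in tokens]           # d. normalization
    tokens = [t for t in tokens if t not in STOPWORDS]   # e. stopword removal
    return [stem(t) for t in tokens]                     # f. stemming (pluggable)


sample = "Apk ini BAGUS bgt dan mudah!!! \U0001F600 @admin https://example.com #crypto"
print(preprocess(sample))
```

Passing a real stemmer as the stem argument would complete step f; the default keeps the sketch free of third-party dependencies.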

Data Labelling
In this stage, automatic data labeling was performed using the lexicon-based method. This method was chosen for its superior accuracy, precision, and recall compared to non-lexicon-based analysis [11]. The SentiStrengthID Indonesian language lexicon available at https://github.com/masdevid/sentistrength_id was used for this purpose. The Valence Aware Dictionary and Sentiment Reasoner (VADER) module was utilized to assign sentiment labels (positive, negative, and neutral) by modifying the lexicon list.
The determination of labels for the reviews was based on the calculation of the compound score. Reviews with a compound score greater than 0 were labeled as Positive, those with a compound score less than 0 were labeled as Negative, and reviews with a compound score of 0 were labeled as Neutral.
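The thresholding rule can be sketched as follows. The lexicon weights here are invented for illustration, and compound_score is a simplified stand-in: it sums word weights and applies VADER's normalization x / sqrt(x^2 + alpha) with alpha = 15, while omitting VADER's handling of negation, intensifiers, and punctuation.

```python
import math

# Toy lexicon with made-up weights; the study used the SentiStrengthID lexicon.
LEXICON = {"bagus": 4, "mudah": 3, "lama": -3, "susah": -4}


def compound_score(tokens, alpha=15):
    """Sum lexicon weights and squash the total into the open interval (-1, 1)."""
    total = sum(LEXICON.get(t, 0) for t in tokens)
    return total / math.sqrt(total * total + alpha)


def label(score):
    """Map a compound score to a sentiment label using the zero threshold."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


for tokens in (["aplikasi", "bagus"], ["withdraw", "lama"], ["mudah", "lama"]):
    print(tokens, label(compound_score(tokens)))
```

Note that a review is labeled Neutral both when it contains no lexicon words and when its positive and negative weights cancel exactly.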

Text Vectorization
The word weighting process using Term Frequency-Inverse Document Frequency (TF-IDF) transforms the text into a numerical representation that reflects the importance of words in a document relative to the entire corpus. This technique helps capture more meaningful information from the text, which is then used as input for the Multinomial Naïve Bayes model [12]. Figure 2 shows the results of word weighting using TF-IDF on several data samples.
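As a rough illustration of the weighting, the sketch below computes a textbook TF-IDF variant: term frequency is the count divided by the document length, and inverse document frequency is ln(N / df). Library implementations often use smoothed or normalized variants, so the exact numbers in the study may differ; the toy documents and the function name tf_idf are assumptions.

```python
import math
from collections import Counter

# Toy tokenized documents (hypothetical, already preprocessed).
docs = [
    ["aplikasi", "mudah", "mudah"],
    ["aplikasi", "lama"],
    ["withdraw", "lama"],
]


def tf_idf(corpus):
    """Return one {term: weight} dictionary per document."""
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


vectors = tf_idf(docs)
print(vectors[0])
```

In the first document, "mudah" receives a higher weight than "aplikasi" because it is both more frequent in that document and rarer across the corpus.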

Dataset Splitting
The dataset was split into training and testing sets using three different scenarios: 90:10 (scenario 1), 80:20 (scenario 2), and 70:30 (scenario 3) ratios. This was done to understand the influence of training and testing data proportions on the model's performance [13]. Additionally, experiments with random states in the range of 1-100 were conducted for each scenario to assess the impact of the random state on dataset splitting and model results [14].
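The effect of the split ratio and random state on the partition can be sketched with a small seeded shuffle. The function split_indices is a hypothetical stand-in for whatever splitting utility the study used. It rounds the test set size up, which is an assumption; that choice does reproduce the Scenario 1 sizes reported in the results for Indodax (30,022 training and 3,336 testing instances out of 33,358).

```python
import random


def split_indices(n_samples, test_percent, random_state):
    """Shuffle sample indices with a seeded RNG and cut off the test portion."""
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)
    # Ceiling division in integer arithmetic: the test set size is rounded up.
    n_test = -(-n_samples * test_percent // 100)
    return indices[n_test:], indices[:n_test]


SCENARIOS = {1: 10, 2: 20, 3: 30}  # scenario number -> test percentage

for scenario, test_percent in SCENARIOS.items():
    train, test = split_indices(33358, test_percent, random_state=1)
    print(scenario, len(train), len(test))
```

The random state changes which reviews land in each set but not the set sizes, which is why accuracy can vary across random states within one scenario.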

Model Training
The Multinomial Naive Bayes method was used for sentiment classification. This method is well-suited for text classification tasks due to its simplicity and effectiveness. The model was trained on the training set and tested on the testing set for each split scenario and random state.
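To make the classifier concrete, here is a from-scratch sketch of Multinomial Naive Bayes with additive (Laplace) smoothing. It is not the implementation used in the study: the class name TinyMultinomialNB and the toy data are hypothetical, and the sketch works on raw token counts rather than the TF-IDF weights described above.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token lists with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.term_counts[y].update(doc)
        self.n_docs = len(docs)
        return self

    def _log_posterior(self, doc, y):
        # log P(y) plus the sum over tokens of log P(token | y).
        score = math.log(self.class_counts[y] / self.n_docs)
        total = sum(self.term_counts[y].values()) + self.alpha * len(self.vocab)
        for t in doc:
            if t in self.vocab:  # words never seen in training are skipped
                score += math.log((self.term_counts[y][t] + self.alpha) / total)
        return score

    def predict(self, docs):
        return [max(self.classes, key=lambda y: self._log_posterior(doc, y))
                for doc in docs]


train_docs = [["mudah", "bagus"], ["bagus", "aman"],
              ["lama", "susah"], ["withdraw", "lama"]]
train_labels = ["Positive", "Positive", "Negative", "Negative"]
model = TinyMultinomialNB().fit(train_docs, train_labels)
predictions = model.predict([["aplikasi", "bagus"], ["withdraw", "susah"]])
print(predictions)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.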

Performance Evaluation
The performance of the trained model is evaluated using a confusion matrix, from which the accuracy, precision, recall, and F1-score are derived [15]. Together, these metrics give a comprehensive view of how well the model classifies the sentiments of user reviews.
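The way these metrics follow from the confusion matrix can be sketched in a few lines. The label sequences below are invented for illustration, and per_class_metrics is a hypothetical helper, not a function from the study.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and per-label precision, recall, and F1."""
    # Confusion matrix cells: (actual, predicted) -> count.
    cells = Counter(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for label in labels:
        tp = cells[(label, label)]
        fp = sum(cells[(other, label)] for other in labels if other != label)
        fn = sum(cells[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(cells[(label, label)] for label in labels) / len(y_true)
    return accuracy, report


# Made-up labels for illustration only.
y_true = ["Positive", "Positive", "Positive", "Negative", "Negative", "Neutral"]
y_pred = ["Positive", "Positive", "Negative", "Negative", "Positive", "Positive"]
accuracy, report = per_class_metrics(y_true, y_pred)
print(accuracy, report["Positive"])
```

Reporting the metrics per label matters for imbalanced data, because a high overall accuracy can hide poor recall on a minority class.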

Visualization
The final step involves visualizing the results of the sentiment analysis. In this research, the visualization of sentiment analysis results will be presented in the form of a Word Cloud. The Word Cloud is chosen due to its ability to visually display the topological associations, correlations, and relevance of selected articles in an appealing manner, thereby aiding in the quick interpretation of textual data [16].
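A word cloud is driven by word frequencies, so the underlying computation can be sketched without a rendering library. The token lists and the helper name top_words below are hypothetical; a rendering tool would then scale each word's font size by its count.

```python
from collections import Counter


def top_words(token_lists, n=5):
    """Count tokens across reviews and return the n most frequent."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    return counts.most_common(n)


# Hypothetical preprocessed reviews sharing one sentiment label.
positive_reviews = [["mudah", "trading"], ["mudah", "aman"], ["trading", "mudah"]]
print(top_words(positive_reviews, n=2))
```

Building one frequency table per sentiment label yields separate positive and negative word clouds for each application.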

C. Result and Discussion
1. Text Preprocessing
The raw data collected undergoes text preprocessing to clean the dataset, making it easier to analyze. The results of the text preprocessing are then stored in a new column named content_preprocessed. Figure 3 displays the outcome of the text preprocessing.

Figure 3. Text Preprocessing Result
From these results, reviews that previously contained noise such as emoticons, punctuation marks, typographical errors, and foreign words become cleaner and more readable, facilitating the subsequent labeling process using the lexicon dictionary. Next, reviews that are empty or duplicated after text preprocessing are eliminated. Reviews may become empty because users provided only emoticons or only words that fall into the stopword list defined in the previous process. This elimination process reduces the dataset size, as detailed in Table 1.
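The elimination of empty and duplicate reviews can be sketched as below. The function name drop_empty_and_duplicates and the sample strings are hypothetical, and keeping the first occurrence of each duplicate is an assumption about how duplicates were resolved.

```python
def drop_empty_and_duplicates(cleaned_reviews):
    """Keep the first occurrence of each non-empty preprocessed review."""
    seen = set()
    kept = []
    for text in cleaned_reviews:
        text = text.strip()
        if not text or text in seen:
            continue  # skip reviews that are empty or already kept
        seen.add(text)
        kept.append(text)
    return kept


cleaned = ["aplikasi bagus", "", "aplikasi bagus", "   ", "withdraw lama"]
print(drop_empty_and_duplicates(cleaned))
```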

Data Labeling
After the dataset is cleaned, automatic labeling is performed using the SentiStrengthID lexicon dictionary. Figure 4 shows the results obtained from the labeling process.

Figure 5. Data Labeling Results
Figure 5 and Figure 6 show the label distribution of each dataset. From the two graphs, it is evident that the label distribution in both datasets (Indodax and Tokocrypto) follows a similar pattern. The Positive label dominates the number of samples in both datasets, while the Negative and Neutral labels have significantly fewer samples. Although the Tokocrypto dataset contains fewer samples than the Indodax dataset, the label distribution exhibits a consistent trend.

Modelling
After the dataset was labeled, it was weighted using TF-IDF. The results of the TF-IDF weighting can be seen in Figure 2. Following this, the dataset was split according to the scenarios previously described. The results of the dataset splitting are shown in Table 2. From the table, we can observe the distribution of data across different scenarios. In Scenario 1, the dataset is split with the highest proportion of training data, leaving a smaller portion for testing. Specifically, for Indodax, 30,022 instances were used for training and 3,336 for testing, totaling 33,358 instances. Similarly, for Tokocrypto, 12,033 instances were used for training and 1,337 for testing, totaling 13,370 instances.
After the dataset was split, an experiment was conducted to find the optimal random state within the range of 1-100. The highest accuracy achieved in each scenario is summarized in Table 3, which reports the accuracy, precision, recall, F1-score, and computation time for the Indodax and Tokocrypto datasets. These results indicate that the highest accuracy for both the Indodax and Tokocrypto datasets was observed in Scenario 1. The precision, recall, and F1-score values provide additional insights into the model's performance, and the computation time indicates the efficiency of the model training process.
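The search for the best random state can be sketched as a simple loop over candidate seeds. Here evaluate is a hypothetical callable that would split the data with the given random state, train the model, and return its test accuracy; the scores in the usage example are made up and are not results from the study.

```python
def best_random_state(evaluate, random_states=range(1, 101)):
    """Return (random_state, accuracy) with the highest accuracy.

    On ties, the lowest random state is kept.
    """
    best_state, best_accuracy = None, float("-inf")
    for state in random_states:
        accuracy = evaluate(state)
        if accuracy > best_accuracy:
            best_state, best_accuracy = state, accuracy
    return best_state, best_accuracy


# Made-up accuracies standing in for real train-and-test runs.
fake_scores = {1: 0.81, 2: 0.84, 3: 0.84}
print(best_random_state(fake_scores.get, random_states=[1, 2, 3]))
```

Because the best seed is selected using test accuracy, the reported maximum is an optimistic estimate; the spread of accuracies across seeds is also informative.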

D. Model Evaluation
In this section, we evaluate the performance of the Multinomial Naive Bayes (MultinomialNB) model using the Confusion Matrix to calculate Precision, Recall, and F1-Score for each label (Positive, Negative, and Neutral). The evaluation was performed for both the Indodax and Tokocrypto datasets across three different scenarios. Tables 4 and 5 summarize the results, listing Precision, Recall, and F1-Score for each sentiment label. In Table 4, the Indodax dataset results indicate that the Positive label consistently performs well, with Precision and Recall exceeding 82%. In contrast, the Neutral label demonstrates much lower performance, especially in Recall and F1-Score. Table 5 shows similar results for the Tokocrypto dataset, where the Positive label again achieves high Precision and Recall, while the Neutral label suffers from significantly lower performance metrics.

E. Visualization
At the final stage of this research, the visualization of sentiment analysis results is performed to provide a clearer overview of the analyzed data. The results of this visualization will be presented in the following figures.
Figure 9 represents a word cloud generated from positive sentiments about the Indodax application. The most prominent words in this cloud are:
• Kripto (Crypto): Indicates that users frequently mention cryptocurrencies positively.
• Mudah (Easy): Suggests that users find the platform easy to use.
• Trading: Indicates that trading on the platform is a positive experience for users.
• Investasi (Investment): Reflects that users see the platform as a good place for investment.
• Aman (Secure): Implies that users feel secure using the platform.
Figure 10 represents a word cloud generated from negative sentiments about the Indodax application. The most prominent words in this cloud are:
• Biaya (Cost): Indicates that users are concerned about the costs associated with using the platform.
• Lama (Long): Suggests that users experience long wait times or delays.
• Mahal (Expensive): Reflects that users find the platform expensive.
• Susah (Difficult): Indicates that users find certain aspects of the platform difficult to use.
• Withdraw: Implies that users face challenges when trying to withdraw funds.
Figure 11 represents a word cloud generated from positive sentiments about the Tokocrypto application. The most prominent words in this cloud are:
• Kripto (Crypto): Highlights that cryptocurrency is a frequently mentioned and positively viewed aspect of the platform.
• Trading: Reflects that users have a positive experience with trading on the platform.
• Mudah (Easy): Indicates that users find the platform easy to use.
• Baik (Good): Implies users perceive the platform positively overall.
Figure 12 represents a word cloud generated from negative sentiments about the Tokocrypto application. The most prominent words in this cloud are:
• Withdraw: Indicates users face difficulties with withdrawing funds.
• Biaya (Cost): Suggests concerns about the costs associated with using the platform.
• Susah (Difficult): Reflects that users find certain aspects of the platform challenging.
• Rumit (Complicated): Implies complexity in using the platform.
• Masuk (Login): Indicates issues with logging into the platform.
Other notable words include "lama" (long), "verifikasi" (verification), "trading" (trading), "kode" (code), and "salah" (wrong). These words highlight common issues such as delays, verification problems, challenges with trading, code errors, and general dissatisfaction with the usability and processes on the platform.

F. Conclusion
This research provides an extensive sentiment analysis of user reviews for Indodax and Tokocrypto using the Multinomial Naive Bayes method. The analysis demonstrates that the choice of random states and dataset split ratios significantly affects model performance. Generally, smaller test sizes lead to higher accuracy but with greater variability, highlighting the sensitivity of model accuracy to the proportions of training and testing data. The Positive sentiment label consistently performs well, whereas the Neutral label exhibits lower performance, particularly in recall and F1-score. These results underscore the importance of careful dataset splitting strategies and random state variations in sentiment analysis. The insights gained are essential for developers seeking to enhance service quality and for investors aiming to make well-informed decisions. This study emphasizes the critical role of sentiment analysis in capturing user feedback and improving the credibility and functionality of cryptocurrency investment platforms in Indonesia.

G. Acknowledgment
I would like to express my gratitude to the lecturers at Telkom University Surabaya and everyone who has assisted in this research.

Figure 4. Tokocrypto Dataset Label Distribution
Figures 7 and 8 show the results obtained from the experiment for each dataset.

Figure 7. Accuracy of MultinomialNB Based on Random State for Indodax Dataset

Table 1. Reduction in Dataset Size After Text Preprocessing

Table 3. Highest Accuracy Results for Each Random State Scenario

Table 4. Evaluation of Indodax Dataset

Table 5. Evaluation of Tokocrypto Dataset