A Machine Learning-Sentiment Analysis on Monkeypox Outbreak: An Extensive Dataset to Show the Polarity of Public Opinion From Twitter Tweets

Research on sentiment analysis has proven to be very useful in public health, particularly in analyzing infectious diseases. As the world recovers from the onslaught of the COVID-19 pandemic, concerns are rising that another pandemic, known as monkeypox, might hit the world again. Monkeypox is an infectious disease reported in over 73 countries across the globe. This sudden outbreak has become a major concern for many individuals and health authorities. Different social media channels have presented discussions, views, opinions, and emotions about the monkeypox outbreak. Social media sentiments often result in panic, misinformation, and stigmatization of some minority groups. Therefore, accurate information, guidelines, and health protocols related to this virus are critical. We aim to analyze public sentiments on the recent monkeypox outbreak, with the purpose of helping decision-makers gain a better understanding of the public perceptions of the disease. We hope that government and health authorities will find the work useful in crafting health policies and mitigating strategies to control the spread of the disease, and guard against its misrepresentations. Our study was conducted in two stages. In the first stage, we collected over 500,000 multilingual tweets related to monkeypox posted on Twitter and then performed sentiment analysis on them using VADER and TextBlob, to annotate the extracted tweets as positive, negative, or neutral. The second stage of our study involved the design, development, and evaluation of 56 classification models. Stemming and lemmatization techniques were used for vocabulary normalization. Vectorization was based on CountVectorizer and TF-IDF methodologies. K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest, Logistic Regression, Multilayer Perceptron (MLP), Naïve Bayes, and XGBoost were deployed as learning algorithms. Performance evaluation was based on accuracy, F1 Score, Precision, and Recall.
Our experimental results showed that the model developed using TextBlob annotation + Lemmatization + CountVectorizer + SVM yielded the highest accuracy of about 0.9348.


I. INTRODUCTION
Monkeypox is a viral disease caused by the monkeypox virus (MPXV), which belongs to the same family of viruses as the variola virus, the cause of smallpox [1]. The World Health Organization (WHO) declared the spread of monkeypox a global health emergency [2] due to its sporadic outbreak. The United States Secretary of Health and Human Services, Xavier Becerra, declared the virus a public health emergency on August 4, 2022 [3], because of the increased number of cases reported in the US. Monkeypox was first discovered in 1958 when colonies of monkeys in a research institute in Denmark developed a pox-like disease. The first case in a human was confirmed in 1970 in the Democratic Republic of the Congo [4]. Recently, cases have been reported from over 73 countries, and the record shows that the total number of cases reported worldwide as of September 23, 2022, was 65,415, with 24,846 cases from the United States of America [5]. The most trusted diagnosis for the virus is the polymerase chain reaction (PCR) test, and the available solution remains the development and administration of vaccines.

The associate editor coordinating the review of this manuscript and approving it for publication was Muhammad Asif.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Studies have shown that Twitter can be an excellent data source for analyzing events worldwide, including health-related issues [6]. For example, since the outbreak of COVID-19, social media platforms such as Facebook, Instagram, Pinterest, and Twitter have been the most active means of expressing opinions and sharing information among users [6]. Analyzing posted content is a way to understand people's thoughts and emotions, as well as to reveal the current mood and disposition of the broader population.
Society's reliance on social media for information is enormous, unlike conventional news sources. The volume of data accessed daily led to the adoption of natural language processing (NLP) for text analytics [7]. This information may include social trends, governmental policies, public health, and other related matters. Companies also use social media to promote their products and services due to its low cost, easy access, and connectivity within the social media network. Therefore, social media platforms become a repository for information sources, reviews, and open communication where users' experiences are shared.
Understanding the public's perception of infectious diseases is critical for governments and policymakers in formulating policies and mitigation strategies to control their spread. This study aims to identify people's sentiments expressed on Twitter about monkeypox disease. Our study applied natural language processing techniques to the datasets to make them suitable for our experiment. We annotated the preprocessed data using VADER and TextBlob and then vectorized them using CountVectorizer and TF-IDF. We adopted different machine learning algorithms to classify the sentiment into positive, negative, and neutral. The best-performing model was identified and optimized.
The rest of the research work is organized as follows: Section II presents a literature review of related work, Section III discusses the research methodology, Section IV presents the experimental results, Section V discusses our contributions, Section VI concludes the research work, and Section VII presents the future work.

II. LITERATURE REVIEW
With the large number of users on social media platforms, public opinion has become an important tool in decision-making. It provides insight into how people react to a particular topic. There have been several studies on infectious disease sentiment analysis, such as the study performed by Neha et al. [8]. Their study analyzed people's sentiments during the coronavirus pandemic using deep learning algorithms: Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). These algorithms were used to develop a model to predict the impact of the pandemic on the general populace. A similar study was conducted by Chakraborty et al. [9], who proposed a model based on deep learning classifiers and Gaussian functions to classify the sentiments of the public from tweets related to COVID-19 from the beginning of the virus outbreak through May 2020. The result of their study emphasized the need for a monitoring mechanism to prevent the spread of negative information about the virus.
Shahi et al. [10] also conducted a sentiment analysis of COVID-19 prediction. In their study, they analyzed tweets in the Nepali language, employing two widely used text representation methods, TF-IDF and FastText, to capture discriminating features within the dataset. They developed models that implemented nine machine-learning algorithms to extract hybrid features from their dataset. The studies in [11] and [12] extend this work, using a multichannel CNN to perform sentiment analysis based on hybrid features. Their result showed that the proposed model provided excellent performance when compared with the state-of-the-art methods. However, the study is limited in scope because it considered only tweets posted in the Nepali language.
A further study by Chinnasamy et al. [13] performed a sentiment analysis to classify people's opinions and reactions to the COVID-19 vaccine. They proposed a model that classified their dataset into positive, negative, and neutral sentiments. In their study, raw tweets were stored and processed using NLP, and a supervised KNN algorithm was then deployed as the learning algorithm. The result of their experiment showed that people had a more positive perspective of Pfizer than of the other COVID-19 vaccines. A further review by the authors in [14] examined tweet sentiments on COVID-19 vaccination hesitancy. Their results indicated that COVID-19 vaccine hesitancy has steadily declined over time.
Considering other existing work on infectious diseases outbreak, we reviewed a study conducted by Chung et al. [15]. The study presented a sentiment and emotional analysis of tweets extracted from the social media platform on the Ebola disease outbreak. In the study, the authors proposed a model known as eMood. The proposed model used a comprehensive lexicon to identify emotion categories using a linear regression technique. The result of their study indicated that there is a relationship between user thoughts and emotions.
The recent outbreak of monkeypox disease has attracted research in this domain. Thakur et al. [16] created the first open-source monkeypox dataset following its outbreak.
In their study, more than 255,000 English-language tweets related to the 2022 monkeypox outbreak were collected. Their dataset was extracted by searching for the keyword ''monkeypox'' using RapidMiner software. This study only focused on the dataset features' development and classifications. However, no analysis was carried out on the developed datasets.
As a way to distinguish between monkeypox and other pox-like diseases, Sitaula et al. [17] performed an intensive study on the detection of the monkeypox disease using deep learning and computer vision techniques such as the visual geometry group (VGG) and the ResNet. Their study provides 13 Pre-trained Deep Learning (DL) models for monkeypox detection. The models were compared and evaluated; the performance of their model showed it is a reliable method for monkeypox virus detection.
Monkeypox is often mistaken for warts, benign bumps found on the skin, which could result in a wrong diagnosis and treatment. Tackling this misrepresentation, Alakus et al. [18] performed an analysis on the DNA sequences of HPV-causing warts and MPV-causing monkeypox disease. The classification of the sequences was conducted using a deep learning algorithm. An average accuracy of 96.08 percent was achieved in their study. The study showed that the two diseases can be classified using their DNA sequences, which can help prevent wrong diagnoses and treatments.
Another study on monkeypox analysis was carried out by the author of [19]. The study hypothesized that information sharing and data seeking can be observed through the Google Trends and Reddit platforms. However, the result of the experiment suggested that there was no significant discussion related to monkeypox on either Google Trends or Reddit compared to the information available on other social media platforms.
Jahanbin et al. [20] also studied the prediction of the monkeypox outbreak using data collected from Twitter and web news mining. Their study used the Fuzzy Algorithm for Monitoring, Extraction, and Classification (FAMEC) methodology to predict the outbreak of monkeypox disease. The dataset was cleaned, classified, and evaluated based on the developed algorithm. Their study showed the FAMEC model has the potential and capability to track and monitor zoonotic diseases like monkeypox, but the data collected for analysis was limited to posts available at the beginning of the outbreak. Similarly, Mohbey et al. [21] provided an analysis of individuals' thoughts about monkeypox disease. In their study, they presented a hybrid deep learning technique based on CNN and Long Short-Term Memory (LSTM) which generated three possible sentiments (positive, negative, and neutral) from people's tweets on Twitter. They developed a CNN-LSTM model to determine their model's accuracy, but the scope of their analysis was limited to tweets in the English language, which does not allow a larger perspective on the subject matter.
People leverage social media platforms to express their feelings and sentiments about public health situations. Therefore, it is critical to analyze public sentiment and its dynamics to reveal insights into current issues. Based on the previous research on monkeypox sentiment analysis, there seem to be gaps that need to be addressed.
1. Previous studies on monkeypox analysis showed that most of the data collected for use were limited to the first detected outbreak cases of the virus. We believe that the monkeypox situation may have changed regarding the number of cases and public views posted on social media. Hence, performing an analysis of the most recent cases is vital.
2. Also, most prior research on this topic was based on datasets downloaded from other public sources, which do not specify how the datasets were preprocessed. The data preprocessing step in data analysis is vital for achieving the most effective classification model. Hence, addressing each analysis step is critical and needs to be discussed.
3. Most of the previous research focused only on English-language tweets. We argue that extending the scope of our analysis across several other languages would help consider the larger community's opinions in making better decisions.
Given the gaps identified, this research aims to perform a sentiment analysis on the monkeypox outbreak with an up-to-date dataset collected from tweets posted on the Twitter social media platform. The goal is to generate insight into how public opinions can help policymakers and health authorities take proactive steps and decisions to control the outbreak of this disease, as well as to enlighten the general populace on taking preventive measures amidst the crisis. Our study collected and preprocessed a multilingual dataset of 103 languages for a detailed analysis. Table 1 shows the distribution of our dataset based on the top 4 languages. We designed, developed, and evaluated 56 models. Modeling was based on a combination of Natural Language Processing and different learning algorithms. The best-performing classification model was identified.

III. RESEARCH METHODOLOGY

A. OVERVIEW
Our experimental framework, shown in Figure 1, began with the collection of data, translation, and preprocessing. During preprocessing, we removed retweets, punctuation marks, hashtags, user tags, stopwords, numbers, and repeated words, and converted emojis to text. Stemming and lemmatization were applied to normalize the preprocessed dataset. VADER and TextBlob techniques were applied to compute sentiment scores for the dataset. Vectorization of tokens was achieved using CountVectorizer and TF-IDF techniques. The final step was to construct classification models using machine learning methods such as Random Forest, Naïve Bayes, K-Nearest Neighbor (KNN), Multilayer Perceptron (MLP), Support Vector Machine (SVM), and Logistic Regression.

B. DATA COLLECTION
This study extracted tweets using the Twitter API and the Tweepy library. Tweepy is a Python package for accessing the Twitter API that allows developers to access Twitter content, such as tweets, retweets, and timestamps. A Python script was used to search for all the tweets related to the keyword ''#monkeypox''. All text that met our criteria was extracted and stored in a comma-separated values (CSV) file. We considered a total of five features for our analysis: text, timestamps, author, source, and language. The Google Translate API was used to translate all non-English tweets to English.
We collected over 500,000 tweets between July 2022 and September 2022; however, after preprocessing, we were left with 107,000 unique tweets. Table 1 shows our dataset's top five language counts; we had about 103 languages in total.

C. DATA PREPROCESSING
Data preprocessing is an integral part of natural language processing (NLP) that helps to reconstruct raw text into a meaningful format. A variety of tools and mechanisms were used in our study for preprocessing. Since our dataset is multilingual, it is crucial that all the data are in the same language. To accomplish this, we used the Google Translate API to translate all non-English tweets into English. Retweets, user tags, punctuation, hashtags, numerals, stop words, and repeated words were then removed, and the remaining text was tokenized, stemmed, and lemmatized. Each task is discussed below.

1) RETWEET (RT) AND USER TAGS REMOVAL
Retweeting is resharing someone's tweets on Twitter. By sharing tweets, duplicates are created that can adversely affect model training and accuracy, so it was necessary to remove retweets. An ''RT'' indicates a retweet, while an ''@Someone'' indicates a user tag. They were omitted too.

2) EMOJI AND TEXT CONVERSION
People express their opinions and emotions using small digital images and icons called emojis. We converted these images into their corresponding textual format to improve our model training. Also, all the dataset texts were converted into lowercase to avoid double recognition of the same word.

3) HASHTAG, NUMERAL, AND PUNCTUATION REMOVAL
A hashtag is a common term for searching and retaining related content on social media [20]. Usually, the hash sign (#) precedes the keyword. It is a powerful tool in social media but unnecessary for learning models; hence, it was removed from the dataset. Numerals, repeated words, and punctuation were removed using regular expressions (RegExp). This reduced memory consumption and accelerated the learning process.
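The removal steps described so far can be sketched with Python's standard `re` module. This is a minimal illustration, not the authors' script: the function name `clean_tweet` and the exact patterns are assumptions, and the sketch assumes that only the hash sign is dropped while the keyword after it is kept. Emoji-to-text conversion is left out because it needs an external mapping.

```python
import re

def clean_tweet(text: str) -> str:
    """Illustrative cleaning of one tweet; the patterns are assumptions."""
    text = text.lower()                       # avoid double recognition of the same word
    text = re.sub(r"\brt\b", " ", text)       # retweet marker
    text = re.sub(r"@\w+", " ", text)         # user tags such as @someone
    text = text.replace("#", " ")             # drop the hash sign, keep the keyword
    text = re.sub(r"\d+", " ", text)          # numerals
    text = re.sub(r"[^\w\s]", " ", text)      # punctuation
    # collapse immediately repeated words ("cases cases" -> "cases")
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text)
    return " ".join(text.split())             # normalize whitespace

print(clean_tweet("RT @WHO: Monkeypox cases cases rise to 65415 #monkeypox!!"))
```

User tags are removed before punctuation so that the `@` sign is still present when the tag pattern runs.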

4) STOPWORD REMOVAL
Stopwords are words that do not add much meaningful information to a sentence, such as 'to,' 'me,' 'my,' 'ours,' etc. To avoid noise in our dataset, they were removed using a Python package named stopword.

5) TOKENIZATION
Tokens are created by splitting text into smaller chunks of individual words using the natural language toolkit. Sentiment analysis requires tokenization to simplify feature extraction.
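Tokenization and stopword removal can be illustrated with a few lines of plain Python. The whitespace tokenizer and the short `STOPWORDS` set below are stand-ins chosen for this sketch (the study used the Natural Language Toolkit and a stopword package); note that 'not' is deliberately absent from the set, in line with the choice described in the word-frequency discussion.

```python
# Hypothetical, deliberately tiny stopword list; "not" is kept on purpose.
STOPWORDS = {"to", "me", "my", "ours", "the", "is", "a"}

def tokenize(text: str) -> list[str]:
    """Split cleaned text into word tokens (whitespace stand-in for NLTK)."""
    return text.split()

def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that carry little meaning on their own."""
    return [tok for tok in tokens if tok not in STOPWORDS]

tokens = remove_stopwords(tokenize("monkeypox is not spread to me"))
print(tokens)
```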

6) LEMMATIZATION AND STEMMING
A tweet can have one word written differently with the same meaning since there is no standard format. Using lemmatization and stemming, we avoid such a scenario. Lemmatization converts all words into their dictionary-based form, commonly known as a lemma while stemming involves discarding the word's last few characters to get the word's meaningful base.

D. DATASET EXPLORATION
For familiarity and insight, we explored the dataset before training. High-frequency words were extracted and visualized. For text exploration, the next section discusses word frequency and word clouds.

1) WORD FREQUENCY
Following preprocessing, word frequency was used to explore the dataset. By analyzing word frequency, we were able to identify the most commonly used words. In Table 2, the ten most common keywords are listed. Monkeypox was the most common word because everyone who posted about this outbreak included monkeypox in their sentence.
The second most frequent word was not. In our study, we did not treat 'not' as a stopword because removing it could cause a sentence to lose its meaning. Some Twitter users dispelled the myth that monkeypox, like COVID-19, is spread through the air, while others claimed that monkeypox is a sexually transmitted disease. These differing perspectives are the reason 'not' was retained. Vaccine emerged as the third most frequently used word, probably because Twitter users were discussing whether to accept or hesitate about monkeypox vaccination. Another frequently used word was case, possibly because many people were discussing new cases of monkeypox at the time.

FIGURE 1. Experimental framework. The figure elucidates a step-by-step methodology for our experiment starting from data collection to pre-processing, labeling, vectorization, and classification algorithms applied with their respective components.

Furthermore, many people were concerned about their health, which may explain why health was the fifth most frequent word. Some discussions also compared the COVID-19 pandemic with the most recent outbreak, which caused the word covid to rank sixth. News ranked seventh, possibly because of continued coverage of the outbreak. It was followed in the ranking by the keyword people; presumably, most posts discussed the number of people affected by the disease.
Several tweets termed this disease a sexually transmitted disease, possibly resulting in the sex keyword being ranked tenth in frequency.
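A word-frequency ranking of this kind can be computed with `collections.Counter`. The sketch below is illustrative only: `top_words` is a hypothetical helper name and the three token lists are toy data, not the study's dataset.

```python
from collections import Counter

def top_words(token_lists, n=10):
    """Return the n most frequent tokens across all tokenized tweets."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return counts.most_common(n)

# Toy tokenized tweets (made up for illustration).
tweets = [
    ["monkeypox", "not", "vaccine"],
    ["monkeypox", "not"],
    ["monkeypox"],
]
print(top_words(tweets, n=2))
```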

2) WORD CLOUD
We used word clouds to complement the word-frequency analysis discussed above. A word cloud is a visualization technique for observing the most common words in a dataset. We generated four visualizations: one for the entire dataset and three for the sentiment polarities (positive, negative, and neutral). All the words that appeared in the keyword frequency list also appeared in the visualization shown in Figure 2. Again, monkeypox, case, new, covid, people, and vaccine were among the most visible words in all visualizations.

E. DATA LABELING
Data labeling refers to adding one or more labels to raw data. Usually, it serves as a bridge between raw data and results obtained from a machine learning model. It helps a machine learning model identify the specific class of an object in a dataset. Due to the volume of data collected in our study, we used two existing tools for annotation: VADER and TextBlob.
In VADER, we used the output compound score to determine the label, while in TextBlob we used the polarity score. Positive, Negative, and Neutral were the three possible labels. Algorithm 1 shows the detailed steps of our annotation.

1) VADER
VADER is an acronym for the Valence Aware Dictionary and Sentiment Reasoner. It is an open-source, lexicon- and rule-based sentiment analysis tool introduced in 2014 [6]. It was used to analyze text based on three polarities (negative, neutral, and positive). VADER labels text based on a compound polarity score, calculated by summing the valence scores of the words in a sentence and normalizing the result to the range −1 (most extreme negative) to +1 (most extreme positive). Equation 1 summarizes the labeling process in VADER according to Hutto et al. [22], [23].
As an outcome of the VADER application, out of 107,336 tweets in our study, about 27.3% were positive, 25.7% were negative, and 47% were neutral. Table 3 shows the frequency of each polarity. Table 4 shows sample outputs from the VADER implementation.
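As a rough illustration of how a compound score can be turned into one of the three labels, the sketch below applies a symmetric cutoff. The cutoff of 0.05 is an assumption taken from the convention suggested in the VADER documentation; the thresholds actually used in Equation 1 of this study are not reproduced here and may differ. The function name `label_from_compound` is hypothetical.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    threshold=0.05 is an assumed, conventional cutoff, not a value
    confirmed by the paper.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

for score in (0.62, -0.40, 0.01):
    print(score, label_from_compound(score))
```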

2) TEXTBLOB
TextBlob is another lexicon-based (rule-based) sentiment analyzer [24] that we adopted in our study. We wrote a Python loop over all rows in our dataset, and the polarity and subjectivity were returned through the TextBlob() call. The polarity score is a floating-point number between −1 and 1, while the subjectivity lies between 0 and 1 [25]. In this study, we are interested in the polarity score, converted to a label, as shown in Equation 2.
As Table 5 depicts, 36.9% of the dataset was labeled positive, 13.8% negative, and 49.3% neutral.
It was interesting to see a considerable variation between TextBlob and VADER. The share of positive labels from TextBlob was almost 10 percentage points higher than from VADER. There was a disparity of about 12 percentage points between VADER and TextBlob for negative labels. The neutral share differed only slightly, by about 2 percentage points. Table 6 shows some disparities between the two annotation methods in our study. Our findings agree with previous works on the disparities between the two annotation techniques. Studies have shown that, unlike TextBlob, VADER is more attuned to social media content [26], discerning polarity and sentiment for emojis [27].
In summary, the results of both techniques show that most people had either a positive or a neutral opinion regarding the monkeypox outbreak, suggesting that people understood the outbreak.
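The conversion from a TextBlob polarity score to a label can be sketched as a sign test, following the rule stated later in this paper for polarized data (greater than zero is positive, less than zero is negative, equal to zero is neutral). The function name `label_from_polarity` is hypothetical, and the sketch takes the score as an input rather than calling the library.

```python
def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label by its sign."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

# With the textblob package installed, the score for one tweet would come from
# TextBlob(text).sentiment.polarity; here the scores are made-up examples.
for score in (0.35, -0.2, 0.0):
    print(score, label_from_polarity(score))
```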

F. WORD EMBEDDING AND VECTORIZATION
Word embedding or text vectorization is a technique in natural language processing that maps the words or phrases from sentences to the corresponding vector of real numbers used to find word prediction, similarities, and semantics [28]. Performing word embedding makes it easier to train and extract features in machine learning. Below are different methods of vectorization applied in our study:

1) COUNTVECTORIZER
CountVectorizer is a vectorization method that converts text into a vector of word frequencies [29]. This technique builds a dictionary of all the words available in the dataset, in which each word is assigned a unique integer index; each document is then represented by the counts of those words.
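The idea behind count vectorization can be shown in plain Python. This is a simplified stand-in for scikit-learn's `CountVectorizer`, not its implementation: the helper names `fit_vocabulary` and `count_vectorize` are hypothetical, tokens are split on whitespace, and indices are assigned in alphabetical order.

```python
from collections import Counter

def fit_vocabulary(docs):
    """Assign each distinct word a unique integer index (alphabetical order)."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(vocab)}

def count_vectorize(docs, vocab):
    """Represent each document as a list of word counts, one slot per vocabulary word."""
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts.get(tok, 0) for tok in vocab])
    return rows

docs = ["monkeypox vaccine vaccine", "monkeypox case"]
vocab = fit_vocabulary(docs)
print(vocab)
print(count_vectorize(docs, vocab))
```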

2) TF-IDF
For a more balanced study, another vectorization method called TF-IDF was used. This technique combines term frequency (TF) and inverse document frequency (IDF). TF measures how often a word occurs in a document, while IDF measures how rare the word is across all documents. Equation 3 depicts the TF formulation.

$$\mathrm{TF}(t, d) = \frac{\text{frequency of term } t \text{ in document } d}{\text{total number of words in document } d} \tag{3}$$

IDF's purpose is to measure how informative a word is. We need IDF because it reduces the weight of frequent terms and gives infrequent terms a higher impact. IDF can be computed using Equation 4.
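A small worked example may help make the two quantities concrete. The TF function below follows Equation 3; the IDF function uses the common textbook form log(N / df), which is an assumption, since the paper's Equation 4 is not reproduced in this text (scikit-learn, for instance, uses a smoothed variant). All function names are hypothetical.

```python
import math

def tf(term, doc_tokens):
    """Term frequency as in Equation 3: occurrences of term / words in document."""
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    """Assumed textbook IDF: log(number of documents / documents containing term)."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0  # term never seen; avoid division by zero in this sketch
    return math.log(len(corpus) / df)

def tf_idf(term, doc_tokens, corpus):
    """Weight of a term in one document."""
    return tf(term, doc_tokens) * idf(term, corpus)

corpus = [["monkeypox", "vaccine"], ["monkeypox", "case"]]
print(tf_idf("monkeypox", corpus[0], corpus))  # appears everywhere, so weight 0
print(tf_idf("vaccine", corpus[0], corpus))    # rarer term gets a higher weight
```

In this toy corpus, 'monkeypox' occurs in every document and receives zero weight, which illustrates how IDF down-weights frequent terms.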

G. LEARNING ALGORITHMS
In this study, we designed, developed, and evaluated various models using different machine learning algorithms. 80% of the dataset was used for training and 20% for validation. We measured the performance of each model based on accuracy, precision, recall, and F1 score. Default hyper-parameter values in sklearn were used for each of the learning algorithms. The implemented algorithms are discussed below.
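The 80/20 split and the four metrics can be sketched without external libraries. Two points are assumptions made for this illustration: the split is a simple seeded shuffle, and precision, recall, and F1 are macro-averaged over the three classes (the text does not state which averaging was used). The helper names are hypothetical.

```python
import random

def split_80_20(items, seed=0):
    """Shuffle and split items into 80% training and 20% validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

train, valid = split_80_20(range(10))
scores = classification_metrics(["pos", "neg", "neu", "pos"], ["pos", "neg", "pos", "pos"])
print(len(train), len(valid), scores)
```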

1) LOGISTIC REGRESSION
Logistic regression is a supervised learning algorithm that predicts the probability of an event occurring. The probability is based on the relationship between the selected independent variables and the dependent variable. This kind of model outputs a discrete outcome for a given input. Logistic regression is mathematically represented in Equation 6.
The cost function is defined in Equation 7.
Logistic regression uses binary cross-entropy (see Equation 8), also known as log loss, as the loss function [31], where N is the total number of categories and Y is the dependent variable.
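To make the log-loss idea concrete, the sketch below evaluates the logistic (sigmoid) function and the standard binary cross-entropy on a toy input. It uses the usual textbook definitions, which are assumed to correspond to the equations referenced above; the small `eps` clamp is an implementation detail added here to avoid taking the logarithm of zero.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average log loss over samples; y_true holds 0/1 labels, y_prob probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

print(sigmoid(0.0))
print(binary_cross_entropy([1, 0], [0.9, 0.2]))   # confident and correct: low loss
print(binary_cross_entropy([1, 0], [0.5, 0.5]))   # uninformative: loss of log(2)
```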

2) NAÏVE BAYES
Naïve Bayes was another algorithm we adopted for classification. It is a probabilistic classifier that uses conditional probability to determine the class likelihood of its input [32]. Equation 9 defines the calculation of probabilities for each class involved.

$$\text{Posterior probability} = \frac{\text{Conditional probability} \times \text{Prior probability}}{\text{Evidence}} \tag{9}$$

Thomas Bayes (1701-1761) termed the output the posterior probability. Mathematically,

$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)} \tag{10}$$

where:
$P(y)$: prior probability
$P(X \mid y)$: likelihood
$P(y \mid X)$: posterior probability
$P(X)$: marginal probability (evidence).

Equation 10 gives the probability that event y occurs given that event X has occurred. In this case, y is the hypothesis and X is the evidence; event y depends on the occurrence of event X. X comprises all the independent features, i.e., $(x_1, x_2, \ldots, x_n)$. There are multiple types of naïve Bayes; however, in our study, we applied multinomial naïve Bayes, a model suited to document classification problems [33]. This model assumes that each feature follows a multinomial distribution, whereby each word count contributes toward the class prediction. Equation 11 defines the multinomial naïve Bayes,
where f is the frequency of a word (w) in document (d), $P(c)$ is the prior probability that the document belongs to class (c), and $P(w_i \mid c)$ is the conditional probability that word $w_i$ occurs in a document belonging to class c.
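The multinomial model can be sketched in a few dozen lines of plain Python. This is an illustrative implementation under stated assumptions, not the scikit-learn code used in the study: probabilities are kept in log space, Laplace smoothing with `alpha=1.0` is assumed so that unseen word and class pairs do not get zero probability, and the function names and toy documents are made up.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log P(c) and log P(w | c) from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood

def predict_multinomial_nb(tokens, log_prior, log_likelihood):
    """Pick the class with the highest posterior score; unknown words are ignored."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)

docs = [["good", "vaccine"], ["bad", "outbreak"], ["good", "news"]]
labels = ["positive", "negative", "positive"]
prior, likelihood = train_multinomial_nb(docs, labels)
print(predict_multinomial_nb(["bad", "outbreak"], prior, likelihood))
```

Repeating a word in the token list adds its log-likelihood once per occurrence, which is how the word frequency f enters the score.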

3) SUPPORT VECTOR MACHINE
SVM is a robust model that sets boundaries between classes [34], sorting data into one of the available categories. SVM separates classes with decision boundaries called hyperplanes. There are three hyperplanes: the positive, negative, and optimal hyperplanes. Mathematically, these hyperplanes are presented by Equations 12-14, where w is the weight vector normal to the hyperplane (the margin width is $2/\lVert w \rVert$), b is the bias, and x is the feature vector. The margin width needs to be maximized to obtain the optimal hyperplane. When the problem is not linearly separable, SVM uses a kernel to map the data into a higher-dimensional space in which the classes become separable. Common SVM kernels include the polynomial and Gaussian radial basis function (RBF) kernels.
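For reference, the three hyperplanes are usually written in the following standard textbook form. This is a sketch under the assumption that the paper's Equations 12-14 follow the common convention; the pairing of each line with an equation number is likewise assumed.

```latex
\begin{align}
  w \cdot x + b &= +1 && \text{(positive hyperplane)} \\
  w \cdot x + b &= -1 && \text{(negative hyperplane)} \\
  w \cdot x + b &= 0  && \text{(optimal separating hyperplane)}
\end{align}
% Distance between the positive and negative hyperplanes (the margin):
% \frac{2}{\lVert w \rVert}, so maximizing the margin means minimizing \lVert w \rVert.
```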

4) RANDOM FOREST
Random Forest is another model we utilized. It is an ensemble machine-learning classification algorithm. The algorithm builds numerous decision trees, and the class predicted by the majority of the trees is selected. This approach involves randomization and aggregation of tree predictions [35] into the final output.
Random forest has three main hyperparameters: node size, number of trees, and number of features sampled. It applies bootstrap aggregation, also known as the bagging ensemble technique, which creates different subsets of the training data by sampling from it. The result depends on the rate of preference [36].

a: K-NEAREST NEIGHBOR (KNN)
KNN is one of the most popular machine learning algorithms used not only as a classification algorithm but also in information retrieval, pattern recognition, and regression problems [37]. It is a non-parametric algorithm capable of generating a consistent result in a data sample. The algorithm first turns data into a vector with features extracted to help find similarities between two data points using a distance measurement.
Our study used a supervised KNN classifier to classify the polarized data. We classified data based on the polarity score. A tweet with a polarity score greater than zero (Tweet Polarity > 0) is classified as positive. A polarity score less than zero (Tweet Polarity < 0) is classified as negative. If the polarity score is equal to zero (Tweet Polarity == 0), the tweet is classified as neutral [7]. The KNN algorithm uses feature similarity to assign a data point to a class based on how close it is to its neighbors. To calculate the distance between data points in the KNN algorithm, we used the Euclidean distance function. Algorithm 2 describes KNN in detail.

[Figure: A chart illustrating the performance metrics for each machine learning algorithm in terms of accuracy, precision, recall, and F1 score.]

Algorithm 2 KNN Algorithm
Step 1: Load the preprocessed dataset
Step 2: Determine the parameter K
Step 3: Calculate the distance between each data point
Step 4: Sort data points according to the distance calculated
Step 5: Select the top K rows
Step 6: Assign data points to the most frequent class
Step 7: END
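The steps of Algorithm 2 can be sketched directly in Python with the Euclidean distance. The function names, the toy two-dimensional feature vectors, and the choice of k=3 are assumptions made for this illustration; in the study, the feature vectors come from the vectorization step and the scikit-learn defaults were used.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Steps 3-4: compute distances to every training point and sort them.
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Step 5: keep the top k rows.
    top_k = [label for _, label in distances[:k]]
    # Step 6: assign the most frequent class among the neighbors.
    return Counter(top_k).most_common(1)[0][0]

# Toy data (made up): two clusters in a 2-D feature space.
train_X = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = ["neutral", "neutral", "positive", "positive"]
print(knn_predict(train_X, train_y, [0.2, 0.1], k=3))
```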

5) MLP
The Multilayer Perceptron (MLP) is one of the standard neural network models; it uses simple mathematical functions to learn complex features within a dataset. It is generally categorized as a feedforward network, in which inputs and initial weights are combined in a weighted sum and passed through an activation function [38].
In our MLP, we used gradient descent as the optimization method, i.e., at each iteration, the gradient of the mean squared error is computed until a specified convergence threshold is attained [39]. The mean squared error is calculated using Equation 16.
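For reference, the mean squared error is conventionally defined as below. The symbols are assumptions introduced for this sketch ($n$ samples, target $y_i$, model output $\hat{y}_i$), and the form is the standard one rather than a transcription of the paper's Equation 16.

```latex
\begin{equation}
  \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\end{equation}
```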

6) XGBOOST
eXtreme Gradient Boosting (XGBoost) is a flexible gradient-boosted decision tree implementation available as a machine-learning library. It provides cutting-edge results on many machine-learning problems [40]. XGBoost builds upon the concepts of supervised machine learning, decision trees, ensemble learning, and gradient boosting. It has an advantage over these machine learning methods due to its flexibility and high speed: it exploits parallel processing, supports regularization, handles missing data, and runs cross-validation, and it is well suited to small and medium datasets. In our study, XGBoost was used along with the scikit-learn package in Python.

IV. EXPERIMENT RESULTS AND DISCUSSION
This section presents the experimental steps used to evaluate the performance of the proposed models. In our study, we designed, developed, and evaluated 56 models based on the labeling, vectorization, and normalization methods. Table 7 shows the eight (8) ways we generated our models. We trained and evaluated each model on seven (7) machine learning algorithms: Random Forest, Logistic Regression, Support Vector Machine (SVM), Multilayer Perceptron (MLP), K-Nearest Neighbor (KNN), Naïve Bayes, and XGBoost.
The results obtained are listed in the following tables and figures:
• Table 8 shows the output of the models developed from the combination of Stemming tokenization and VADER labeling with CountVectorizer or TF-IDF Vectorizer
• Table 9 shows the output of the models developed from the combination of Lemmatization tokenization and VADER labeling with CountVectorizer or TF-IDF Vectorizer
• Table 10 shows the output of the models developed from the combination of Stemming tokenization and TextBlob labeling with CountVectorizer or TF-IDF Vectorizer
• Table 11 shows the output of the models developed from the combination of Lemmatization tokenization and TextBlob labeling with CountVectorizer or TF-IDF Vectorizer
• Figures 3, 4 and 5 are graphical representations of Tables 8, 9, 10 and 11
In our experiments, we applied two normalization techniques, namely, Stemming and Lemmatization. We observed that the lemmatization models showed better results than the stemmed models in all cases. In the case of labeling, the TextBlob labeling models were better than the VADER models, and the CountVectorizer models outperformed the TF-IDF models. The model that applied SVM, lemmatization, CountVectorizer, and TextBlob annotation emerged as the best model, with an accuracy of about 0.9348. Random Forest, MLP, and Logistic Regression also performed well, with accuracies ranging from 0.85 to 0.92. KNN and XGBoost performed worst, with an accuracy that ranges between 0.6 to 0.
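The count of 56 models can be reproduced with a short enumeration. The sketch below assumes that the eight configurations of Table 7 are the full cross product of two labeling methods, two normalization methods, and two vectorizers, each then paired with the seven algorithms; the variable names are hypothetical and the strings are labels only, not calls into any library.

```python
from itertools import product

labelers = ["VADER", "TextBlob"]
normalizers = ["Stemming", "Lemmatization"]
vectorizers = ["CountVectorizer", "TF-IDF"]
algorithms = [
    "KNN", "SVM", "Random Forest", "Logistic Regression",
    "MLP", "Naive Bayes", "XGBoost",
]

# 2 labelers x 2 normalizers x 2 vectorizers = 8 preprocessing configurations
configurations = list(product(labelers, normalizers, vectorizers))

# Each configuration is paired with each of the 7 algorithms: 8 x 7 = 56 models
models = [config + (algo,) for config, algo in product(configurations, algorithms)]

print(len(configurations), len(models))  # 8 56

# The best-performing combination reported above is one of the 56 entries
best = ("TextBlob", "Lemmatization", "CountVectorizer", "SVM")
print(best in models)  # True
```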

V. CONTRIBUTION
While previous studies have explored monkeypox sentiment analysis, our study makes the following key contributions:
1) This study extracted over 500,000 monkeypox-related tweets posted between July 2022 and September 2022. Preprocessing and normalization of the collected data resulted in 107,000 clean tweets. After preparing the data, we annotated and classified it into positive, negative, and neutral sentiments. The resulting dataset is available to the public for research purposes¹. By comparison, the authors of [15] collected over 255,000 monkeypox-related tweets posted between May 2022 and June 2022, and the study in [17] used 61,379 records obtained from a public source, also collected between May 2022 and June 2022. Our study therefore draws on a much larger and more recent dataset than previous studies. Because the monkeypox situation may have changed since those earlier collections, we argue that experimental results based on the most recent data are more reliable and reproducible than those of previous studies.
2) The research presented in our study was based on a multilingual dataset, which makes it unique compared to other studies. Tweets in over 103 languages were extracted and analyzed. Table 1 shows the top five languages processed in the study. Previous research in related fields has not considered multiple languages. To the best of our knowledge, this makes our study the first to perform sentiment analysis on monkeypox disease with multilingual datasets. We believe that our study provides a more universal and broader approach to understanding the sentiments of the public on the monkeypox virus.
3) We explored the Twitter dataset using Word Frequency and Word Cloud techniques. Our Word Frequency analysis showed that Monkeypox, Not, Vaccine, Cases, Health, Covid,
4) An evaluation of 56 classification models was conducted in this study. Stemming and lemmatization were used in the vocabulary normalization process. Vectorization was performed using the CountVectorizer and TF-IDF techniques. Our learning algorithms included K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest, Logistic Regression, Multilayer Perceptron (MLP), Naïve Bayes, and XGBoost. Performance was evaluated using Accuracy, F1 Score, Precision, and Recall. We found that the model combining TextBlob annotation, Lemmatization, CountVectorizer, and SVM was the most effective, with an accuracy of 93%.

VI. CONCLUSION
The recent outbreak of monkeypox has prompted intense discussion on social media, especially Twitter, with different perspectives from users. Understanding the sentiment behind these expressions is critical. Drawing on the massive flow of data and the abundance of opinions and emotions on social media platforms, we sought to extract important information. We extracted, labeled, and preprocessed tweets collected from Twitter. We then explored the dataset and built classifiers.
Before training and testing our models, we implemented text normalization and vectorization. 80% and 20% of the dataset were used for training and testing, respectively. Fifty-six (56) models in total were designed, developed, and evaluated. The accuracy of these models ranges from 0.65 to 0.93. The models were generated from seven machine learning algorithms combined with different vectorization, normalization, and annotation techniques. Our findings reflect the importance of sentiment analysis in keeping track of public opinion. We believe that our analysis will help health authorities and individuals take proactive mitigating measures to reduce the spread of the disease.

VII. FUTURE WORK
For future work, additional methods and techniques will be implemented for word embedding (e.g., Doc2Vec) and text labeling (e.g., Azure Machine Learning) to improve the models' performance. Moreover, we plan to use deep learning and transformer algorithms for more accurate sentiment analysis and emotion prediction.

From 2019 to 2021, he was a Research Graduate Assistant at the Computer Science Department, Bowie State University, where he has been an Adjunct Professor with the Computer Science Department, since 2021. His research interests include machine learning, deep learning, and cloud computing.

VOLUME 11, 2023

TIMOTHY OLADUNNI received the master's and doctoral degrees in computer science from Bowie State University, MD, USA, in 2013 and 2017, respectively.
He was an Assistant Professor at the Department of Computer Science and Information Technology, University of the District of Columbia, Washington DC, USA. He is currently an Assistant Professor of computer science with Morgan State University, MD, USA. He explores fundamental computer science concepts in developing sustainable, efficient, and innovative solutions to real-world problems. He has broad research experience in artificial intelligence with specific expertise in natural language processing, computer vision, data science, pattern recognition, and computational epidemiology.
RUTH OLUSEGUN received the B.S. degree in computer science and the M.S. degree in information security from the University of Ilorin, Nigeria, in 2011 and 2015, respectively, and the M.S. degree in computer science with a specialization in AI from Bowie State University, MD, USA, in 2022, where she is currently pursuing the D.S. degree in computer science.
She was an Information Security Analyst at a financial institution in Nigeria, where she participated in several projects. Since 2018, she has been working as a Research Assistant at the Department of Computer Science, Bowie State University. She is currently a Research Scientist with AI Squared Inc., MD, USA. Her research interests include artificial intelligence with a special focus on machine learning, deep learning, natural language processing, and blockchain technologies.
HALIMA AUDU received the bachelor's degree in electrical engineering from Ahmadu Bello University, Nigeria, in 2012, and the master's degree in computer science from Bowie State University, MD, USA, in 2018, where she is currently pursuing the doctoral degree in computer science.
Since 2018, she has been an Adjunct Faculty Member with the Department of Computer Science, Bowie State University. Her research interests include cloud computing, artificial intelligence, and cybersecurity.