Future Prediction of COVID-19 Vaccine Trends Using a Voting Classifier

Abstract: Machine learning (ML)-based prediction is considered an important technique for improving decision-making during the planning process. Modern ML models are used for prediction, prioritization, and decision-making, and multiple ML algorithms are used to improve decisions in different respects after forecasting. This study focuses on predicting the future perceived effectiveness of the COVID-19 vaccine, which has been presented as a light in the dark. People hold several reservations, including concerns about the efficacy of the vaccine: they presume that it will either lower the risk of developing the disease after injection or impose side effects that affect their existing health condition. People have expressed these concerns publicly. This study intends to estimate what perception the public will form about the role of the COVID-19 vaccine in the future. Specifically, it examines people's inclination toward the COVID-19 vaccine and its results based on their reviews. Five models, namely random forest (RF), support vector machine (SVM), decision tree (DT), K-nearest neighbor (KNN), and artificial neural network (ANN), were used to forecast the overall inclination toward the COVID-19 vaccine. A voting classifier was used at the end of the study to combine the results of all the classifiers into an overall accuracy. The results indicate that the SVM produces the best forecasting results and that the ANN produces the worst predictions of individuals' inclination to be vaccinated. With the voting classifier, the proposed system achieved an overall accuracy of 89.9% on the random dataset and 45.7% on the date-wise dataset. Thus, the results suggest that the studied prediction technique is a promising procedure for studying future trends of the COVID-19 vaccine.


Introduction
Machine learning applications are widely used for forecasting to improve decision-making processes in different fields, e.g., medical treatment and self-driving systems [1]. In addition, machine learning algorithms play an important role in natural language processing, robotics, and image, video, and voice processing. Machine learning algorithms follow a uniform procedure that differs markedly from traditional programming based on conditional statements [2]. Statistical approaches and ML methods are similar, as both typically aim to improve forecasting accuracy. However, ML methods are more demanding because their implementation relies on computer science [3]. This research focuses on predicting the impact of the COVID-19 vaccine. Several ML models have been used to predict diseases and their cures, such as heart attacks [4], diabetes [4], and cancer [5]. The authors of [6] made live COVID-19 predictions of the number of confirmed cases and deaths in particular areas. The authors of [7] also focused on the COVID-19 outbreak, forecasting the number of deaths and recoveries. Hence, the best solution must anticipate the problems posed by various natural factors in this regard. The trend of vaccination in different countries can be seen in Figure 1.
This study aims to predict the perception of people toward the COVID-19 vaccine. Although governments have attempted to limit the impact of COVID-19 using different methods [7], the disease has affected countries worldwide and has posed a very serious threat to human lives since the end of 2019. Large numbers of individuals are infected daily, and infection may lead to death [8]. The authors of [9] briefly discussed the cure and treatment of COVID-19, as well as precautionary measures to avoid this novel disease. Thousands of people have been infected and hundreds of people have died due to this novel disease. The COVID-19 vaccine offers a way to mitigate the risk of COVID-19 infection. Although everyone had serious concerns about the vaccine in the beginning, there is no question that the coronavirus otherwise remains beyond the reach of proper treatment. However, the public has heeded rumors instead of perceiving the vaccine as a proper medicine.
It is natural to react with hesitance towards a newly developed vaccine. To overcome this reaction, baseless notions about inoculation must be dispelled. Human health and safety are top priorities for everyone. Therefore, within a year of the onset of the COVID-19 pandemic, different research teams accepted the challenge and developed vaccines to protect against SARS-CoV-2, the virus that causes COVID-19 [10]. Since the middle of 2020, many COVID-19 vaccines have been introduced. However, the study conducted by the authors of [11] showed that people are hesitant or resistant to being vaccinated even with a safe vaccine. To reduce deaths related to COVID-19, almost every country across the globe has initiated protocols to ensure the availability of the vaccine. However, not everybody is willing to receive it. This may be related to the deaths and infections attributed to the vaccines when they were introduced.
Machine learning algorithms have been helpful in predicting various diseases, e.g., COVID-19, cardiovascular disease, and many more [12]. However, analyses of the presence of COVID-19 and its vaccination trend have long been questioned. Different supervised learning algorithms have also been used in different contexts [13][14][15]. To address the current human health problems due to COVID-19, this study aims to predict the tendency of people to be vaccinated so that a system can be developed to convince people to get vaccinated. The prediction model was built with the following objectives for the upcoming days:

1. To identify the trend of people towards the vaccination;
2. To find out the accuracy of the vaccinations.

The prediction problem is considered a regression problem, so this research focuses on different regression models used in ML, e.g., decision tree (DT), random forest (RF), support vector machine (SVM), K-nearest neighbor (KNN), and artificial neural network (ANN). These ML models were trained using feedback tweets of vaccinated and non-vaccinated people. The dataset was obtained from Kaggle (San Francisco, CA, USA) and contains 60,303 tweets from Twitter users all over the globe who tweeted about COVID-19 vaccination from December 2020 to April 2021. The raw dataset was preprocessed before training and testing and was divided into two subsets: a training set (80% of tweets) and a testing set (20% of tweets). Four evaluation parameters (logarithmic loss, accuracy score, F1 score, and area under the curve) were used in this study to assess accuracy on the test dataset.
In addition to predicting the trend towards vaccination, this study produced some other central findings:

1. Different models give different forecasting results with the same dataset;
2. These forecast results can be helpful in future decision-making;
3. The date-wise dataset shows poorer accuracy than the random dataset.

This paper consists of six sections. Section 1 is the introduction. Section 2 defines the problem statement. Section 3 describes the dataset, the models, and the testing matrix. Section 4 discusses the methodology. Section 5 presents the results and discussion, and Section 6 presents the conclusions.

Problem Statement
Since COVID-19 spread across the world, it has posed a real-time threat to everyone's life. Anyone can be infected with SARS-CoV-2 anywhere, which may cause serious health problems or even lead to death. At the same time, vaccination has played an important role in protection from this novel disease. Because the vaccine is new and may cause different health issues, people were worried about being vaccinated in the early stages. This study aims to develop a system to predict the trend of people getting the COVID-19 vaccination.

Dataset
This study aims to predict the trend of people towards the COVID-19 vaccination. For this purpose, the dataset used in this study was based on user feedback given on Twitter, and the total number of vaccinations, together with the total number of recoveries, was obtained from Kaggle [16,17]. Thus, the dataset contains only the tweets of Twitter users, without any age restriction, who tweeted about the vaccination. One comma-separated values (CSV) file contains the feedback tweets, dates of tweets, hashtags, and locations. Another CSV file contains the vaccination data by country and the number of recoveries. Table 1 shows the tweets, which may be positive, negative, or neutral, and their dates. The first field contains the date of the tweet, while the second field contains the tweet text, which was used to analyze the sentiment and train the models. Sentiment analysis was carried out to find the sentiment of the tweets. Table 2 contains the number of people who were vaccinated in different countries from December 2020 to April 2021. Tables 1 and 2 show samples of the dataset, i.e., data that were preprocessed using different Python libraries.

Supervised Machine Learning Models
A number of different supervised machine learning models are used to predict the output for previously unseen input. Normally, labeled datasets are used to train a model or a classifier to predict the output in both regression and classification problems [18]. In classification, the classifier learns from the labeled data a mapping from the attributes x to a predefined label y.
In this study, five supervised machine learning models were trained on the dataset and used to predict the output on the testing dataset:

1. Random forest;
2. Support vector machine;
3. Decision tree;
4. K-nearest neighbor;
5. Artificial neural network.

Random Forest
A random forest classifier is used to solve classification and regression problems. It uses multiple decision trees as base classifiers [19].
The basic idea behind the RF is both powerful and simple. In data science terms, a large number of relatively uncorrelated decision trees working as an ensemble will outperform any of the individual constituent models.
Because there are multiple decision trees in the RF, some trees might predict the correct output and some might predict an incorrect one, but the trees together yield a single output. The reasons for using RF are simple: it takes very little time to train, it predicts the output with high accuracy, and it maintains accuracy even when a large portion of the data is missing or out of sequence [20].
Thus, a random forest classifier generates multiple decision trees in two steps:

1. Random sampling of the data (bootstrap sampling);
2. Generation of N individual decision trees based on a random input selection.

As discussed earlier, RF can be used for both regression and classification problems. RF can also overcome over-fitting, which leads to increased accuracy on these problems.
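The two ingredients described above, bootstrap sampling and combining the trees' outputs, can be illustrated with a minimal Python sketch. The tweet identifiers and the per-tree predictions are hypothetical placeholders, and majority voting is assumed as the way the trees' outputs are combined for classification; the sketch does not train real decision trees.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = ["t1", "t2", "t3", "t4", "t5"]  # hypothetical tweet identifiers
samples = [bootstrap_sample(rows, rng) for _ in range(3)]  # one sample per tree

# Hypothetical predictions of three trees for a single tweet.
tree_predictions = ["positive", "negative", "positive"]
forest_prediction = majority_vote(tree_predictions)
```

Each bootstrap sample has the same size as the original data but may repeat some rows and omit others, which is what keeps the individual trees relatively uncorrelated.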

Support Vector Machine (SVM)
Like RF, SVM is a supervised machine learning model that is used for both regression and classification problems [21]. The idea behind SVM is simple: a boundary, known as the hyperplane, is drawn to separate the classes. Besides separating the classes, the hyperplane is positioned with respect to the points of each class that lie closest to it. These closest points are called the support vectors.
Two types of SVM classifiers are used: (1) linear SVM; and (2) non-linear SVM. Either can be used, depending on the problem. If the data are linearly separable, they can be separated by a hyperplane; non-linear data cannot be separated this way in their original space. A linear dataset can be separated in its original space (e.g., 2D), whereas for a non-linear dataset, more dimensions have to be added before the data can be arranged into classes.
The kernel trick is used in SVM to transform the given data and then to find the optimal boundary between the expected outcomes in the transformed space. SVM is considered a memory-efficient model and works effectively in high-dimensional spaces.
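The idea of adding a dimension so that non-linear data become separable can be sketched in a few lines of Python. The data points, the feature map `lift`, and the hand-picked weights are illustrative assumptions; a real SVM learns the boundary from data, and the kernel trick evaluates such mappings implicitly rather than computing them explicitly as done here.

```python
def lift(x):
    """Map a 1D point to 2D by adding x squared as a second dimension."""
    return (x, x * x)

def linear_decision(point, weights, bias):
    """Classify by the side of the hyperplane weights . point + bias = 0."""
    score = sum(w * p for w, p in zip(weights, point)) + bias
    return "A" if score > 0 else "B"

# Hypothetical 1D data: class A lies on both sides of class B,
# so no single threshold on x separates the two classes.
data = [(-2.0, "A"), (-1.0, "B"), (1.0, "B"), (2.0, "A")]

# In the lifted space, the line x^2 = 2.5 separates the classes.
weights, bias = (0.0, 1.0), -2.5
predictions = [linear_decision(lift(x), weights, bias) for x, _ in data]
```

With these values, the predictions match the labels of all four points, although the same points cannot be split by any threshold in the original one-dimensional space.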

Decision Tree
A decision tree is a machine learning model that predicts the output from the input provided. Decision trees are simpler and easier to understand than other classifiers. They use nodes and sub-nodes (leaf nodes) to predict an output: the internal nodes of the decision tree represent the attributes, and the leaf nodes represent the classes. Prediction starts from the root node of the tree. Usually, the decision tree follows the divide-and-conquer technique to predict the outcome(s).
In some ways, decision trees resemble human reasoning. The flow of a decision tree is simple, and a DT has several components, including nodes, branches, splitting, stopping, and pruning. The nodes and branches are the most important parts of any DT model, while the remaining three are important steps in building a decision tree.
There are three types of nodes: (1) the root node, which is also known as the decision node; (2) the internal nodes, which are also known as chance nodes; and (3) the leaf nodes, which contain the possible outcomes of the event. The second component is the branches, on which the flow of any decision tree depends. Normally, branches follow the if-then-else rule in decision-making, whereby every path from the root through the internal nodes to a leaf node represents a classification rule. The third component, splitting, occurs when an input variable causes a node to split into one or more leaf or internal nodes according to the value of that variable. Splitting continues until the stopping criteria are met. Stopping and pruning, in turn, keep the tree from becoming overly complex [22].
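The root-to-leaf traversal described above can be sketched in Python as follows. The tree structure and the attribute names (`mentions_side_effects`, `mentions_protection`) are hypothetical and chosen only for illustration; they are not features taken from the study's dataset, and the tree is written by hand rather than learned.

```python
# A hypothetical tree: internal nodes test one attribute, leaves hold a class.
tree = {
    "attribute": "mentions_side_effects",
    "branches": {
        True: {"label": "negative"},
        False: {
            "attribute": "mentions_protection",
            "branches": {
                True: {"label": "positive"},
                False: {"label": "neutral"},
            },
        },
    },
}

def predict(node, sample):
    """Follow branches from the root until a leaf node is reached."""
    while "label" not in node:
        node = node["branches"][sample[node["attribute"]]]
    return node["label"]

sample = {"mentions_side_effects": False, "mentions_protection": True}
result = predict(tree, sample)
```

Each path from the root to a leaf corresponds to one if-then-else rule; for the sample shown, the path ends in the leaf labeled `positive`.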

K-Nearest Neighbor (KNN)
KNN is also used in other fields, such as robotics, where high efficiency is required [23]. Although it is considered a good classifier in terms of accuracy, it has a few major drawbacks: (1) it is slow when working with the dataset; and (2) because it is slow, finding the best result may depend on other factors [24]. KNN is considered an attractive classifier for recommendation systems and can also be useful for fraud detection.
KNN is a lazy learner: it does not build a model from the training set. Instead, it stores the dataset and predicts the output at classification time by comparing a new input with the available classes and assigning it to the most similar class. Because it can be used for both regression and classification problems, it is easier to apply than other classifiers. Whereas the DT classifier may not perform well on a large dataset, KNN can perform well when there is a large set of data to be trained and tested.
The procedure is simple. Suppose there are two classes, class A and class B. First, the number of neighbors k is chosen; then, the k nearest neighbors of the new data point are identified using the Euclidean distance. If, after calculating the Euclidean distances, four of the nearest neighbors belong to class A and two belong to class B, the new data point is assigned to class A.
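The two-class example above can be reproduced with a short Python sketch. The coordinates are hypothetical and were chosen so that, with k = 6, four of the nearest neighbors belong to class A and two belong to class B; the function name `knn_predict` is likewise an illustrative choice.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Label query by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2D points; with k = 6, four neighbors are in class A and two in class B.
train = [
    ((1.0, 1.0), "A"), ((1.5, 1.2), "A"), ((0.8, 1.4), "A"), ((1.2, 0.7), "A"),
    ((2.0, 2.2), "B"), ((2.3, 1.9), "B"), ((6.0, 6.0), "B"), ((6.5, 5.8), "B"),
]
label = knn_predict(train, query=(1.4, 1.4), k=6)
```

No model is fitted in advance: all distance computations happen at prediction time, which is why KNN is described as a lazy learner and why prediction becomes slower as the stored dataset grows.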

Artificial Neural Network (ANN)
Artificial neural network (ANN) and neural network (NN) are considered interchangeable terms, and ANNs are considered a branch of artificial intelligence. An ANN is a combination of connected nodes, called neurons. These nodes are processing units that are made up of input and output units. Implementing an ANN may yield accurate results, although it can be demanding. An advantage of ANN is that once it has been trained on a dataset, it can be tested on a new dataset [25,26]. The ANN model is trained on inputs from which it learns the underlying pattern.
Another advantage of ANN over other classifiers is that, once trained, it may provide results even with incomplete data. It also has the ability to perform multiple jobs at once. ANN works like a human brain, as it is modeled on neurons, between which information is sent in the form of electrical and chemical signals. ANN is considered a great algorithm in data science. It consists of three layers: (1) the input layer; (2) the hidden layer; and (3) the output layer.
Usually, the input layer obtains the input values; each input unit receives a single value, replicates it, and sends it to the hidden layer. The number of hidden layers may vary, as there can be one or more. After processing the input, the hidden layer sends its values to the output layer. The output layer may also receive values from the input layer, just as it receives values from the hidden layer. The value that the output layer receives is the forecast for the input variables. ANNs are applied in the stock market, speech and voice recognition, and many other areas.
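The flow of values from the input layer through a hidden layer to the output layer can be sketched as a single forward pass in Python. The layer sizes, weights, biases, and the choice of a sigmoid activation are assumptions made for illustration; the architecture used in this study is not specified here, and no training is performed.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Compute one fully connected layer followed by a sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]
output_biases = [0.05]

inputs = [1.0, 0.0]
hidden = layer(inputs, hidden_weights, hidden_biases)  # input layer -> hidden layer
output = layer(hidden, output_weights, output_biases)  # hidden layer -> output layer
```

Training would adjust the weights and biases so that the output matches the labeled data; that step is omitted from this sketch.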

Testing Matrix
Testing a model involves checking how accurate and effective it is. In this study, each trained model was evaluated in terms of precision, recall, F-measure, logarithmic loss (LL), and area under the curve (AUC). At the end of the study, the results of the different classifiers were combined in the voting classifier to obtain an overall accuracy.

Precision, Recall, and F-Measure
Precision and recall are two of the most common, and most commonly misunderstood, evaluation metrics in machine learning. Both are useful for imbalanced classification problems in information retrieval, usually with text. In this study, both were used to evaluate how well the trained model classifies the trend. Precision is related to accuracy and describes how close repeated results are to one another, regardless of whether these results are true or not [27]. Precision is obtained by dividing the true positives (TP) by the sum of the true positives (TP) and false positives (FP):

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (1)$$

In this study, the trained model was also tested using recall, which helps to find the total number of relevant elements in a dataset. Recall therefore differs slightly from precision. It is defined as the number of true positives divided by the total number of elements belonging to the positive class [28]. The positive class consists of the false negatives (FNs) and the true positives (TPs):

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (2)$$

To balance precision and recall, the F-measure is used to assess the accuracy of the trained model. The F-measure is also known as the F-score or the F1 score. The F1 score is the harmonic mean (HM) of recall and precision [29]. Its highest possible value is 1 and its lowest possible value is 0. The equation of the F1 score is given below.

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3)$$
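The definitions of precision, recall, and F1 score given above can be checked with a small Python sketch that counts TP, FP, and FN directly. The label lists are hypothetical, and the function name `precision_recall_f1` is an illustrative choice rather than part of any library.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 1 = positive sentiment, 0 = otherwise.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive=1)
```

Here TP = 3, FP = 1, and FN = 1, so precision and recall are both 0.75, and their harmonic mean, the F1 score, is 0.75 as well.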

Logarithmic Loss (LL)
In this study, logarithmic loss was calculated to test the accuracy of the trained model. Logarithmic loss, or log loss, is considered a useful evaluation metric for classification problems. Log loss indicates how close the predicted probability is to the actual value. The further the predicted probability diverges from the actual value, the higher the log loss. In the binary case, the actual value is 0 or 1, and the log loss should be as small as possible [30]. The log loss over N observations is calculated with Equation (4):

$$LL = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i) \right] \qquad (4)$$
In Equation (4), y denotes the true values, i indexes the given observations, p is the predicted probability, and ln is the natural logarithm. The resultant value is non-negative: it is 0 for a perfect prediction and grows as the predicted probabilities move away from the true values. A value nearer to 0 is therefore considered more accurate.
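A minimal Python sketch of the binary log loss, written from the description of the symbols above, is given below. The clipping constant `eps` is an added assumption to avoid taking the logarithm of zero, and the probability lists are hypothetical.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Average binary log loss; probabilities are clipped to avoid ln(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
confident = log_loss(y_true, [0.9, 0.1, 0.8, 0.2])  # close to the true labels
uncertain = log_loss(y_true, [0.6, 0.4, 0.5, 0.5])  # further from the true labels
```

The predictions that lie closer to the true labels yield the smaller loss, which illustrates why values nearer to 0 are preferred.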

Area under the Curve (AUC)
The area under the curve (AUC) metric measures how well a classifier differentiates between classes. To evaluate the classification cases, the AUC of the receiver operating characteristic (ROC) curve was used. The ROC is a probability curve, while the AUC characterizes the degree of separability [31]. The higher the AUC, the stronger the chances that the model predicts better results. The ROC curve (Figure 2) plots the true positive rate (TPR) against the false positive rate (FPR), with the FPR on the x-axis and the TPR on the y-axis [32]. The TPR is also known as recall (Equation (2)), while the FPR is calculated as the total number of false positives (FPs) divided by the sum of true negatives (TNs) and false positives.

$$\text{FPR} = \frac{FP}{FP + TN} \qquad (5)$$
The AUC value always lies between 0 and 1. A value of 0 or near 0 means that the classifier has the worst measure of separability, whereas a value of 1 or near 1 means that it has the best measure of separability. A value of 0.5 means that the classifier is not able to separate the classes. The best scenario for differentiating the classes arises when the classes do not overlap. In Figure 2, the ROC-AUC curves show the positive intent.
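The quantities discussed in this subsection can be sketched in Python as follows. The function `rates` computes the TPR and FPR at a single threshold, and `auc` uses the pairwise-ranking interpretation of the AUC (the fraction of positive-negative pairs in which the positive sample receives the higher score), which is equivalent to the area under the ROC curve. The scores and the 0.5 threshold are hypothetical.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]  # hypothetical predicted probabilities
tpr, fpr = rates(y_true, [int(s >= 0.5) for s in scores])
area = auc(y_true, scores)
```

For these values, three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75; an AUC of 1 would require every positive sample to score higher than every negative one.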

Voting Classifier
A voting classifier predicts the aggregate of all the classifiers' results instead of nominating a single model or classifier. In other words, the results of all the classifiers are passed into the voting classifier, which gives a single resultant value [33]. To build and implement the voting classifier, features of the Python module scikit-learn were used.
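The aggregation step can be illustrated with a plain-Python sketch of hard (majority) voting. Whether the study used hard or soft voting is not stated, so hard voting is an assumption here, and the per-model predictions are hypothetical. In scikit-learn, the corresponding functionality is provided by `sklearn.ensemble.VotingClassifier`; the sketch below avoids that dependency so that the mechanism itself is visible.

```python
from collections import Counter

def hard_vote(predictions_by_model):
    """Combine per-model predictions by majority vote for each sample."""
    combined = []
    for sample_predictions in zip(*predictions_by_model.values()):
        combined.append(Counter(sample_predictions).most_common(1)[0][0])
    return combined

# Hypothetical predictions of the five classifiers for four tweets.
predictions_by_model = {
    "RF":  ["pos", "neg", "neu", "pos"],
    "SVM": ["pos", "neg", "pos", "pos"],
    "DT":  ["neg", "neg", "neu", "pos"],
    "KNN": ["pos", "pos", "neu", "neg"],
    "ANN": ["neg", "neg", "neg", "neg"],
}
y_true = ["pos", "neg", "neu", "pos"]
voted = hard_vote(predictions_by_model)
accuracy = sum(v == t for v, t in zip(voted, y_true)) / len(y_true)
```

In this example, no single model is correct on all four tweets, yet the majority vote is, which shows how aggregation can compensate for the errors of individual classifiers.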

Methodology
This study was carried out to predict the future trend of vaccination against the novel coronavirus disease (COVID-19). Once COVID-19 became a threat to human lives, the invention of the COVID-19 vaccine was a blessing, but some myths about the vaccination lead people to avoid it. This study aims to provide a systematic tool that predicts the trend of people towards the vaccination. Different ML models were used to predict future trends. The dataset was fetched from Kaggle in comma-separated values (CSV) format and contains feedback regarding COVID-19 vaccination. Preprocessing was carried out using the Python programming language. The processed dataset was then divided into two parts: the training set (80% of tweets) and the testing set (20% of tweets). Furthermore, the data were divided into two categories: the date-wise dataset and the random dataset.
The date-wise dataset follows the sequence from a specific date until the last date, whereas the random dataset was chosen randomly and does not follow any date order. For experimental purposes, both datasets were split so that 80% of the tweets were used for training and the remaining 20% were used to test the trained model. The training set was used to train ML classifiers, e.g., decision tree, random forest, support vector machine, and K-nearest neighbor. The trained classifiers were then tested using different evaluation parameters, such as precision, recall, area under the curve, logarithmic loss, and F-measure. In the end, the results of all the classifiers were passed through the voting classifier to predict the aggregate instead of nominating a single model. The workflow of the proposed study can be seen in Figure 3.
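The difference between the date-wise and the random dataset can be sketched in Python as two ways of producing an 80/20 split. The row structure, the field names `date` and `tweet`, and the assumption that the date-wise split trains on the earliest tweets and tests on the latest ones are illustrative; the exact procedure of the study may differ.

```python
import random

def date_wise_split(rows, train_fraction=0.8):
    """Sort by date and train on the earliest rows, test on the latest ones."""
    ordered = sorted(rows, key=lambda row: row["date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

def random_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows, ignoring dates, and then split them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical rows; ISO date strings sort chronologically.
rows = [{"date": f"2021-01-{day:02d}", "tweet": f"tweet {day}"} for day in range(1, 11)]
train_dw, test_dw = date_wise_split(rows)
train_rd, test_rd = random_split(rows)
```

In the date-wise split, every test tweet is later than every training tweet, so the model is evaluated on a period it has not seen; in the random split, training and test tweets come from the same period.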

Performance and Results
This study aims to classify the trends of the public towards the COVID-19 vaccination. It is important to understand that a large number of people have been vaccinated [34]. Some of them have suffered from side effects, but the majority of the vaccinated people saw the positive impact of the vaccination. Different studies have also been proposed to predict the COVID-19 vaccination trend with the help of artificial intelligence, semi-supervised machine learning, data mining, and data science [35] with different tools, e.g., WEKA [36]. After several experiments, the results of this study show that the vaccination may cause some side effects in the human body, but the chance of side effects is extremely low and they occur only in rare cases. The public holds differing views regarding the vaccination. In this study, the trend towards the COVID-19 vaccine has been classified and predicted. For this purpose, five different classifiers were trained on the data, and their predictions were tested using precision, recall, F1 measure, logarithmic loss, and area under the curve. In the end, the voting classifier was used to combine the results of all classifiers to give a cumulative result in terms of accuracy.

Precision, Recall, and F1 Score Prediction
In this section, the trained models were evaluated using precision, recall, and F1 measure (F1 score). When the trained models were tested on the random dataset, SVM achieved around 90.21% precision, the best among all classifiers, whereas ANN showed the worst precision, 45.5%. For each classifier, the three metrics (precision, recall, F1 measure) take the same value, as shown in Figures 4-6, respectively. The values are identical because the metrics were computed with scikit-learn using micro-averaging; in single-label classification, micro-averaged precision, recall, and F1 all equal the overall accuracy.
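The equality of the three micro-averaged metrics can be checked with a short worked example. The sketch below is a plain-Python illustration, not the study's code; the labels and predictions are hypothetical toy values, and in scikit-learn the equivalent setting is `average="micro"`.

```python
def micro_precision_recall_f1(y_true, y_pred):
    """Pool true positives, false positives and false negatives over all labels."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label and t != label:
                fp += 1
            elif p != label and t == label:
                fn += 1
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy labels (three sentiment classes).
y_true = ["pos", "neg", "neu", "pos", "neg", "pos"]
y_pred = ["pos", "neg", "pos", "pos", "neu", "neg"]

precision, recall, f1 = micro_precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, f1, accuracy)
```

Every misclassified sample counts once as a false positive (for the predicted label) and once as a false negative (for the true label), so the pooled counts of both are equal and all three metrics reduce to accuracy.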
While using the date-wise dataset in this study (80% of the data from the beginning dates for training and 20% from the end dates for testing), precision, recall, and F1 score are again equal to one another for each trained model. However, the results obtained from the artificial neural network (ANN) are the best, while the random forest gave the worst results, although the difference is very slight. Results are shown in Figures 7-9.

Prediction Using Logarithmic Loss
In this section, the predictions of the trained models were evaluated using logarithmic loss. Both forms of data, i.e., the random dataset and the date-wise dataset, were used. These results are shown in Figures 10 and 11, respectively.
Figure 10. Logarithmic loss comparison of different trained models using the random dataset.
Figure 11. Logarithmic loss comparison of different trained models using the date-wise dataset.
On the date-wise dataset, the decision tree has the worst prediction loss, 4.97%. The random dataset shows a similar pattern: the decision tree has a loss of 18.09%, and KNN has a prediction loss of 17.99%. At the same time, ANN showed the minimal loss on both datasets, approximately zero. Because logarithmic loss measures prediction error, a smaller loss indicates better results and a larger loss indicates worse results; hence, in terms of logarithmic loss, ANN performs best. Collectively, the date-wise dataset shows the best results under the logarithmic loss testing parameter.
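To make the interpretation of logarithmic loss concrete, the following minimal sketch computes it from predicted class probabilities. It is a plain-Python illustration under assumed inputs: the probability dictionaries are hypothetical toy values, and the function name `log_loss` merely mirrors the metric of the same name in scikit-learn.

```python
import math


def log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log of the probability assigned to the true label."""
    total = 0.0
    for true_label, probs in zip(y_true, probabilities):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(probs[true_label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Hypothetical toy labels and predicted probabilities.
y_true = ["pos", "neg"]
confident = [{"pos": 0.9, "neg": 0.1}, {"pos": 0.2, "neg": 0.8}]
hesitant = [{"pos": 0.6, "neg": 0.4}, {"pos": 0.45, "neg": 0.55}]

print(log_loss(y_true, confident))  # smaller loss: confident and correct
print(log_loss(y_true, hesitant))   # larger loss: correct but uncertain
```

Both toy models predict the right class for every sample, yet the more confident one receives the lower loss, which is why the metric is read as "lower is better".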

Prediction Using Area under the Curve ROC-AUC
In this section of ROC-AUC prediction, the results show that random forest has the best outcome on the random dataset, 94.08%, while KNN shows the worst trend, 66.11%. The result comparison can be seen in Figure 12. On the date-wise dataset of feedback, the decision tree had the best result, although there was not much deviation among the decision tree, random forest, and SVM classifiers. The ANN shows the worst tendency, 50.00%, on the random dataset (Figure 12), while KNN shows the worst tendency, 48.93%, on the date-wise dataset (Figure 13). Under ROC-AUC, the random dataset shows better results than the date-wise dataset. The ROC-AUC curve of the random dataset is shown in Figure 14, while the ROC-AUC curve of the time-series dataset is shown in Figure 15.
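As a reading aid for the ROC-AUC values above, the sketch below computes the area under the ROC curve as the fraction of positive/negative pairs that a model ranks correctly. It assumes a binary setting with labels 0 and 1 and hypothetical toy scores; it illustrates the metric and is not the study's evaluation code.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical toy labels and scores.
y_true = [0, 0, 1, 1]
print(roc_auc(y_true, [0.1, 0.4, 0.35, 0.8]))  # three of four pairs ranked correctly
print(roc_auc(y_true, [0.5, 0.5, 0.5, 0.5]))   # constant scores carry no ranking information
```

A value of 1.0 means every positive is ranked above every negative, while 0.5 corresponds to a ranking that is no better than chance.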
The KNN classifier, however, does not perform well on the time-series dataset when tested under ROC-AUC. Applying the voting classifier to the results of all the evaluation parameters (precision, recall, accuracy, and F1 score) gives an accuracy of 89.9% on the random dataset and 45.7% on the date-wise dataset. Figure 16, which presents the data in Table 2, shows that the number of vaccinations has increased: people are getting vaccinated rapidly in different countries, and the intention towards the vaccine is increasing day by day. This increase is consistent with the results obtained on the random dataset, and the trend in Figure 16 shows that the results of this study are largely aligned with the current number of vaccinated people. Hence, it can be said that the proposed system has good outcomes with these classifiers and this workflow.
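The aggregation step can be sketched as a hard (majority) vote over the class predictions of the individual models. Whether the study used hard or soft voting is not stated, so hard voting is an assumption here; the model names are used only as dictionary keys, and the predictions are hypothetical toy values. scikit-learn offers a comparable ensemble as `VotingClassifier`.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Return, for each sample, the label predicted by most models."""
    all_predictions = list(predictions_per_model.values())
    n_samples = len(all_predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in all_predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy predictions from five models on four samples.
y_true = ["pos", "neg", "pos", "neg"]
predictions = {
    "SVM": ["pos", "neg", "pos", "neg"],
    "RF":  ["pos", "neg", "neg", "neg"],
    "DT":  ["neg", "neg", "pos", "pos"],
    "KNN": ["pos", "pos", "pos", "neg"],
    "ANN": ["neg", "neg", "neg", "pos"],
}

ensemble = hard_vote(predictions)
print(ensemble, accuracy(y_true, ensemble))
```

With an odd number of models and two classes, ties cannot occur. With more classes, a tie-breaking rule would need to be chosen explicitly.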

Conclusions
COVID-19 has spread rapidly all over the globe. In this pandemic, the COVID-19 vaccination can be considered a safe haven that saves human lives and prevents deaths and infections from COVID-19 [37]. Considerable research has been carried out on the COVID-19 vaccination since it was introduced. For future prediction, different techniques and approaches may be helpful for predicting the results of this vaccination [38,39]. The goal of this study is to predict the trend of the people towards the COVID-19 vaccination, despite the different myths about it and its side effects on health. The dataset used contains a large number of tweets from people who may belong to different schools of thought. For this purpose, five classifiers, namely a support vector machine, decision tree, random forest, K-nearest neighbor, and an artificial neural network, were trained on the dataset. After that, the trained models were tested to determine precision, recall, F1 measure, logarithmic loss, and area under the curve. Then, the results of all the evaluation metrics were combined with a voting classifier to aggregate the results and to identify the better of the two datasets. The proposed system shows a better result on the random dataset with the voting classifier, where it produced an accuracy score of 89.9%, whereas the date-wise dataset showed an accuracy score of 45.7%. The results of this study are therefore more promising and encouraging for the random dataset. While the overall results with the given dataset show a positive trend of people towards the vaccination, there are still people who are apprehensive about the vaccination because of different myths, risks, or other health concerns.
Moreover, the proposed system may be used with a different dataset to predict the impact in other challenging fields, e.g., employee satisfaction, student feedback, trends towards newly introduced products, and many other areas. The authors aim to enhance this work in the future by using more efficient classifiers with effective techniques for real-time prediction.

Data Availability Statement: The data used for experimentation in the study are openly available at https://www.kaggle.com/gpreda/all-covid19-vaccines-tweets (accessed on 8 August 2021).