Prediction of the Infectious Outbreak COVID-19 and Prevalence of Anxiety: Global Evidence

Forecasting disease outbreaks in real time using time-series data can help in planning public health interventions. We used a support vector machine (SVM) model with epidemiological data provided by Johns Hopkins University Centre for Systems Science and Engineering (JHU CCSE), the World Health Organization (WHO), and the Centers for Disease Control and Prevention (CDC) to predict upcoming records before the WHO made an official declaration. Our study, conducted on the time-series data available from 22 January to 10 March 2020, revealed that COVID-19 was spreading at an alarming rate and progressing towards a pandemic. The initial insight was that confirmed COVID-19 cases were increasing: they accounted for the largest share of our selected dataset from 22 January to 10 March 2020, i.e., 126,344 (64%). The recovered cases were 68,289 (34%), and the death rate was around 2%. Moreover, we classified tweets from 22 January to 15 April 2020 into positive and negative sentiments to identify the emotions (stressed or relaxed) expressed by Twitter users in relation to the COVID-19 pandemic. Our analysis identified that tweets mostly conveyed a negative sentiment, with a high frequency of words for #coronavirus and #lockdown amid COVID-19. However, these anxiety tweets are an alarm for healthcare authorities to devise


Introduction
A sequence of mysterious instances of pneumonia-like illnesses began in Wuhan, Hubei Province, China, on 8 December 2019 [1,2]. The causative virus was named SARS-CoV-2, and the infectious disease it causes is known as coronavirus disease 2019 (COVID-19) [3][4][5]. Genetically, it is related to the viruses that cause severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) [6,7]. The infection has spread to over 200 countries [8] public response against it. Therefore, we used machine learning to predict the spread and sentiment analysis to capture people's sentiments related to COVID-19 pandemic lockdowns, to help higher authorities, psychiatrists, and policymakers make predictions and take timely action. Several machine learning algorithms can be utilized for predictive analysis, such as SVM, NB, K-NN, DT, and DL [30,31]. However, we used a support vector machine (SVM), which has recently been used in the literature for multiclass classification [32][33][34].
The following are the most significant contributions drawn from this research:
i. A variety of research projects have been carried out in connection with the COVID-19 epidemic. However, limited research has paid attention to the prediction of a pandemic. Therefore, in this paper, we used predictive analysis to determine the trend of the growing epidemic curve.
ii. This study conducted a sentiment analysis using Twitter in addition to exploratory data analysis (EDA), which was a previously unexplored area in terms of lockdown circumstances. While EDA explores the hidden structure of the data, sentiment analysis helps to comprehend the emotional state behind the behavioral pattern.
iii. Additionally, this study validated the usefulness of social media data for examining health-related communications, as well as for determining the emotional condition of the public during a healthcare crisis.
The remainder of the paper is structured as follows: Section 2 reviews related works, and Section 3 describes the methods. Section 4 presents the experimental outcomes. Section 5 summarizes the key findings, while Section 6 considers implications and potential limitations.

Related Works
EDA is a technique that analyzes the supplied dataset to obtain information that might be useful. The method uses a visual representation of the facts to assist with better comprehension and to improve decision-making. We believe that visualization is an excellent tool for detecting patterns, predicting future behavior, and recognizing interconnectedness. The primary goal of performing an EDA is to assist healthcare professionals in determining the pattern of variation in the COVID-19 infection rate. EDA mostly assists in forming a comprehensive and wide understanding of the dataset as well as the potential that arises when specific conditions are met. EDA and data processing produce the most critical characteristics that aid in the selection of the most qualified candidates for forming a predictive system [35].
As the COVID-19 epidemic is quickly spreading throughout the globe, we need to determine the number of unique cases that have been affected by COVID-19, as well as examine epidemiological data through the EDA [36] approach. Because of the fast spread of COVID-19, we considered a complete daily data set and determined that it is necessary to combine case review with epidemic analysis to better understand the epidemiological features and seriousness of the illness. In addition to assembling a comprehensive dataset, this investigation examines the various trends in the number of reported instances of COVID-19. The results of this research also raise questions about whether COVID-19 is on the brink of a fast outbreak in the coming days or if there is potential to flatten the disease's distribution curve.
In addition, the COVID-19 epidemic has had a negative impact on the health and well-being of the general public throughout the globe. For example, in [37] it was shown that hospitalized patients worried about their safety, loneliness, exhaustion, and resentment during the COVID-19 epidemic. They felt nervous as a consequence of fever and insomnia-related symptoms. People directly affected by the illness, as well as the masses, experience anxiety as a result of high levels of participation and danger of mortality. More patients will suffer both physical and psychological issues due to the increased number of infected individuals and their deaths. Moreover, some researchers have examined the effects of COVID-19 on students' psychological well-being, while others have emphasized the risks to students even during the lockdown. Hasan and Bao [38] indicated that the leading cause of students' psychological distress during COVID-19 is their fear of academic year loss.
COVID-19 lockdowns have restricted movement, and people have canceled their travel plans, which has impacted the transportation sector as well [39,40]. Recent research has examined the inequalities in learning resulting from digital discrimination during COVID-19, and it showed that students are receiving unbalanced learning chances because of it and fall into mental distress [41][42][43][44]. When schools are not open, and children are being told to stay indoors, their mental health may be negatively impacted in many ways. To fully comprehend the psychological ramifications of a pandemic of this scale, mental health research must be performed. This study attempts to investigate the relationship between lockdown and social media use, measure the level of health information sharing, and include more data to support future research.
Social media has permeated every aspect of our lives, and because of this, it is an excellent platform for individuals to express their concerns across geographical borders [45]. Social media analytics may be used by public health experts and government officials for a variety of purposes. It allows its users to understand the emotional transitions of the population while delivering dynamic and up-to-date information on public knowledge, understanding, and reaction to crises, helping emergency responders to handle the situation more effectively [46]. To explore how news information has been disseminated, Petersen and Gerken [47] examined social media and studied the structure of Twitter networks to see how news stories were shared. They found 13 distinct themes associated with COVID-19 hashtag use in 6.9 million tweets. The most popular themes were "COVID-19 identifying", "measure taken", "geographical", and "news and media". Singh et al. [48] identified as a major issue the fact that many Twitter users are worried and fearful about the pandemic. Furthermore, the increased usage of social media data allows for the monitoring of people's health communication behavior in real time and at no cost. In this context, social media platforms like Twitter are particularly pertinent, since they exhibit patterns of communication, such as using hashtags to showcase individual-generated information [49].

Epidemiological Data
We utilized a reliable open COVID-19 dataset provided by the JHU CCSE [14]. This time-series data is available in both dashboard and Google Sheets format and is updated daily. The cases reported for each day are cumulative cases for the respective day in the respective country. This dataset became publicly available on 22 January 2020. We analyzed the time series from 22 January 2020 to 10 March 2020. The date of 10 March 2020 was chosen as a cutoff because the reliable data-sharing resources we regularly monitored reflected a spike in the number of confirmed cases. It was observed that COVID-19 had spread to formerly safer zones.
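Because the series is cumulative, daily new cases have to be derived by differencing consecutive days. A minimal sketch of that step, restricted to the study window, might look as follows. The counts and the `daily_new_cases` and `in_window` helpers are hypothetical illustrations, not the authors' code or real data.

```python
from datetime import date


def in_window(series, start, end):
    """Keep only the entries whose date lies in [start, end]."""
    return {day: total for day, total in series.items() if start <= day <= end}


def daily_new_cases(cumulative):
    """Convert cumulative counts into daily new counts.

    The first day in the series is treated as entirely new, and downward
    corrections in the source data are clipped to zero.
    """
    daily = {}
    previous = 0
    for day, total in sorted(cumulative.items()):
        daily[day] = max(total - previous, 0)
        previous = total
    return daily


# Hypothetical cumulative counts for one country (illustrative values only).
cumulative = {
    date(2020, 1, 22): 1,
    date(2020, 1, 23): 3,
    date(2020, 1, 24): 3,
    date(2020, 1, 25): 7,
    date(2020, 3, 11): 50,  # falls outside the study window
}

window = in_window(cumulative, date(2020, 1, 22), date(2020, 3, 10))
new_cases = daily_new_cases(window)
for day, count in new_cases.items():
    print(day.isoformat(), count)
```

Summing the daily values over the window recovers the last cumulative count, which is a quick consistency check on the source data.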

Time-Series Data Analysis
We used two data analysis methods, i.e., EDA and predictive analysis using SVM. We also compared the time-series data from multiple reliable web sources, i.e., WHO, JHU CCSE, CDC, and CAN [14][15][16]. The main aim of applying EDA was to gain insight into the data trends and obtain confidence for further analysis in conjunction with a machine-learning algorithm. Such analysis is crucial in data science, as it allows us to obtain particular insights and essential statistical measures [50].

Data Preprocessing
Initially, raw data were cleaned, followed by identifying missing data and removing duplicates through manual labeling [51,52]. Machine learning algorithms such as SVM are used frequently for predictive analysis [53][54][55]. In this work, we divided input data into three classes: confirmed COVID-19 cases, recovered cases, and deaths. The data were used to train the SVM model to predict the threat of an emerging pandemic. SVM is a binary classification tool that cannot be applied directly to multiclass problems. To classify multiclass instances, we used the one-against-all technique, which builds one binary classifier per class [56]. Specifically, three binary SVMs (3 is the number of classes) were constructed in this study, with each binary classifier distinguishing one class from the rest of the classes. Choosing the parameter (C) and the type of kernel function to use while running an SVM model is another critical step [57]. Parameter C was selected using the leave-one-out cross-validation technique, which yielded 2 as the best value in this research [58].
For any new point $z$, the class predicted by the support vector machine classifier is
$$\hat{y} = \operatorname{sign}\left(w^{\top} z + b\right),$$
where $w$ and $b$ are the weight vector and bias of the trained classifier, and the sign(·) function returns −1 if its argument is negative and +1 if its argument is positive [59]. SVMs are intrinsically two-class classifiers; a multiclass problem is handled through the construction of multiclass support vector machines, where a two-class classifier is developed over a feature vector $\Phi(\vec{x}, y)$ obtained from the pair that constitutes the input characteristics and the class of the datum.
The classifier selects the class at test time using the following expression:
$$\hat{y} = \arg\max_{y'} \; w^{\top} \Phi(\vec{x}, y').$$
We used the discussed model for this study.
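The one-against-all prediction step described above can be sketched in a few lines. This is an illustration under stated assumptions: the classifiers are linear, their weights and biases are hypothetical placeholders rather than values from the study, the two-dimensional feature vector is invented, and a decision value of exactly zero is mapped to +1 by convention.

```python
def decision_value(w, b, z):
    """Linear decision value w.z + b for a single binary classifier."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def sign(value):
    """Return +1 for non-negative values and -1 otherwise (zero maps to +1 here)."""
    return 1 if value >= 0 else -1


def predict_one_vs_rest(classifiers, z):
    """Score z with every class-versus-rest classifier and pick the highest score."""
    scores = {label: decision_value(w, b, z) for label, (w, b) in classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical, already-trained classifiers: one (weights, bias) pair per class.
classifiers = {
    "confirmed": ([-0.5, -0.5], 1.5),
    "recovered": ([0.5, 0.0], -1.5),
    "death": ([0.0, 0.5], -1.5),
}

label, scores = predict_one_vs_rest(classifiers, [5.0, 1.0])
print(label)
for name, value in scores.items():
    print(name, value, sign(value))
```

Taking the class with the largest decision value, rather than the first classifier that returns +1, resolves points that several binary classifiers claim or that none of them claims.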

Twitter Data
We analyzed tweets to assess the sentiments of the Iranian and Pakistani public amid the COVID-19 lockdown. The Twitter source data was provided by GNIP, a social media analytics company acquired by Twitter in 2014; since then, GNIP has become the official Twitter data vendor. For this study, to access the Twitter API, we used the well-supported Python package Tweepy, which offers comprehensive Twitter API compatibility. Tweets were extracted using the hashtag #LockdownPakistan from 22 January to 15 April 2020. The hashtags used, #LockdownPakistan, #LockdownIran, and #coronavirus, all relate to the novel coronavirus, which is one of the most often used terms in scientific and news media. A total of 2500 tweets were considered for the analysis.
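Once tweets have been retrieved, the selection by hashtag and date window can be reproduced offline. The sketch below assumes each tweet is already available as a dictionary with hypothetical `created` and `text` fields; it does not call the Twitter API, and the sample tweets are invented.

```python
import re
from datetime import date

# Hashtags and date window taken from the text above.
HASHTAGS = {"#lockdownpakistan", "#lockdowniran", "#coronavirus"}
START, END = date(2020, 1, 22), date(2020, 4, 15)


def hashtags_in(text):
    """Return the set of lower-cased hashtags found in a tweet."""
    return {tag.lower() for tag in re.findall(r"#\w+", text)}


def select_tweets(tweets):
    """Keep tweets inside the date window that carry at least one target hashtag."""
    return [
        tweet
        for tweet in tweets
        if START <= tweet["created"] <= END and hashtags_in(tweet["text"]) & HASHTAGS
    ]


# Invented sample records, for illustration only.
tweets = [
    {"created": date(2020, 3, 25), "text": "Staying home with family #LockdownPakistan"},
    {"created": date(2020, 4, 20), "text": "Still worried #coronavirus"},
    {"created": date(2020, 2, 1), "text": "Nice weather today"},
]

selected = select_tweets(tweets)
print(len(selected))
```

Matching on lower-cased hashtags keeps variants such as #lockdownpakistan and #LockdownPakistan in the same sample.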

Model Building
For model building, we used a training set composed of 70% of the polarity-labeled tweets. A supervised machine learning algorithm in Python was used for this purpose. We wrote the script using the stochastic gradient descent machine learning method to validate the model.
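A 70/30 split followed by a classifier trained with stochastic gradient descent could be sketched as below. The text does not specify the loss function or features, so this example assumes an intercept-free logistic regression over bag-of-words tokens as a stand-in; the labeled tweets are invented, and they are assumed to be already shuffled so that a simple cutoff gives the split.

```python
import math


def tokenize(text):
    """Lower-case whitespace tokenization."""
    return text.lower().split()


def train_sgd(examples, epochs=20, lr=0.1):
    """Fit bag-of-words logistic regression weights with per-example SGD updates."""
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            tokens = tokenize(text)
            score = sum(weights.get(token, 0.0) for token in tokens)
            prob = 1.0 / (1.0 + math.exp(-score))
            error = label - prob  # gradient of the log-likelihood w.r.t. the score
            for token in tokens:
                weights[token] = weights.get(token, 0.0) + lr * error
    return weights


def predict(weights, text):
    """Return 1 (positive) if the summed token weights are above zero, else 0."""
    score = sum(weights.get(token, 0.0) for token in tokenize(text))
    return 1 if score > 0 else 0


# Invented labeled tweets: 1 = positive polarity, 0 = negative polarity.
labeled = [
    ("safe home family", 1), ("fear death virus", 0),
    ("good time family", 1), ("stress crisis fear", 0),
    ("home safe good", 1), ("virus stress death", 0),
    ("family time home", 1), ("crisis fear virus", 0),
    ("good family safe", 1), ("death stress crisis", 0),
]

cutoff = len(labeled) * 7 // 10  # 70% of the tweets for training
train_set, test_set = labeled[:cutoff], labeled[cutoff:]

weights = train_sgd(train_set)
correct = sum(predict(weights, text) == label for text, label in test_set)
accuracy = correct / len(test_set)
print(len(train_set), len(test_set), accuracy)
```

The held-out 30% plays the validation role here: accuracy is measured only on tweets the model never saw during training.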

Exploratory Data Analysis
We performed EDA on the COVID-19 dataset (22 January 2020 to 10 March 2020). The graph in Figure 1, based upon the time-series data, shows the cases for 22 January, February, and 10 March 2020. According to the visual representation, confirmed COVID-19 cases were growing, and they accounted for the largest share of our chosen dataset from 22 January to 10 March 2020, i.e., 126,344 (64%). The share of recovered cases was 34%, and the mortality rate was approximately 2%. Figure 2 shows the overall cases for January, February, and March 2020. By 10 March 2020, COVID-19 had spread to 115 countries worldwide and 33 states/provinces in China. The total reported cases outside China were 126,258 for March (10 March 2020). In comparison with February, the virus spread rapidly. The death rate outside China also seemed to be an interesting parameter, as it too appeared to be increasing, with 1582 deaths outside China reported until 10 March 2020. However, the number of recovered cases remained higher than the number of deaths, i.e., 17,973 patients recovered outside mainland China. The value indicator in Figure 3 runs from 0 to 60,106, from light red to brown, with the darker colors showing the high number of confirmed cases in China.
Sustainability 2021, 13, 11339
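The percentages quoted in the EDA can be read as each category's share of all records. The short calculation below illustrates this reading. The confirmed and recovered counts come from the text; the death count is not stated for this dataset, so the figure of 4012 reported in the WHO overview is used here purely as an assumption.

```python
# Counts for 22 January to 10 March 2020.
counts = {
    "confirmed": 126_344,  # stated in the text
    "recovered": 68_289,   # stated in the text
    "deaths": 4_012,       # assumption: borrowed from the WHO overview figure
}

total = sum(counts.values())
shares = {name: count / total for name, count in counts.items()}
percent = {name: round(100 * share) for name, share in shares.items()}

for name in counts:
    print(f"{name}: {counts[name]} records, {percent[name]}% of {total}")
```

Under that assumption the rounded shares come out at 64%, 34%, and 2%, matching the figures reported in the EDA.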

Predictive Analysis Using Support Vector Machine (SVM)
To perform predictive analysis, we used SVM. The results revealed that the COVID-19 epidemic was fulfilling all the criteria of progressing towards a pandemic. Figure 6 is a visual representation of three variables, i.e., COVID-19 confirmed, death, and recovered cases starting from 22 January 2020. Further, results revealed that although the number of cases being reported from China was declining, the virus was still propagating. The study

This visualization (Figure 6) shows the SVM-based prediction for COVID-19 confirmed (blue), death (red), and recovered (green) cases. It shows that the virus was transmitting rapidly and was moving towards a pandemic. The projection was carried out using time-series data up to 10 March 2020.
Classification of the selected dataset was based on global confirmed cases (22 January 2020 to 10 March 2020). There was a slow increase in viral transmission initially (from 22 January to 1 February), during which it affected 27 countries. However, on 30 January, WHO conducted an assessment to determine whether the situation should be declared a public health emergency of international concern [60]. This emergency call alerted government officials worldwide, leading to the suspension of air travel to and from China [61][62][63]. Our results strongly suggest that quarantine measures played a crucial role in containing viral transmission. The graphs depicted a stable line in the period when quarantine measures were being practiced effectively. Results indicate that COVID-19 transmission entered a lag phase, and only 15 more nations were affected until 26 February. Unfortunately, this stability in the transmission rates led to a relaxation of the quarantine measures adopted globally. The results of the predictive analysis made evident that from the end of February to 10 March 2020 there was a rapid upsurge in viral transmission, and 72 countries were hit in a short span of 14 days.
Figure 7A-C shows the confirmed cases in China, Italy, and Iran. These graphs represent the viral transmission pattern. They show that when travel bans were imposed and strict quarantine measures were implemented, there was a lag phase in the transmission of COVID-19 (1-26 February). As soon as quarantine and travel bans were relaxed, transmission took an exponential trajectory.

Comparison of Visual Data from Multiple Reliable Web Sources
We reported our analysis based on data available on reliable web sources such as WHO [16], JHU CCSE [14], and CDC [15]. These web sources reported live updates that shed light on the increasing number of countries hit by COVID-19. These time-series data supported our findings that the epidemic of novel COVID-19 was turning towards a pandemic. When stakeholders do not adopt suitable precautions on time, the situation can get out of control. Figure 8 is adapted from WHO [16], JHU CCSE [14], and CDC [15], respectively. All these sources showed an increase in confirmed cases and their geographical distribution. These sources' overviews showed 113,702 confirmed cases (4125 new) and 4012 deaths (203 new) reported until 10 March 2020. However, the recovery rates were much higher than the death rates, which indicates that although COVID-19 had a higher transmission rate, the associated mortality was quite low. This research showed that China and the rest of the world's leadership used quarantine measures at the beginning of this epidemic to confine the virus, and this acted to control the epidemic in the early stages. Nevertheless, when the quarantine measures were relaxed due to extreme scrutiny, a sharp upsurge was seen in COVID-19 transmission to new locations worldwide.

Results of Tweets to Predict Emotions (Anxiety and Stress)
To gain a better understanding, we used the "bing" lexicon to analyze tweets and discern which terms were associated with positivity and negativity. We counted the words in each set and came up with the top 10 most frequent positive and negative words in those datasets. The word cloud and graphs represent the sentiments of people associated with tweets as positive and negative words. Figures 9-11 show the word cloud from overall tweets and bar charts from positive and negative tweets used in this study. The words "family", "time", and "home" were the most frequently occurring words in the overall tweets. On the other hand, words such as "pandemic", "death", "stress", and "virus" were the most frequently occurring words in the negative sentiments.
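The lexicon-based counting described above can be illustrated with a few lines of Python. The `LEXICON` below is a tiny hypothetical stand-in, not the actual "bing" lexicon, and the sample tweets are invented; the sketch only shows how words are matched against a polarity lexicon and ranked by frequency.

```python
import re
from collections import Counter

# Tiny hypothetical stand-in for a polarity lexicon (not the real "bing" lexicon).
LEXICON = {
    "safe": "positive", "good": "positive",
    "stress": "negative", "fear": "negative",
    "crisis": "negative", "death": "negative",
    "worry": "negative", "anxiety": "negative",
}


def tokenize(text):
    """Lower-case the text and extract alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(tweets, polarity, n=10):
    """Return the n most frequent lexicon words of the given polarity."""
    counts = Counter(
        word
        for tweet in tweets
        for word in tokenize(tweet)
        if LEXICON.get(word) == polarity
    )
    return counts.most_common(n)


# Invented sample tweets, for illustration only.
tweets = [
    "Stress and fear everywhere, the virus is a crisis",
    "Feeling safe at home, good time with family",
    "Fear of death, so much stress",
]

top_negative = top_words(tweets, "negative")
top_positive = top_words(tweets, "positive")
print(top_negative)
print(top_positive)
```

Words that are absent from the lexicon, such as "family" or "home" in this toy example, are simply ignored, so the result depends heavily on lexicon coverage.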

Discussion
The current study estimates the risk of the COVID-19 epidemic turning into a pandemic using EDA and predictive analysis algorithms on the time-series data available from the Johns Hopkins University website. Based on this data, the number of confirmed cases was expected to increase, while the death and recovery rates might vary in the days after 10 March 2020. This cutoff date, i.e., 10 March 2020, was chosen specifically because we observed a sudden spike in the number of COVID-19 cases being reported outside China after 26 February. The day-to-day data reflected the fact that this epidemic was worsening, and a global pandemic was a reality.
As seen by the predictive analysis, the death rate was relatively low, i.e., 2%, which indicated that COVID-19 was not as severe as other coronavirus epidemics reported in the past, including SARS and MERS, whose reported case fatality rates (CFRs) were 17% [64][65][66] and 20% [67], respectively. However, transmission rates of COVID-19 were much higher than those of its counterparts, indicating the greater potential of this epidemic turning into a pandemic.
Further, it is noted that viral transmission was slow initially, and only 27 countries were affected. This was because, as soon as the news of an emerging viral infection spread, the international community came up with strict measures, which included the shutting off of air transportation to and from China. Quarantining an entire nation was initially regarded as unprecedented, as both Chinese officials and the international community took aggressive measures [68]. The main aim of this quarantine was to restrict viral hosts and starve the virus. However, health officials criticized the situation and said that such measures could only deprive a state or region of medical facilities and not contain the viral spread [68]. As soon as the strict international protocols were relaxed, Iran emerged as the second endemic zone, and carriers from Iran spread the virus to many countries across the globe. The lag phase of 1 February to 26 February was followed by a sharp increase in COVID-19 transmission: from the end of February to 10 March 2020, 72 countries were hit in a short span of 14 days. This analysis is crucial in helping to understand how important it is to quarantine an individual, a city, or a country during an endemic/epidemic crisis such as this one in order to prevent it from turning into a pandemic.
Another important aspect of this study is sentiment analysis. Visualization of tweets is significant for finding the true picture of public emotions toward an event or incident such as a healthcare emergency [22,69]. Outcomes show that stressful negative words were used more frequently in tweets than positive words. These included "stress", "anxiety", "worry", "pandemic", "crisis", "threat", and "fear". However, some positive words, such as "family", "time", "home", "good", and "safe", were also seen. It appears that the people in both countries, Iran and Pakistan, do understand the significance of lockdown. People are aware that this aggressive measure is for their good, and the virus can be contained only by social distancing and self-isolation [22,70]. A murky sadness does dwell among the masses, as seen by the words "pandemic", "disease", "crisis", "fear", "lockdown", and "phase". The masses seem to be worried about the fate of daily wage earners [71,72]. As shown in Figure 3, we selected the sentiment-laden terms and the top 10 most popular positive and negative terms from our tweets and observed their use patterns to gain a better idea of how they were employed in general. Understanding people's feelings about a topic is an important aspect of classifying the words into emotions. Here, we classified the words based on the recent coronavirus epidemic.

Implications and Limitations
In times of public health emergencies, there is a need for a proactive public health campaign on social media. The healthcare authorities should monitor tweets by the masses that relate to public health emergencies; these can help devise policies and maintain the supply chains [73]. Methods should be devised to help the public health and scientific community access the core quantity of posts on social networking sites while respecting privacy at the same time. Moreover, sentiment analysis researchers should focus on multilingual sentiment analysis, as most research workers can easily understand English [74]. Furthermore, factors affecting mental health amid pandemic lockdowns should also be studied. The tracking of false information circulated online should also be studied, as it is the most common source affecting mental health.
Limitations include that the results illustrate death only among the COVID-19 confirmed cases. Once factual infection data become available, a true CFR can be estimated. Therefore, future research based on a larger sample size and improved specificity can be conducted. However, we still believe that our analysis can help in gaining an improved situational assessment. From a technical perspective, the proposed SVM-based prediction can be critically important, as it hints towards the start of a pandemic. It will help all the stakeholders and regulatory authorities to take necessary steps, such as grounding air traffic and suspending classes, public gatherings, and events, to contain the viral spread and manage the epidemic before the situation worsens and turns into a pandemic. In conclusion, the present study reveals that the death rate, confirmed cases, and recovered cases all endorse the notion that this COVID-19 epidemic could turn into a pandemic situation. Moreover, heterogeneous causes of death, such as risk and age groups, also need to be considered when estimating the CFR in future studies.

Conclusions
As the last two decades have seen the SARS and MERS epidemics, and the current situation has developed into a COVID-19 pandemic, we strongly believe that predictive analytics will help improve the situational assessment in the present or any future health crisis. To the best of our knowledge, this is the very first study on COVID-19 that has incorporated exploratory data analysis and predictive analysis and provided a comparison based on different data sources successively. This study also offers a strong baseline for taking serious preventive measures to counter any new viral outbreak. Our study provides a roadmap for future researchers to further explore the data, since this COVID-19 epidemic has now officially entered a pandemic mode, and to analyze the key parameters that can help influence timely intervention.