Performance evaluation of regression models for COVID-19: A statistical and predictive perspective

Research is essential during the COVID-19 pandemic to deliver rapid solutions. COVID-19 has confronted governments, corporations and ordinary citizens around the world with a crisis in which technology plays an essential role. Lightweight and flexible technology solutions that can speed up progress towards critical health-care capability are being tested by the hour. Data must be obtained, validated and analysed within a short time frame. In this setting, machine learning models have a major role to play in predicting the number of upcoming positive COVID-19 cases, so that government departments can plan effective and strengthened responses to COVID-19. The ongoing global COVID-19 pandemic has been non-linear and dynamic. Because of the especially complex nature of the COVID-19 outbreak and its variation from country to country, this study recommends machine learning as an effective means of modelling the outbreak. Linear regression, polynomial regression, ridge regression, polynomial ridge regression and support vector regression models are evaluated on COVID-19 data sets drawn from multiple online sources. Comprehensive experiments were performed, and each test was evaluated with mean square error (MSE), mean absolute error (MAE), root mean square error (RMSE) and R2 score. This study also offers a path for future research using regression models based on machine learning. Precise validation and data analysis can contribute to strategies for treatment and disease prevention at an early stage. A systematic, comprehensive strategy of this kind allows statistical data to be forecast for government agencies and the community.


Introduction
Coronavirus disease, or COVID-19, originated in Wuhan, China in December 2019. To date (12 July 2021), there is no approved human vaccine for combating it. COVID-19 spreads quickly when people are in close proximity [1]. The coronavirus pandemic is arguably the most unnerving situation the world has faced since the end of the Second World War, all the more so because the enemy is invisible. COVID-19 has affected our lives to a great extent. Work, the economy, education and almost everything else have come to a standstill. People now focus only on what is essential to live. Governments and other organizations have had to take unprecedented steps to stem the trail of destruction. Given the way the virus spreads and the threat it poses, practically all nations have declared either partial or complete lockdowns throughout the affected districts and regions. Since there is no approved medicine so far for eliminating the infection, the governments of all countries are concentrating on precautionary measures that can stop the spread [2]. This virus causes respiratory tract infections that can range from mild to fatal. It can affect individuals of any age, yet older people and individuals with pre-existing conditions are more vulnerable to it. The most challenging aspect of its spread is that an individual can carry the infection for a long time without showing symptoms [3][4][5][6]. Wearing masks, using sanitizer, regular hand washing and cleanliness are the best ways to protect against this illness. People are urged to believe that staying home saves lives. COVID-19 will reshape our reality.
On February 11, 2020 the World Health Organization (WHO) named the disease COVID-19 [21], and on March 11, 2020 it declared the outbreak a pandemic. The virus was first reported in China and then spread to other countries. The USA, Brazil and India have been the most affected countries, with the number of COVID-19 cases multiplying rapidly on a daily basis.
Machine learning plays a major role in understanding and examining the COVID-19 crisis, as it identifies patterns in data and uses them to make predictions or decisions automatically. In the health-care sector, machine learning can be seen as a resource with the scope to process huge datasets beyond the capacity of the human mind, and the derived insights help doctors in planning and delivering care so that patients get satisfactory treatment. On the basis of this concept, this work analyses COVID-19 cases and predicts upcoming new positive cases [18]. The authors examine the growth of confirmed, death and recovered COVID-19 cases for the whole world and compare five heavily infected countries with the rest of the world. Analysis and predictions are done using Linear Regression, Ridge Regression, Polynomial Ridge Regression [5], Polynomial Regression and Support Vector Regression (SVR). The proposed prediction model aims to follow the actual outcome of this pandemic, so that large economic losses, community spread and the degree of social distancing can be identified and accurate decisions taken accordingly [7]. This strategy will help the government to take preventive measures based on our subsequent work on predicting the presence of this infection in the future.
The remainder of the paper is organized as follows. Section 2 presents the related work, Section 3 presents the materials and methods together with the staging analysis parameters used for the investigation, Section 4 presents the evaluation with a comparison table, and Section 5 summarizes the paper and presents the conclusion with references.

Related work
Since the emergence of COVID-19 and its subsequent spread across continents, overwhelming both developed and developing countries, a great many research papers have been published on different aspects of COVID-19. These papers analyse vaccination, drug therapy and the prediction of future infected, recovered and death cases [1,2]. L. Jia et al., 2020 [12] predicted and analysed COVID-19 using various forecasting techniques. Differential equation, classical differential equation and time series prediction models are used in that work, together with an Internet-based infectious disease prediction model. They used several mathematical models, namely the Logistic model, the Bertalanffy model [12] and the Gompertz model [12]. Model evaluation was done using the regression coefficient (R2). Fitting and analysis of SARS and COVID-19 were carried out in two categories: the number of confirmed cases and the death toll. Different prediction graphs were obtained under these categories. They concluded that the epidemic would probably be over by the end of April, but their estimate was not accurate [12]. F. Rustam et al. (2020) introduced future forecasting of COVID-19 using various machine learning models. They used four regression models, namely LR, LASSO, SVM and Exponential Smoothing (ES), and compared them on the basis of the evaluation parameters used in their work. According to their results, the ES model is best for predicting infected and recovered cases, and the LR model best predicts the death rate [9].
S. Dutta et al., 2020 [20] introduced prediction models for confirmed COVID-19 cases. They used three models, namely an LSTM (Long Short-Term Memory) model, a GRU (Gated Recurrent Unit) model and a combined LSTM-GRU model [20], all of which are deep learning neural networks. The combined LSTM-GRU based RNN model gives relatively better results for predicting confirmed, released, negative and death cases on the data [20]. Similarly, Narinder et al., 2020 [19] observed that because the outbreak of COVID-19 has turned into a pandemic, the analysis of medical data is required to prepare society to combat this challenge. The four stages of COVID-19 are described in that paper. The regression is trained and cross-validated on real-time data using the number of confirmed, recovered and death cases. The results were displayed through graphs that visualized worldwide COVID-19 epidemic analysis using machine learning and deep learning techniques [19].
Sohini and Sareeta (2020) [10] proposed a model in which they first performed data visualization and then applied various machine learning forecasting techniques. Their paper compares the growth of COVID-19 confirmed, death and recovered cases in Asian countries with that in other major countries that have also been heavily infected. The various predictions are plotted on a bar chart. Gender distribution is shown through a chart that reveals males are more likely to be diagnosed with COVID-19. Age-wise distribution is also shown through pie charts. They concluded that the Sigmoid model is the best among all the models they used [10]. Binti Hamzah et al. (2020) [8] built CoronaTracker, an online platform that provides the latest and reliable news development, as well as statistics and analysis on COVID-19. They performed real-time data queries and visualized them on their website, and the queried data were then used for SEIR modelling. On the basis of daily observations, they used the SEIR model to forecast the COVID-19 outbreak within and outside China. They also analysed the queried news [8].
L. Wynants et al. (2020) conducted a review in which they inspected the papers related to the COVID-19 pandemic. They found that the proposed models are poorly reported and at high risk of bias, raising concern that their predictions could be unreliable when applied in daily practice [4]. R. Sujatha et al. (2020) used three models for their predictions and performed visualization for India. Using graph plotting, they tried to work out the future trend of the epidemic. Comparing predicted values with the corresponding cases in the dataset, they concluded that the MLP method gives better prediction results than the LR and VAR methods [11]. L. Peng et al. (2020) used an SEIR model based on the public data of the National Health Commission of China from January 20 to February 16, 2020. They reliably estimated key epidemic parameters and made predictions of the inflection point and possible ending time for 24 provinces in Mainland China and 16 counties in Hubei province [17]. Li Cuilian et al. (2020) proposed a model to evaluate the predictive value of Internet search data from search engines and social media for the COVID-19 outbreak in China [16]. Vinay and Lei Zhang (2020) estimated the approximate ending point of the epidemic in Canada and in the world. They developed a forecasting model using deep learning techniques. The dataset required for this purpose was collected from a university and from the Canadian health authority. A long short-term memory (LSTM) model was used to forecast COVID-19, and based on the graphs produced by the LSTM they predicted that the pandemic would end by June 2020. Their model achieved better performance than other forecasting models [15].
Dong Je et al. (2020) introduced a model that identifies the high-risk factors of coronavirus. They collected the clinical data of all patients admitted to Fuyang Second People's Hospital in China between 20 January and 22 February 2020. They divided around 208 patients into two categories: a stable group and a progressive group. Univariate and multivariate analysis identified the independent high-risk factors for COVID-19 progression. They concluded that the CALL score model can be an effective resource to combat this challenge [14]. Weston C. Roda et al. (2020) compared SEIR and SIR models and concluded that a complex model such as SEIR is not necessarily more effective than a simple model such as SIR. They used a Bayesian framework, Markov chain Monte Carlo algorithms and the Akaike Information Criterion (AIC). The parameters of the SIR and SEIR models were presented in tables, and comparisons were made through graphs [13]. C. Sohrabi et al., 2020 [21] report that on 31 December 2019 a few cases of coronavirus were identified in China. The authors address the WHO global health emergency, the global reaction, reported UK cases and the British response, transmission, viral spread, and prevention, diagnosis, treatment, forecasting and containment methods for this virus. They also compare the CDC and WHO diagnostic criteria based on symptoms and travel, and give a retrospective on the response to COVID-19.
R. Khan et al., 2020 [42] investigated sentiment analysis techniques for the analysis of Twitter COVID-19 data. They found that pre-processing the data with regex and a trainer was an effective way to reduce the complexity of the applied algorithm and of the data, compared with applying the algorithm directly to the raw data. Using the trained model together with the classifier proved to be a better approach to classification, as it reduced the time taken and the size of the data frame, lowering the time complexity of the process. In their analysis, the number of tweets shared by active users was consistently greater than in the other categories. This implies that, regarding the pandemic, most people received the decisions taken by the government or local authorities positively. While the number of infected and deceased people kept increasing, this did not affect the mental strength of the population. For the 3-month analysis of the Indian subcontinent, the variation among positive, negative and neutral sentiments stayed consistent with the number of cases increasing day by day.
Gozes, Ophir, et al., 2020 [43] reported that their initial study, which is currently being extended to a larger population, demonstrated that rapidly developed AI-based image analysis can achieve high accuracy in detecting coronavirus as well as in quantifying and tracking disease burden.
Shi, Feng, et al., 2020 [44] discussed the significant role of medical imaging techniques in fighting COVID-19. The authors examine how AI provides safe, accurate and efficient imaging solutions in COVID-19 applications. X-ray and CT scan techniques are used to demonstrate the effectiveness of AI-empowered medical imaging. It is worth noting that imaging provides only partial information about patients. It is therefore important to combine imaging data with clinical manifestations and laboratory examination results to support the screening, detection and diagnosis of COVID-19. In this case, the authors believe AI will demonstrate its capability in fusing information from these multi-source data to perform accurate and efficient diagnosis, analysis and follow-up.
Yan, Carol H., et al., 2020 [45] found that, in ambulatory individuals with influenza-like symptoms, chemosensory dysfunction was strongly associated with COVID-19 infection and should be considered when screening symptoms. Most patients recover chemosensory function within weeks, paralleling the resolution of other illness-related symptoms.
Sun, Yinxiaohe, et al., 2020 [46] discuss that clinical and laboratory data can be rapidly obtained for individuals who are exposed to COVID-19, allowing PCR testing and containment efforts to be prioritized. Basic laboratory test results were important for the prediction models.
Oliveiros, Barbara, et al., 2020 [47] discussed that the COVID-19 progression rate is projected to be slower in spring and summer. However, the two variables considered account for at most 18% of the variation, with the other 82% connected to other factors such as containment measures, general health policies, population density, transport and cultural aspects. Furthermore, the direct effect is small: if the temperature rises by 20°C, for example, the average doubling time is predicted to increase by 1.8 days in the best-case scenario.
Chen, Xiaofeng, et al, 2020 [48] discussed that pneumonia patients with and without COVID-19 can be differentiated on the basis of CT imaging and clinical symptoms alone. A model consisting of semantic and clinical radiological features performs very well in diagnosing COVID-19.
Dryhurst, Sarah, et al. [49] reported that COVID-19 risk perceptions around the world show a consistent association with various experiential and socio-cultural factors across countries. At the same time, they note that cultural differences in risk perception must be addressed.

Motivation
Data science helps us to clarify and understand the data accumulated so far. It helps to model and visualize the patterns of how coronavirus spreads, including the number of patients or possible infected cases reported daily and whether that number is growing relative to earlier periods. Data analysis yields valuable insight into the data. Machine learning lets us build models that map the real world, take some data and predict the future. In our model, we therefore first visualize the data, then forecast future data, and also try to find the best-fit regression model to help with future predictions.

Dataset preparation
In this paper we use worldwide coronavirus data from January 22, 2020 to July 12, 2021, gathered from online resources and tools such as Kaggle, Weka 3.8.4 and Orange [11]. The datasets give the number of confirmed, recovered and death cases all over the world. They are available in time series format with day, month and year, so temporal components are not neglected.
We split the datasets into a training set (85%), on which the model is trained, and a testing set (15%), used to test the performance of the system.
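The 85%/15% split can be illustrated with a minimal sketch. The paper does not state whether the split was chronological or random; the version below assumes a chronological split, which is a common choice for time series, and the function name `chronological_split` and the sample counts are hypothetical.

```python
def chronological_split(series, train_fraction=0.85):
    """Split a time-ordered sequence into train and test parts without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Round to the nearest index so that floating-point error cannot shift the cut.
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


# Hypothetical cumulative confirmed counts for 20 consecutive days.
confirmed = [10, 12, 15, 19, 25, 31, 40, 52, 66, 83,
             101, 124, 150, 181, 215, 256, 300, 351, 407, 470]

train, test = chronological_split(confirmed)
print(len(train), len(test))
```

Keeping the most recent observations in the test set means the model is always evaluated on dates later than those it was fitted on.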

Experimental work flow chart
The experimental work flow of this study is shown in Fig. 1. As Fig. 1 shows, the complete work is divided into two sections. Section one covers data visualization and section two covers future data prediction. For data visualization, the progress bar, recovery rate and mortality rate are shown. In section two, linear, polynomial, ridge, polynomial ridge and support vector machine (SVM) regression with a polynomial kernel function are used for future case prediction. Each model is evaluated using the MSE, MAE, RMSE and R2_SCORE parameters.

Data visualization
Worldwide visualization framework
This work shows the weekly progress of the different types of cases worldwide, namely confirmed, recovered and death cases. From this it is inferred that the rate of increase in confirmed cases is much higher than that of recovered and death cases. The death rate is low in comparison with the other two, whereas the recovery rate is moderate. We then show the daily increase in confirmed, recovered and death cases. The line graph in Fig. 2 plots week number against number of cases and shows the weekly increase of the different types of cases worldwide. According to our analysis, confirmed cases are increasing at a high rate, and the recovery rate is average compared with the number of active cases. The death rate is also low compared with the number of active cases.

Mortality and recovery rate visualization
Mortality rate is a measure of the number of deaths in a specific population, scaled to the size of that population, per unit of time. Recovery rate, in this context, is the share of confirmed cases that have recovered, expressed as a percentage. The monthly progress of the different types of cases worldwide is shown in Fig. 3.
As Fig. 3 shows, the mean and median of the average mortality rate are 3.413 and 2.622, respectively. Similarly, the mean and median of the average recovery rate are 52.20 and 56.61, respectively.
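As an illustration of how such summary figures can be obtained, the sketch below computes mortality and recovery rates as percentages of confirmed cases and then takes their mean and median. The definition of both rates relative to confirmed cases is an assumption made here, and the data and the helper name `rate_percent` are hypothetical.

```python
from statistics import mean, median


def rate_percent(part, confirmed):
    """Return part/confirmed as a percentage for each time step (0.0 if no cases)."""
    return [100.0 * p / c if c else 0.0 for p, c in zip(part, confirmed)]


# Hypothetical cumulative counts at three time steps.
confirmed = [100, 200, 400]
deaths = [2, 5, 12]
recovered = [10, 60, 200]

mortality = rate_percent(deaths, confirmed)    # [2.0, 2.5, 3.0]
recovery = rate_percent(recovered, confirmed)  # [10.0, 30.0, 50.0]

print(mean(mortality), median(mortality))
print(mean(recovery), median(recovery))
```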

Comparative analysis
In this section a comparative analysis is carried out according to the mortality and recovery rates of the top 15 countries. Fig. 4 shows the 15 countries with the highest and the lowest mortality rates. According to this, Yemen has the highest mortality rate and Bhutan the lowest. Fig. 5 shows the 15 countries with the highest and the lowest recovery rates. Here Djibouti has the highest recovery rate and Sweden the lowest among the listed countries.
According to the statistical analysis done by the authors, the following key facts have been observed.
(a) Top

Growth factor
Growth factor is a measure of how rapidly the number of new cases is rising or falling; it is simply the factor by which a quantity multiplies itself over time. The key thing to remember is that we need to keep it below one. Importantly, one thing that has changed since the last time the growth factor was above one is that the number of new cases each day is low. Fig. 6 shows the growth factor of the different types of cases worldwide.
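A minimal sketch of the growth factor follows, assuming it is computed as the ratio of new cases on one day to new cases on the previous day; the paper does not give the exact formula, so this definition and the sample series are assumptions.

```python
def growth_factors(cumulative):
    """Ratio of each day's new cases to the previous day's new cases."""
    new_cases = [b - a for a, b in zip(cumulative, cumulative[1:])]
    factors = []
    for prev, curr in zip(new_cases, new_cases[1:]):
        # The ratio is undefined when the previous day had no new cases.
        factors.append(curr / prev if prev > 0 else float("nan"))
    return factors


# Hypothetical cumulative confirmed counts for six days.
cumulative = [100, 110, 130, 170, 190, 200]
factors = growth_factors(cumulative)
print(factors)
```

Values above one indicate that daily new cases are accelerating, while values below one indicate that they are slowing down.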

Doubling rate
The doubling of cases is also a significant factor of COVID-19. Knowing the doubling rate can make the government and individuals aware of the severity of the disease and hence strengthen preventive measures. Table 1 shows the doubling rate of coronavirus infected cases.
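One way to estimate a doubling time from two cumulative counts is sketched below. It assumes exponential growth between the two observations, which is an assumption of this sketch and not a method stated in the paper; the function name `doubling_time` and the numbers are hypothetical.

```python
import math


def doubling_time(count_start, count_end, days_elapsed):
    """Days needed for cases to double, assuming exponential growth in between."""
    if count_start <= 0 or count_end <= count_start:
        raise ValueError("counts must be positive and increasing")
    return days_elapsed * math.log(2) / math.log(count_end / count_start)


# Hypothetical example: cases grow from 100 to 400 in 10 days,
# i.e. they double twice, so the doubling time is about 5 days.
t_double = doubling_time(100, 400, 10)
print(t_double)
```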
Figs. 7a and 7b compare the confirmed, recovered and death cases of India, China, UK, Italy, US and the rest of the world. We can infer from the graphs that the confirmed, recovered and death cases of the rest of the world are higher than those of the other five countries. The brown line shows the confirmed, death and recovered cases of the rest of the world. It is observed that confirmed, death and recovered cases have increased exponentially with time. Fig. 8 presents the mortality rate and recovery rate comparison of the five countries with the rest of the world.
The mortality rate of the UK increased over the period, whereas India has the lowest mortality rate. The recovery rate of Mainland China is the highest and that of the UK is the lowest.
(a) The blue line represents the mortality and recovery rate of India. The mortality rate increased from mid-April 2020 until mid-September 2020 and then started decreasing. The recovery rate decreased from mid-March to mid-May and then started increasing gradually.
(b) The orange line represents the mortality and recovery rate of Mainland China. The mortality rate increased steeply after May 2020. The recovery rate follows the shape of a normal yield curve.
(c) The green line represents the mortality and recovery rate of Italy. The mortality rate increased after April 2020 and then decreased in mid-November 2020, and the recovery rate increased after November 2020.
(d) The red line represents the mortality and recovery rate of the US. The mortality rate increased gradually from March until mid-April 2020.
(e) The purple line represents the mortality and recovery rate of the UK. The mortality rate decreased after November 2020 and the recovery rate has been constant since mid-December 2020.
(f) The brown line represents the mortality and recovery rate of the rest of the world. The mortality rate decreased after September 2020 and the recovery rate increased after September 2020.

Regression models
In this work supervised machine learning algorithms are used for the prediction of future COVID-19 cases in terms of confirmed, recovered and death cases. For this study, linear regression (LR), polynomial regression (PR), Ridge and Lasso regression (RLR), Ridge regression (RR) and support vector machine (SVM) regression models are used. A brief description of each model is given here.

Linear regression (LR)
Linear regression is a linear approach to modelling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). It was the first type of regression analysis to be studied rigorously and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters, and because the statistical properties of the resulting estimators are easier to determine. Linear regression is one of the data mining techniques used for prediction tasks [22][23][24][25]. In a problem with one predictor, this technique tries to find the best line to fit. The formula to calculate linear regression is given in Eq. (1).
$$\hat{y} = b_0 + b_1 x \tag{1}$$
where $b_0$ is a constant, $b_1$ is the regression coefficient, $x$ is the independent variable and $\hat{y}$ is the predicted value of the dependent variable.
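For a single predictor, the constant and the regression coefficient have a closed-form least-squares solution. The sketch below is a standard textbook computation and not code from the study; the helper name `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (b0, b1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b1 = s_xy / s_xx           # regression coefficient (slope)
    b0 = y_mean - b1 * x_mean  # constant (intercept)
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x.
days = [0, 1, 2, 3, 4]
cases = [1, 3, 5, 7, 9]
b0, b1 = fit_line(days, cases)
prediction_day_5 = b0 + b1 * 5
print(b0, b1, prediction_day_5)
```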

Polynomial regression
In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth degree polynomial in x. It fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted $E(y \mid x)$. Even though polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function $E(y \mid x)$ is linear in the unknown parameters that are estimated from the data [26][27][28]. Consequently, polynomial regression is regarded as a special case of multiple linear regression. It can be calculated according to Eq. (2).
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \cdots + \theta_n x_1^n \tag{2}$$
where $\hat{y}$ is the predicted value of the dependent variable, $x_1, x_1^2, x_1^3, \ldots, x_1^n$ are the powers of the independent variable used as regressors, and $\theta_0, \theta_1, \theta_2, \ldots, \theta_n$ are the coefficients.
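Because the model is linear in its coefficients, a polynomial can be fitted by solving the least-squares normal equations on the powers of x. The following is a minimal pure-Python sketch under that assumption; the helper names and the data are hypothetical, and a numerical library would normally be preferred for larger problems.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree], lowest power first."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]  # [1, x, x^2, ...]
    size = degree + 1
    # Normal equations: (X^T X) c = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 2 + 3x + 0.5x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 + 3.0 * x + 0.5 * x ** 2 for x in xs]
coefficients = fit_polynomial(xs, ys, degree=2)
print(coefficients)
```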

Ridge and Lasso regression
For many machine learning problems with a large number of features or a low number of observations, a linear model tends to overfit and variable selection is difficult. Models that use shrinkage, such as Lasso and Ridge, can improve prediction accuracy because they reduce the estimation variance while providing an interpretable final model. Ridge and Lasso build on the linear model; their distinguishing feature is regularization. The aim of these techniques is to modify the loss function so that it depends not only on the sum of the squared differences but also on the regression coefficients [29][30][31].
The main issue in developing such a system is the correct selection of the regularization parameter. Compared with linear regression, Ridge and Lasso models are more robust to outliers and to the spread of the data. In general, their main purpose is to prevent overfitting. The main difference between Ridge regression and Lasso is how they assign a penalty term to the coefficients.

Ridge regression
Ridge regression is a method specific to multiple regression data that suffer from multicollinearity. It is one of the most fundamental regularization techniques, yet it is not used by many people because of the complex mathematics behind it [31][32][33]. With a general idea of multiple regression, it is not hard to explore the mathematics behind ridge regression in R. The regression is the same; what makes regularization different is the way the model coefficients are determined. The objective for this technique is as follows.
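The coefficients are chosen to minimize the sum of squared residuals plus a penalty on their size:
$$\min_{b}\; \sum_{i}\left(y_i - \hat{y}_i\right)^2 + k \sum_{j \ge 1} b_j^2 \tag{3}$$
where $k \sum_{j \ge 1} b_j^2$ is the penalty term and $k$ is the regularization parameter. The ridge penalty acts on the squared slope; a penalty of the form $k\,|\text{slope}|$ corresponds to Lasso.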
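For one predictor the ridge solution can be written in closed form, which makes the shrinkage effect easy to see. The sketch below assumes the intercept is left unpenalized, a common convention but an assumption here; the function name `fit_ridge_line` and the data are hypothetical.

```python
def fit_ridge_line(xs, ys, k):
    """One-predictor ridge fit minimizing sum of squared residuals + k * slope**2.

    The intercept is not penalized (assumption of this sketch).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / (s_xx + k)  # k = 0 recovers ordinary least squares
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
ols_intercept, ols_slope = fit_ridge_line(xs, ys, k=0.0)
ridge_intercept, ridge_slope = fit_ridge_line(xs, ys, k=10.0)
print(ols_slope, ridge_slope)
```

Increasing k pulls the slope towards zero, which is the mechanism by which ridge regression reduces variance at the cost of some bias.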

Support vector regression
Support vector machines (SVMs) are powerful yet flexible supervised machine learning algorithms used for both classification and regression. SVMs have their own distinctive way of implementation compared with other machine learning algorithms. They have recently become very popular because of their ability to handle multiple continuous and categorical variables. An SVM model is basically a representation of different classes separated by a hyperplane in multidimensional space [34,35]. The hyperplane is generated iteratively by the SVM so that the error is minimized. The goal of SVM is to divide the datasets into classes by finding a maximum marginal hyperplane (MMH).
RBF kernel: The RBF kernel is also called the Gaussian kernel. Its feature space has an infinite number of dimensions, because the kernel can be expanded by a Taylor series.
In the expression below, the gamma parameter defines how much influence a single training example has. The larger it is, the closer other examples must be to be affected [36]. It is a general-purpose kernel, used when there is no prior knowledge about the data. Mathematically, it is represented as given in Eq. (4).
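$$K(a, a') = \exp\left(-\frac{\lVert a - a' \rVert^2}{2\sigma^2}\right) \tag{4}$$
where $\lVert a - a' \rVert^2$ denotes the squared Euclidean distance between two feature vectors and $\sigma$ is a free parameter. An equivalent parameterization uses $\gamma = 1/(2\sigma^2)$.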
Polynomial kernel: The polynomial kernel looks not only at the given features of the input samples to determine their similarity [37], but also at combinations of these. With n original features and a polynomial of degree k, the polynomial kernel yields $n^k$ expanded features. It is popular in image processing. Mathematically, it is represented as given in Eq. (5).
where k is the degree of the polynomial.
Sigmoid kernel: The hyperbolic tangent kernel is also known as the sigmoid kernel and as the multilayer perceptron (MLP) kernel [41]. The sigmoid kernel comes from the neural networks field, where the bipolar sigmoid function is often used as an activation function for artificial neurons. Mathematically, it is represented as given in Eq. (6).
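The three kernels can be sketched in their commonly used textbook forms. These forms are assumptions and may differ in detail from Eqs. (4) to (6) of the paper; in particular the offset `coef0` and the scale `alpha` are illustrative parameters that are not defined in the text.

```python
import math


def dot(u, v):
    """Inner product of two feature vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))


def rbf_kernel(u, v, sigma=1.0):
    """Gaussian kernel: exp(-||u - v||^2 / (2 sigma^2))."""
    squared_distance = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """Polynomial kernel: (u . v + coef0) ** degree (coef0 is an assumed offset)."""
    return (dot(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, alpha=0.5, coef0=0.0):
    """Hyperbolic tangent kernel: tanh(alpha * u . v + coef0)."""
    return math.tanh(alpha * dot(u, v) + coef0)


print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0]))
print(sigmoid_kernel([0.0, 0.0], [1.0, 1.0]))
```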
Support Vector Regression (SVR) is similar to linear regression in that the equation of the line is y = wx + b; in SVR, this straight line is referred to as a hyperplane. The data points on either side of the hyperplane that are closest to it are called support vectors, and they are used to plot the boundary line.
Unlike other regression models, which try to minimize the error between the actual and predicted values, SVR tries to fit the best line within a threshold value a, the distance between the hyperplane and the boundary line. Consequently, the SVR model tries to satisfy the condition $-a < y - (wx + b) < a$, and it uses the points within this boundary to predict the value.
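The boundary condition can be illustrated with the epsilon-insensitive loss that is standard for SVR: points whose residual lies within the threshold a incur no loss, and points outside it are penalized only by the excess. The values of w, b and a and the sample points below are hypothetical.

```python
def residual(x, y, w, b):
    """Signed distance of the observation from the line y = w*x + b."""
    return y - (w * x + b)


def inside_boundary(x, y, w, b, a):
    """True when -a < y - (w*x + b) < a."""
    return abs(residual(x, y, w, b)) < a


def insensitive_loss(x, y, w, b, a):
    """Zero inside the boundary, otherwise the excess beyond the threshold a."""
    return max(0.0, abs(residual(x, y, w, b)) - a)


# Hypothetical line y = 2x + 1 with threshold a = 0.5.
w, b, a = 2.0, 1.0, 0.5
print(inside_boundary(1.0, 3.3, w, b, a), insensitive_loss(1.0, 3.3, w, b, a))
print(inside_boundary(2.0, 6.0, w, b, a), insensitive_loss(2.0, 6.0, w, b, a))
```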

Staging analysis parameters
RMSE: It is the standard deviation of the residuals. Residuals are a measure of how far data points are from the regression line, and RMSE is a measure of how spread out these residuals are. In other words, it tells you how concentrated the data are around the line of best fit [39,40]. Root mean square error is commonly used in climatology, forecasting and regression analysis to verify experimental results. The formula to calculate RMSE is given in Eq. (7).

\[ \mathrm{RMSE} = \sqrt{\frac{1}{k} \sum_{m=1}^{k} \left( a_m - \hat{a}_m \right)^2} \tag{7} \]
where k is the number of observations, $a_m$ is the observed value, and $\hat{a}_m$ is the predicted value.
R2_SCORE: It is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. For a least-squares fit evaluated on its own training data it lies between 0 and 100%: 0% indicates that the model explains none of the variability of the response data around its mean, and 100% indicates that the model explains all of that variability. On held-out test data the score can be negative, which means that the model predicts worse than the mean of the actual values. The mathematical expression of R2_SCORE is given in Eq. (8).
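\[ R^2 = 1 - \frac{\sum_{i} \left( t_i - \hat{t}_i \right)^2}{\sum_{i} \left( t_i - \bar{t} \right)^2} \tag{8} \]
where $t_i$ is the actual cumulative confirmed cases, $\hat{t}_i$ is the predicted cumulative confirmed cases, and $\bar{t}$ is the average of the actual cumulative confirmed cases.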
Mean Absolute Error: It measures the average magnitude of the errors in a set of forecasts, without considering their direction [38]. It is the average over the test sample of the absolute differences between prediction and actual observation, where every individual difference has equal weight. The obtained value of MAE is calculated according to Eq. (9).
\[ \mathrm{MAE} = \frac{1}{p} \sum_{q=1}^{p} \left| r_q - r \right| \tag{9} \]
where p represents the number of errors and $|r_q - r|$ denotes the absolute errors.
Mean Squared Error: The MSE of an estimator is the average squared difference between the estimated values and the actual values. The MSE is a measure of the quality of an estimator. It is always nonnegative, and values closer to zero are better. MSE is calculated according to Eq. (10).
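\[ \mathrm{MSE} = \frac{1}{z} \sum_{s=1}^{z} \left( y_s - \hat{y}_s \right)^2 \tag{10} \]
where z is the number of data points, $y_s$ represents the observed values, and $\hat{y}_s$ represents the predicted values.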
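The four evaluation parameters of Eqs. (7) to (10) can be sketched in plain Python as follows. The function names and the small numeric series are hypothetical and serve only to illustrate the formulas; they are not the paper's data. The second prediction series is deliberately poor, to show that R2_SCORE becomes negative when a model predicts worse than the mean of the actual values.

```python
import math


def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and two hypothetical prediction series.
actual = [3.0, 5.0, 7.0, 9.0]
close_fit = [2.0, 5.0, 8.0, 11.0]
poor_fit = [20.0, 20.0, 20.0, 20.0]

scores_close = {
    "mse": mse(actual, close_fit),
    "mae": mae(actual, close_fit),
    "rmse": rmse(actual, close_fit),
    "r2": r2_score(actual, close_fit),
}
r2_poor = r2_score(actual, poor_fit)  # negative: worse than predicting the mean
```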

Dataset preparation and Experiment
To make our data compatible with the sklearn input format, we created a new column called "Days since", which tracks the number of days since the initial date. We used four evaluation parameters, namely Mean Square Error (MSE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and r2_score, to identify the regression model that best predicts future confirmed cases, recovered cases and death cases. A brief description of the experiments carried out for this work is given in Table 2.
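As a minimal sketch of how such a day-count feature could be derived, the snippet below converts calendar dates into the number of days elapsed since the first date. The function name `days_since` and the three sample dates are assumptions for illustration; the paper does not give its preprocessing code.

```python
from datetime import date


def days_since(dates, start=None):
    """Number of days between each date and the start (earliest date by default)."""
    start = start if start is not None else min(dates)
    return [(d - start).days for d in dates]


# Hypothetical report dates (illustrative only).
report_dates = [date(2020, 2, 22), date(2020, 2, 23), date(2020, 9, 2)]

days = days_since(report_dates)

# sklearn estimators expect a 2-D feature array: one row per sample.
features = [[d] for d in days]
```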
This study attempts to develop a framework for forecasting the number of cases affected by COVID-19 using machine learning methods. The dataset used for the study contains the daily reports of the number of newly infected cases, the number of recoveries, and the number of deaths due to COVID-19 worldwide. We used five machine learning models, namely LR, polynomial, Ridge, polynomial ridge and SVM, to predict the number of newly infected cases, the number of deaths, and the number of recoveries. We also attempt to identify the best model for forecasting infected cases, recovered cases and death cases.
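The comparison of the five models can be sketched with scikit-learn as shown below. This is an illustration under stated assumptions, not the paper's implementation: the data are a synthetic quadratic series rather than COVID-19 counts, and the polynomial degree, the `alpha`, `C` and `coef0` values, the scaling step and the 40/10 chronological split are all hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR

# Synthetic cumulative counts (assumption): exact quadratic growth over 50 days.
days = np.arange(50, dtype=float).reshape(-1, 1)
cases = 3.0 * days.ravel() ** 2 + 2.0 * days.ravel() + 5.0

# Chronological split: fit on the first 40 days, evaluate on the last 10.
X_train, X_test = days[:40], days[40:]
y_train, y_test = cases[:40], cases[40:]

# Hyperparameters below are illustrative, not the paper's settings.
models = {
    "linear": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "ridge": Ridge(alpha=1.0),
    "polynomial_ridge": make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0)),
    "svr": make_pipeline(
        StandardScaler(), SVR(kernel="poly", degree=2, coef0=1.0, C=100.0)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "mse": mse,
        "mae": mean_absolute_error(y_test, pred),
        "rmse": float(np.sqrt(mse)),
        "r2": r2_score(y_test, pred),
    }
```

On this synthetic series the degree-2 polynomial model recovers the trend, whereas the straight-line model extrapolates poorly and obtains a negative R2 on the held-out days. This shows how negative scores can arise when forecasting beyond the training range.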

Results analysis
Experiment 1: The study performs forecasts on infected cases. According to the results, polynomial ridge performs best among all the models; SVM also performs well, and polynomial regression achieves an average score. In comparison, LR and Ridge perform worst in this setting and achieve nearly the same R2_score. Comparing the current confirmed-case statistics with our models' forecasts, the polynomial ridge prediction follows trends that are close to the actual situation.
Results of the extensive experiments have been reported in Table 3 for active cases. Fig. 9 shows the performance of all regression models on the prediction of confirmed cases on the basis of the testing set of data. From the figure we can visualize that the infected cases tend to increase day by day which is an alarming sign for all the infected countries.

Experiment 2:
The study performs forecasts on recovered cases. According to the results, polynomial ridge performs best among all the models; polynomial regression also performs well, and SVM achieves an average score. In comparison, LR and Ridge perform worst in this setting and achieve nearly the same R2_score. Table 4 shows the performance of all models; polynomial ridge gives the best-fit values of MSE, MAE, RMSE and R2_score. Fig. 11 shows the performance of LR, polynomial, Ridge, polynomial ridge and SVM in graphical form. Comparing the current recovery statistics with our models' predictions, the polynomial ridge forecast follows trends that are close to the actual situation. From Fig. 12 we observe that polynomial and polynomial ridge follow a pattern that is very similar to the actual recovered cases and that SVM gives average performance, whereas the pattern of linear and ridge regression differs markedly from the actual recovered cases.

Experiment 3:
In future estimation of the death rate, polynomial ridge performs better than the other models. All other models perform poorly; the order of performance from best to worst is polynomial ridge, followed by LR, Ridge, polynomial and SVM, because of the nature of the available time-series data. We also observed that no model gives good results for death rate prediction. Table 5 shows the performance of all models; linear and ridge give the best-fit values of MSE, MAE, RMSE and R2_score. Fig. 13 shows the performance of the different regression models on the prediction of death cases on the testing set. From this we conclude that no single regression model gives the best result.
The results of the linear and ridge regression models are very similar to each other, and they yield very similar r2_score values in all three cases.

Comparative analysis with state of art
Table 6 presents the comparative analysis of the work carried out for this study with previously published work. Finally, a comparative analysis of the present work with previously published work [9][10][11] is performed, and the major findings of the study are as follows:
a. In the study, visualization of active cases and closed cases (recovered and death) all over the world has been reported.
b. The study also performs the visualization of weekly and daily progress of different types of cases in the world.
c. The work also highlights the country-wise confirmed cases, death cases, mortality rate and recovery rate.
d. Comparison of India, China, US, UK and Italy with the rest of the world has been done.
e. The study has also visualized the growth factor of different cases worldwide.
f. Analysis of India and comparison with other countries has been done to find out how much time other countries took to reach a certain number of confirmed cases in comparison to India.
g. Prediction of the time taken for doubling the number of confirmed cases has also been done.
h. Future prediction using five regression models and analysis on the basis of Mean Square Error and Mean Absolute Error has been done.
i. From the comparison it can be inferred that no previously published work covers as many aspects as the present study.

Conclusion
This study presented current trends of COVID-19 incidence from 22 February 2020 to 2 September 2020 as visualized in our work. The rapidly increasing number of new COVID-19 cases each day worldwide has placed an overwhelming burden on medical resources in countries with large outbreaks. Therefore, prediction of future confirmed cases became necessary. The amount of data available is huge, and gathering knowledge and extracting a meaningful pattern from the accumulated data is a difficult task. The available information on confirmed, recovered and death cases across India over this period helps in anticipating and planning for the near future. Our results imply that there is a certain universality in the time evolution of COVID-19.
This suggests that a country that becomes the site of an epidemic surge can be regarded, at least to a first approximation, as a well-mixed chemical reactor, where different populations interact according to mass-action-like rules with little dependence on geographical variations.
Based on detailed analysis of the overall data, we estimate several key parameters for COVID-19, such as the latent time, the quarantine time and the effective reproduction number, in a relatively reliable way, and predict the inflection point, the possible ending time and the final total number of infected cases worldwide. We performed prediction using five regression models so that we could conclude which model is best and, on the basis of that model, estimate the rate of increase in the number of infected, recovered and death cases in the future. For confirmed cases, the best values were an MSE of 737383324668728.4, an MAE of 24617677.8, an RMSE of 27154802.9 and an R2_score of -125.5, obtained using the polynomial ridge model. For recovered cases, the best values were an MSE of 1524067870205389.0, an MAE of 38118611.9, an RMSE of 39039311.8 and an R2_score of -197.1, obtained using the linear regression model. For death cases, the best values were an MSE of 1359199418014.9, an MAE of 1106760.1, an RMSE of 1165847.0 and an R2_score of -2.5, obtained using the SVM model. From our investigation we concluded that polynomial ridge is the best model for predicting confirmed cases, the linear regression model for recovered cases and the SVM regression model for death cases.
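Because RMSE is defined as the square root of MSE, the reported pairs of values can be checked for internal consistency. The short sketch below performs that arithmetic using only the figures quoted in the paragraph above; the dictionary layout and variable names are illustrative.

```python
import math

# MSE and RMSE values as reported in the conclusion above.
reported = {
    "confirmed (polynomial ridge)": {"mse": 737383324668728.4, "rmse": 27154802.9},
    "recovered (linear regression)": {"mse": 1524067870205389.0, "rmse": 39039311.8},
    "deaths (SVM)": {"mse": 1359199418014.9, "rmse": 1165847.0},
}

# Absolute gap between sqrt(MSE) and the reported RMSE for each case type.
gaps = {
    name: abs(math.sqrt(values["mse"]) - values["rmse"])
    for name, values in reported.items()
}
```

Each gap is well below one case, so the reported RMSE values agree with the square roots of the reported MSE values up to rounding.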

Funding
There is no funding available for this work.