Modeling and forecasting the COVID-19 pandemic with heterogeneous autoregression approaches: South Korea

This paper deals with time series analysis for COVID-19 in South Korea. We adopt heterogeneous autoregressive (HAR) time series models and discuss the statistical inference for various COVID-19 data. Seven data sets such as cumulative confirmed (CC) case, cumulative recovered (CR) case and cumulative death (CD) case as well as recovery rate, fatality rate and infection rates for 14 and 21 days are handled for the statistical analysis. In the HAR models, model selections of orders are conducted by evaluating root mean square error (RMSE) and mean absolute error (MAE) as well as R2, AIC, and BIC. As a result of estimation, we provide coefficients estimates, standard errors and 95% confidence intervals in the HAR models. Our results report that fitted values via the HAR models are not only well-matched with the real cumulative cases but also differenced values from the fitted HAR models are well-matched with real daily cases. Additionally, because the CC and the CD cases are strongly correlated, we use a bivariate HAR model for the two data sets. Out-of-sample forecastings are carried out with the COVID-19 data sets to obtain multi-step ahead predicted values and 95% prediction intervals. As for the forecasting performances, four accuracy measures such as RMSE, MAE, mean absolute percentage error (MAPE) and root relative square error (RRSE) are evaluated. Contributions of this work are three folds: First, it is shown that the HAR models fit well to cumulative numbers of the COVID-19 data along with good criterion results. Second, a variety of analysis are studied for the COVID-19 series: confirmed, recovered, death cases, as well as the related rates. Third, forecast accuracy measures are evaluated as small values of errors, and thus it is concluded that the HAR model provides a good prediction model for the COVID-19.


Introduction
The coronavirus 2019 (COVID-19) emerged in December 2019 has been seriously threatening the life of humans. The World Health Organization (WHO) declared the outbreak of COVID-19 as a pandemic in March 2020. Not only have governments around world tried to control the spread of the virus but also medical institutions are committed to finding vaccines and treatments. Many researchers in all fields are trying to support the health system to prevent the disaster of infection as well. Nevertheless, as of May 1, 2021, the COVID-19 pandemic has infected more than 151 million of the humans all over the world and caused 3 million deaths.
As one of the academia attempts to overcome the crisis, mathematical modeling and forecasting on the COVID-19 cases have been extensively carried out by statisticians and health scientists. The earliest work on the time series model for the COVID-19 analysis has been done in February 2020 by [1], who performed autoregressive integrated moving average (ARIMA) model prediction for the prevalence and incidence of COVID-19. The ARIMA model is well-known as a very E. Hwang and S. Yu infections in European and African countries, including Box-Jenkins ARIMA and fractal interpolations, fractal dimension.
As a more general or more refined probability model, [10] adopted an AR model with two-piece scale-mixed normal distributed errors (TP-SMN-AR) while [11] a TP-SMN-ARMA process in order to possess asymmetry of the distribution of the COVID-19 data. [12] introduced the autoregressive distributed lag (ADL) model to describe the development of the disease at the exponential phase. The ADL model allows describing non-monotonic changes in relative infection over the time, and predicting the outcomes of their decisions on public health. [13] proposed a mathematical model to characterize aspects of the COVID-19 pandemic in South Korea, Italy and Brazil, and compared main features of the three countries as examples of very different scenarios of the COVID-19 pandemic. [14] examined the pandemic for India via sensitivity analysis whereas [15] for China via an optimization method. [16] dealt with an evolution of the COVID-19 infection data with power-law growth and saturation by noticing that the total numbers of infected cases exhibits exponential growth and then power-law growth before the flattening of the curve. [17] proposed a new gray prediction model based on traditional Richards model and modified gray action quantity for the COVID-19 in China, Italy, Britain and Russia. [18] used an efficient prediction model based on Verhulst equation to analyze the COVID-19 in China, Italy and Spain.
Some efforts in the area of artificial intelligence have been conducted to analyze the COVID-19 epidemic. For instance, [19] predicted the number of COVID-19 daily cases in Turkey through a fuzzy rule basing system; [20] exploited the Gaussian process regression as a recently developed machine learning technique to predict the COVID-19 death cases and compared with artificial neural network supervised based methods in terms of their prediction abilities. [21] implemented several forecasting techniques such as naive method, moving average, exponential smoothing, Holt-Winters method, ARIMA, etc. for comparison of prediction of COVID-19 worldwide cases. [22] proposed a new genetic programming based model for confirmed cases and death cases for India. [23] proposed a novel technique based on meta-heuristic GWO (gray wolf optimizer) algorithm to optimize hyper parameters for LSTM (long short term memory) network and compared with the baseline models including ARIMA model. [24] proposed a time series prediction model using machine learning to obtain the curve of disease and forecast the epidemic tendency. Linear regression, multilayer perception, random forest and support vector machines (SVM) machine learning methods were used. [25] examined the advantages of Singular Spectrum Analysis (SSA) for forecasting the number of daily confirmed cases, deaths, and recoveries caused by COVID-19. The results of V-SSA and R-SSA were compared to those from ARIMA, ARFIMA, Exponential Smoothing, TBATS, and NNAR. [26] studied parameter estimation and prediction of the COVID-19 epidemic using SIR/SQAIR models. [27] also presented a COVID-19 prediction study of artificial neural networks for the daily numbers of cases and deaths per million people, which are different from other works mentioned above. [28][29][30][31][32] investigated forecasting for the COVID-19 as well.
As for the infection fatality rates of the COVID-19, [33][34][35][36][37][38][39] discussed with the empirical results; [33] calculated an estimate of the infection fatality rate by dividing the number of deaths by the total infected cases from seroprevalence data. [34] presented empirical results on infection and fatality rates from cross-country regression and found a significant positive impact of local air pollution on the rates. [39] forecasted COVID-19 growth rates with statistical, epidemiological, machine-learning and deep-learning models, and a new hybrid forecasting method based on nearest neighbors and clustering. Meanwhile, [40,41] analyzed the COVID-19 data curve and [42][43][44] dealt with the finance and economics related to the COVID-19.
Fortunately, vaccines of the COVID-19 have been recently developed and distributed worldwide. Even though the vaccines of the COVID-19 are now available almost all over the world, numbers of the confirmed cases are not decreasing and the virus still threatens the life of the humans and the health system on earth. At the beginning, South Korea was reported to be able to control the disease very well, as [13] mentioned. However, as the winter came with low temperature South Korea has been facing the crisis of the COVID-19 pandemic like other countries, and even as spring comes, the number of the COVID-19 confirmed cases still remains unchanged. The ceaseless spread of the COVID-19 leads us to study the accurate prediction of the time series on the COVID-19 cases data in order to give useful information and help to determine the government policy. Accuracy of the forecasting can provide optimized decisions on the policies such as social distancing stages and preparation of health systems, ready to accommodate the expecting numbers of patients.
In this work we explore Korean COVID-19 pandemic time series such as confirmed cases, released cases, death cases, as well as their related rates, and examine the estimation and forecast by means of a popular time series model. The model we adopt is a heterogeneous autoregressive (HAR) model, which is known to be powerful for longmemory financial volatility. The COVID-19 time series data reveal the long-memory feature as seen in Fig. 1. For this reason we employ the HAR models and investigate statistical analysis on the COVID-19. Some criteria such as 2 , AIC and BIC are computed in the HAR models of order = 2, 3, 4. Another merit of the HAR model on the COVID-19 is that its coefficient of determination 2 is near one for the cumulative series data. 2 is a measure of the global fit of the model and 2 = 1 implies that the relative square error is zero. In conclusion, for these two reasons: long-memory of the COVID-19 data and the 2 -value of the HAR model with the COVID-19 data, by means of the HAR models, instead of other models, we study estimation as well as forecasting on the COVID-19 cases.
We handle seven data sets such as cumulative confirmed (CC) case, cumulative recovered (CR) case, cumulative death (CD) case as well as recovery rate (RR), fatality rate (FR), infection rates (IRs) for 14 and 21 days. The rates are considered as time-varying series and defined as follows: The RR on a day is computed as the ratio of cumulative recovered case to cumulative confirmed case, and the FR is the ratio of cumulative death case to cumulative confirmed case. The IR on a day during the period of days is computed as the proportion of confirmed case on the day to the partial sum of confirmed cases for the last consecutive days, and we set = 14 and 21 because the incubation period of the COVID-19 is between 14 and 21 according to [45]. Our results report that the 2 -values of HAR models fitted with the CC, CR, CD and RR are equal to one, and the value of the FR is 0.999, whereas the 2 -values of IRs are not one, but greater than 0.8, and yet their HAR models have good performance.
Model selections of orders in the HAR models are conducted by evaluating root mean square error (RMSE) and mean absolute error (MAE) as well as 2 , AIC, and BIC. As for the estimation method we use the ordinary least square estimates (OLSE) in the HAR models. Asymptotic normality of the OLSE in the HAR model was established together with a Monte-Carlo simulation study by [46]. Thus, we defer to [46] for the simulation study and omit it in this work. Estimates of the parameters in the HAR models are computed along with standard errors and 95% confidence intervals, based on the theory of [46]. Our results address that fitted values via the HAR models with the OLSEs are not only well-matched to the real cumulative cases data, but also differenced values from the fitted HAR models are well-matched to real daily cases data. Indeed, the proposed model makes small values of RMSE and MAE on the COVID-19 data, and therefore it is an efficient and reliable tool to predict the COVID-19.
Additionally, because the CC and the CD cases are strongly correlated, we consider a bivariate HAR time series model to see the co-movements of a pair of the confirmed and death cases. By comparing the bivariate HAR model with the univariate HAR models, applied to the CC and the CD, we determine better prediction models for the two cases. It is demonstrated that the bivariate HAR model has smaller errors than univariate HAR models.
Out-of-sample forecastings are conducted with the seven COVID-19 data sets to compute multi-step ahead predicted values and 95% prediction intervals. Four forecasting performance measures are evaluated: RMSE, MAE, mean absolute percentage error (MAPE) and root relative square error (RRSE). Most prediction intervals include the real values except for the FR that becomes suddenly flattened after the time epoch of the out-of-sample. However, predicted values and prediction interval of the FR have reasonable pattern, though its actual values in the outof-sample period deviate from the interval. The CC and the CD cases, applied by two models: the univariate and the bivariate, have different prediction intervals in the two models. It is because the confirmed cases have regressors of the death cases in the bivariate HAR model, and after the time epoch of the out-of-sample, the number of death cases increases very slowly, while in the univariate HAR models the two cases are modeled independently. It is also addressed that in the bivariate prediction the errors of the forecast accuracy for the CC are similar to or smaller, and all the errors for the CD are smaller than those in the univariate prediction. Consequently, as we expected, the bivariate HAR model yields better fitting-performance than the univariate models.
The main findings of this work are three folds: First, HAR models are well-fitted to cumulative numbers of COVID-19 data with a good criterion result like coefficient of determination 2 being near one, which means that the relative square error is near zero. Second, this study analyzes a variety of the COVID-19 time series data such as confirmed cases as well as recovery cases, death cases, and the related rates: recovery rate, fatality rate, infection rates for 14, 21 days, for which all the HAR model can be adopted with good performance. Third, forecast accuracy measures of RMSE, MAE, MAPE and RRSE are evaluated as small errors to conclude that the HAR model provides a good prediction model for the COVID-19 pandemic.
The rest of the paper is organized as follows: In Section ''Heterogeneous autoregressive (HAR) models'' the HAR model and its background are briefly described. In Section ''Estimation'' data description and estimation are explored while in Section ''Forecasting'' forecasting is investigated. Conclusion and discussion are given in Section ''Conclusion and discussion''.

Heterogeneous autoregressive (HAR) models
This section introduces briefly the heterogeneous autoregressive (HAR) model, which we adopt for analysis of Korean COVID-19 series. To do this, some background is stated here.
One of the theories that explain the financial market is a heterogeneous market hypothesis. The theory assumes that investors who accept investment information have different reactions. The heterogeneous market hypothesis in stock markets is the hypothesis proposed by [47]. Dealers and traders observe the situation in stock markets that are traded in a short period of time for trading. Investors holding stocks in the form of assets over the long term are sensitive to government policy changes and are not very sensitive to the frequency of stock trading. Market participants note long-term events along with share price volatility. This is called a heterogeneous market, which is composed by the entire economy on the premise that there is a temporal heterogeneity in making investment decisions. [47] argued that stock market volatility should be analyzed using heterogeneous market hypotheses to identify dynamic changes in the market. Studies of heterogeneous market hypotheses include the HARCH (heterogeneous autoregressive conditional heteroscedastic) model of [47] and the HAR-RV (heterogeneous autoregressive-realized volatility) model of [48]. Inspired by the heterogeneous market hypothesis and the HARCH model, [48] proposed the HAR-RV model, which is a linear autoregressive model with daily, weekly, and monthly moving averages, theoretically equivalent to an autoregressive model of order 22. The HAR-RV model uses not only the heterogeneous market hypothesis of [47] but also the realized volatility of [49]. It represents heterogeneity over daily, weekly and monthly periods. A main feature of the HAR model is long-memory. In particular, it well represents the characteristics of variability, such as long-term memory, and is highly regarded for its good predictive performance on realized volatility.
The HAR-RV model of [48] is described as follows: −5∶ −1 and −22∶ −1 are weekly and monthly average realized volatility, and the parameters , and are coefficients for each day, week, and month, respectively.
In this work, we adopt the HAR model to investigate estimation and forecasting of Korean COVID-19 cumulative cases series and related data. We consider two types of HAR models: univariate HAR and bivariate HAR models. Cumulative cases data have long-memory features as seen in Fig. 1 and thus the HAR models are appropriately fitted to the cumulative cases. Its theoretical analysis as well as simulation study can be seen in recent works, for example, in [46]. We describe a univariate HAR model and a bivariate HAR model in the following: A (univariate) HAR model { , ∈ Z} of order is given by is a sequence of random variables with mean zero and variance 2 .
1 , … , are coefficients of the model to be estimated and it is assumed where ( ) −1 is given in the same way with ℎ , = 1, 2, … ; { 1, } and { 2, } are independent noise processes with mean zeros. Coefficients , The asymptotic property of the least squares estimators in the bivariate HAR model as well as the simulation study were established by [46] along with an application to financial data. In this work we adopt the HAR models in order to analysis Korean COVID-19 pandemic time series data such as the confirmed, recovered, death cases and their related rates.

Data
In this work, some important COVID-19 cases data are considered for analysis of estimation and forecasting. The COVID-19 data used in this paper are available in https://coronaboard.kr/ and http://www. ecdc.europa.eu/en/covid-19-pandemic. For all the computations, we use Python3.8 numpy, scipy and statsmodel.tsa. Each of the data is applied to the univariate HAR time series models, respectively, and furthermore a pair of the data is applied to the bivariate HAR model. Cumulative confirmed (CC), cumulative recovered (CR) and cumulative deaths (CD) cases time series data are basically used. As the related rates, recovery rate (RR) and fatality rate (FR) as well as infection rates (IR) for days, ( = 14, 21) are analyzed. The RR on day is given as the ratio of the CR to the CC on the day and the FR on day as the ratio of the CD to the CC.
The IR on day is defined as the proportion of (daily) confirmed (C) case to the sum of confirmed cases for days, where = 14, 21 are the incubation period chosen according to [45]: The data statistics of the seven data sets , , , , and ( ) , = 14, 21, are described in Table 1, and plots of the data and their autocorrelation functions (ACFs) are depicted in Fig. 1.
In order to compare estimations in the HAR models of orders = 2, 3, 4, we use some criteria such as 2 , AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion): where is the Log-likelihood function and is the number of parameters. Also we compute some performance errors: root mean squares error (RMSE) and mean absolute error (MAE): where is the difference between the observed and estimated values: One of the reasons why we adopt the HAR models is that the Korean COVID-19 data fit very well in the HAR models in the sense of the coefficient of determination 2 being near one, as seen in Tables 2-3. Comparisons in terms of 2 values are commonly used in regression models, for instance, [50] recently computed 2 values of univariate and multivariate HAR models for several financial data sets.
Tables 2-4 report the results of performance errors as well as the criterion rules for the seven data sets: CC, CR, CD, RR, FR and IR ( ) , = 14, 21. In the tables, values in the parentheses of RMSE and MAE are the RMSE and MAE divided by standard deviation, respectively, which are equivalent to RMSE and MAE of the HAR model fitted by the normalized data, that is, the data subtracted by mean and then divided by standard deviation. The reason why we compute the RMSE and MAE of the normalized data is because data considered have all different scales. To compare the different data adopted in the HAR models, we use the standardized data that have mean zero and variance one. In most of the cases of Tables 2-4, order = 4 has the best. In Table 2, which has the criterion results of CC, CR, CD with all 2 equal to one, all errors of RMSE and MAE in the parentheses are less than 0.01. Tables 5-7, which correspond to Tables 2-4, respectively, report coefficient estimates, standard errors as well as the confidence intervals of the parameters in the HAR models for Korean COVID-19 data. Estimation results of the parameters in the HAR(4) model for the CC are, letting = , written by  Table 2 and with estimated coefficients in Table 5 for CC, CR, CD. Moreover, their daily fitted data, which are the differences from the HAR models, are plotted together with real daily data and residuals on the right column of Fig. 2. Also, Fig. 3 shows the HAR fitted model for RR, FR, IR ( ) , = 14, 21 with order chosen in Tables 3-4 and with estimated  coefficients in Tables 6-7, respectively. As shown in Figs. 2-3, where real values, fits and residuals appear in blue, red and gray, respectively, HAR models have good performances with small residuals even for the cases of RR, FR and IR ( ) , = 14, 21.
Among others, from Table 7 we report the estimation results for the IR (14) in the HAR(4) model as follows: for = (14) ,

Estimation in bivariate HAR models
Now we consider a bivariate HAR model for bivariate data of CC and CD. The matrix form of the model (2) is written as The OLSE of  is given by = arg min  ∑ 2 =1 ∑ =1 2 , and it is obtained bŷ Its asymptotic normality theory was established by [46] together with a simulation study in the cases of i.i.d. and correlated errors. We refer to [46] for the simulation, and thus we omit the simulation experiment in this present work. Table 8 presents the results of the criteria for the bivariate HAR( , ) models with some pairs of orders for two data sets and , where we use the normalized data because the two data sets have different scales. It is also intended to compare performance of the univariate HAR models of and with that of the bivariate HAR model.
In the bivariate HAR model, results of coefficient estimates, standard errors and 95% confidence intervals are given in Table 9 for the model of order ( , ) = (4, 4), which has the smallest RMSE among the considered models. The bivariate HAR(4,4) model for = and = in Table 9 is given by  From the results in Table 9, where the 95% confidence intervals have the star mark if they are significant, we can see that the CC affects the CD whereas the CD does not affect the CC so much.
Notice that in the univariate HAR models of order = 4 the RMSEs of and are 0.00648 and 0.00978, respectively, while in the bivariate case, those of and are 0.00647 and 0.00910. We see that the MAEs also have smaller values in the bivariate models. It implies that the bivariate HAR model fits better with the two data sets.

Forecasting
Now in this section we investigate forecasts of the Korean COVID-19 data by means of the HAR models. For the forecasting, we use the data during the in-sample-period from Jan.20.2020 to Jan.20.2021, and compute the out-of-sample forecasts between Jan.21.2021 and Feb.18.2021.    We examine forecast performance by evaluating four measures: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), root relative square error (RRSE) for -step ahead forecast, = 1, 2, … . Suppose that { 1 , 2 , … , } are observed data during the in-sample period and { + ∶ = 1, 2, …} are data used to compute out-of-sample forecasts. Let̂+ | be the -step ahead forecast. The four forecast performance measures of the -step ahead forecasts are given as follows: indicates a general time epoch of outof-sample to obtain the -step ahead forecast so that − + , … , − 1 + are time indices used to compute the difference between the real value and the forecast, as well as −1+ is the last observation in the total sample for each , where = is the size for the forecasts, which depends on .

Mean Absolute Percentage Error (MAPE):
= 100 Root Relative Square Error (RRSE): We evaluate the performance results of the first, 7th, 14th, 21th step ahead forecasts (i.e., = 1, 7, 14, 21) for the Korean COVID-19 time series data in Tables 10-13. In Table 10, forecasting performances of CC, CR, CD adopted in univariate HAR models are described whereas bivariate HAR forecasting performances of (CC, CD) are given in Table 13. Tables 11-12 report the multi-step ahead forecasts of RR, FR and IR ( ) . In Tables 10-13, values in parentheses of RMSE and MAE indicate the errors of forecasts by means of the normalized data with mean zero and variance one, as in Tables 2-4.
Moreover, the forecasts plots can be seen in Figs Fig. 4 is independent of the CD, but the bivariate model of CC depends on the CD. As seen in the plots of real data of the CD in Figs. 4 and 6, the numbers of death cases are increasing very slowly during the period from Jan.21.2021 to Feb.18.2021, and this fact affects the slow increasing predicted values of the CC in the bivariate model in Fig. 6. Also, we note in Tables 10 and 13 that in the bivariate prediction model the errors for the CC are similar to or smaller, and all the errors for the CD are smaller than those in the univariate prediction model. For example, in the univariate CD has RMSEs (7.66, 37.92, 94.59, 172.85) whereas in the bivariate (7.087, 27.48, 62,07, 104.08) for = 1, 7, 14, 21. Consequently, we conclude that the bivariate HAR model yields better fitting performance than the univariate model as we expected.

Conclusion and discussion
This present work concerns with the COVID-19 time series analysis in South Korea. The COVID-19 pandemic lasts longer than one year, and humans are still threatened and are not free from the virus. We deal with a variety of Korean COVID-19 time series data sets such as confirmed case, recovery case, death case as well as recovery rate, fatality rate and infection rates. An interesting and robust time series model for long-memory structured data is a HAR model that has recently attracted much attention from econometrician and statistician in the field of time series analysis.
In this work, instead of the traditional ARIMA model, the HAR models are applied to study the estimation and forecasting for the Korean COVID-19 time series data. The reason for choosing the HAR models in the present paper is that the Korean COVID-19 data have long-memory features, and moreover some criteria such as 2 , AIC and BIC have good performances. Indeed, the 2 -values of the four data sets: cumulative confirmed, cumulative recovery, cumulative death cases and recovery rate, are one, and also the value of fatality rate is 0.999. Those of infection rates for 14 and 21 days are larger than 0.84.
For the model selection, we compute RMSE, MAE as well as 2 , AIC, BIC to decide the order of the HAR models, and evaluate coefficients estimates by means of the least square method for each model. Moreover, 95% confidence intervals along with standard errors are constructed. Adopting the HAR models, we see that the actual values of COVID-19 cases and the fits by the HAR models have very small errors in residuals, with well-matched fitting plots as seen in Figs. 2 and 3. Also, real daily data of confirmed, recovered and death cases are well-matched with the differences from our proposed HAR models. Furthermore, noticing that confirmed and death cases are strongly correlated with each other, a bivariate HAR model is applied to the two data sets. Better fitting is demonstrated with smaller RMSE and MAE in the bivariate HAR model than in the univariate HAR models.
To investigate the out-of-sample forecasting, HAR models with the orders, chosen optimally according to the criteria rules, are used to compute multi-step ahead forecasts. As forecasting performance measures, RMSE, MAE, MAPE and RRSE are evaluated at the 1, 7, 14 and 21-step ahead forecasts. Predicted values as well as 95% prediction intervals are illustrated from the out-of-sample forecasting. Most of the cases show that forecasts are close to real values and the 95% prediction intervals contain the real values except for the fatality rate. In the fatality rate case, predicted values and prediction interval have reasonably increasing pattern, although the actual values of the real data deviate from the prediction interval. The reason is that the fatality rates become suddenly flattened after the time epoch of the out-of-sample.
For the two data sets of cumulative confirmed and cumulative death cases, as mentioned above, the univariate and bivariate models are considered for forecasting. We see that somewhat different prediction intervals for the cumulative confirmed cases are found. It is because the univariate model for the confirmed cases is independent of death cases, whereas the bivariate model depends on the death cases as regressors. The numbers of death cases are increasing very slowly during the out-of-sample period, and for this reason, slowly increasing predicted values of the confirmed cases seem to appear in the bivariate model. Also, the forecast results report that in the bivariate prediction model the forecast accuracy errors for the cumulative confirmed case are similar to or smaller, and all the errors for the cumulative death case are smaller than those in the univariate prediction model. In conclusion, the bivariate HAR model improves the fitting-performance of the univariate models.
The novelty of this paper is as follows: This work is the first attempt of the HAR models for the COVID-19 analysis in literature, and moreover we validate that the HAR model provides a good prediction model for the COVID-19 time series data. Various data sets from the Korean COVID-19, that is, time series data such as confirmed, recovered, death cases as well as recovery, fatality, and infection rates, are applied to the HAR models. Parameter estimation and forecasting are illustrated with good performances of fitting and accuracy. In the pair of the confirmed and death cases, the bivariate HAR model has better fittingperformance than the univariate HAR models. Therefore, the proposed model can be an efficient tool for the COVID-19 analysis, which will help governments and health systems to manage social policies of the COVID-19 disease prevention during the pandemic.
We have explored the estimation and prediction using the HAR models for the COVID-19 time series data in South Korea. However, Korean COVID-19 cumulative confirmed or death cases are not much similar to those of worldwide or US COVID-19 data, which have some pattern of oscillation with 7 days periodicity. By imposing the periodicity to the HAR models, we can extend the discussion of this work to the COVID-19 in other countries. For example, recently [51] proposed an integer-valued AR model with oscillating weighted cosine geometric innovations for modeling the COVID-19 series in some small island developing states. The weighted cosine geometric process accounts for oscillating patterns and, according to [52], it outperforms wellknown competing discrete models. The HAR model with the weighted cosine geometric innovation terms will be an interesting model to fit the COVID-19 series worldwide or in other countries with oscillation feature. This extension will be studied in a future research. Another interesting time series model we suggest is a partially periodic HAR model, which has some truncation in the oscillation. In general, as seen E. Hwang and S. Yu   Fig. 4. Forecasting and 95% prediction intervals of Cumulative Confirmed, Cumulative Recovery, Cumulative Death cases using the univariate HAR models.   the numbers of the confirmed cases in US., Germany, Brazil, India, etc., the oscillation seems to appear as the magnitude of the COVID-19 series is large, and furthermore, the larger magnitude the stronger oscillation. In order to represent this phenomenon, partial oscillating which has some truncation as well as is proportional to the magnitude of oscillation might be desirable rather than pure oscillating. This topic on the partially periodic HAR model will be also very interesting and promising.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.