MODELING CASE FATALITY RATE OF COVID-19 IN INDONESIA USING TIME SERIES SEMIPARAMETRIC REGRESSION BASED ON LOCAL POLYNOMIAL ESTIMATOR

At the beginning of 2020, the world was shocked by COVID-19, which causes acute and contagious respiratory problems. Several countries in the world, including Indonesia, have declared the COVID-19 outbreak a pandemic. The number of patients who have been infected by COVID-19 and who have died from the virus continues to increase every day. In this study, we model the Case Fatality Rate (CFR) of COVID-19 in one city in Indonesia using a semiparametric regression model based on a time series local polynomial estimator, where the local polynomial estimator is used to accommodate fluctuations in the daily COVID-19 death rate data. The daily data on CFR of COVID-2


INTRODUCTION
In December 2019, Coronavirus disease 2019 (COVID-19) was first discovered and reported in Wuhan, China, and it was later reported in other countries. COVID-19 is a novel viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which led to a global pandemic [1]. COVID-19 spread rapidly to other countries in the following weeks and was eventually classified as a global pandemic on March 11, 2020 [2]. It has spread rapidly around the world, posing enormous health, economic, environmental, and social challenges to the entire human population [3].
In Indonesia, the first COVID-19 case was announced to the public on March 2, 2020. Based on data at that time, the number of confirmed cases of COVID-19 was 1528, with 136 deaths [4]. The Case Fatality Rate (CFR) in particular became a widely discussed topic when the pandemic occurred. Sipahutar and Eryando [5] explained that on social media, people have been debating the national CFR of COVID-19 and have even compared it with the CFR of other countries such as China and Vietnam. This comparison was followed by comparisons of the performance of the Indonesian government in handling the COVID-19 pandemic. Therefore, a proper analysis is needed to model the CFR of COVID-19 using statistical modeling techniques. One of the statistical techniques used to describe the functional relationship between response variables and predictor variables is the regression model.
There are three common types of regression models: the parametric, the nonparametric, and the semiparametric regression model. The parametric regression model has rigid assumptions, such as a known form of the regression function. The nonparametric regression model, in contrast, assumes that the regression function is unknown and contained in a Sobolev space [6,7]. The nonparametric regression approach is highly flexible because the regression function is not specified in a particular form but is only assumed to be smooth, so it can be estimated using smoothing methods based on the data pattern [8]. The semiparametric regression model combines the parametric and the nonparametric regression models [9][10][11]. One parametric approach used for prediction is linear regression analysis, as done by Suleiman et al. [12]. Another parametric approach is the negative binomial model, used by Pan et al. [13] to identify factors that may explain the variation in CFR across countries. Some previous studies have discussed nonparametric regression models for various cases and applications. For example, Nidhomuddin et al. [14] used the local linear estimator to model the case increase and case fatality rate of COVID-19 in Indonesia, and Siregar et al. [15] used the penalized spline estimator to model the world crude oil price under the effects of the COVID-19 pandemic. In nonparametric and semiparametric regression models, the smoothing techniques most often used are kernel [16,17], local linear [18][19][20][21], local polynomial [22][23][24][25][26][27], and spline [28][29][30][31][32][33][34][35][36][37]. According to Chamidah et al. [23], the local polynomial estimator is a popular approach with two special cases: if the degree of the local polynomial equals zero, it is called the kernel estimator, and if the degree equals one, it is called the local linear estimator. Furthermore, the local linear estimator is used when the data pattern is generally monotonic [38]. According to Loader [39], every differentiable function can be approximated locally by a straight line.
Time series data is a set of observations taken at different times and collected periodically at a fixed time interval [40]. Time series data covers one research object or individual, such as confirmed cases, fatality rate, or mortality rate, over several periods, such as daily, weekly, monthly, or yearly. Time series data modeling usually uses classical models such as Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), and Autoregressive Integrated Moving Average (ARIMA). These are linear time series models, which are very commonly used for public health and biomedical data [41]. Previous researchers have modeled COVID-19 data in this way; for example, Meimela et al. [42] and Somyanonthanakul [43] used ARIMA, ARIMAX, and association rule mining to forecast COVID-19 cases.
The nonparametric regression approach for time series data has been developed by several researchers. For example, Gao and Gijbel [44], Chen et al. [45], and Gao and King [46] carried out modeling based on the kernel estimator. Other researchers, such as Fernandez and Cao [47], Wang and Phillips [48], and Li et al. [49], developed nonparametric regression approaches to time series data based on local linear estimators. Dong and Gao [50] used the truncated least squares estimator to model time series data. In addition, several researchers have developed semiparametric regression approaches to time series data, including Gao and Hawthorne [51], Perez and Vieu [52], and Gao [53], who modeled time series data using the kernel estimator.
The local polynomial estimator is a popular smoothing technique whose special case with polynomial degree zero (d=0) is the kernel estimator, also called the local constant estimator. Liang and Chen [38] explain that if the data has a monotonic pattern, a polynomial of degree one (d=1) can be used. If the data pattern is not monotonic and has a local maximum or minimum, a polynomial of degree one cannot follow the sharp curvature in the data, so a local polynomial estimator with a higher polynomial degree is needed to obtain a suitable estimate. This research aims to develop a local polynomial semiparametric regression model for time series data and to apply it to modeling the case fatality rate of COVID-19 in Kota Pasuruan, East Java, Indonesia. This research discusses theoretically how the model is estimated using the local polynomial estimator in the semiparametric regression approach. The result of this research is expected to be applicable to modeling the CFR of COVID-19 in Indonesia.
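The role of the polynomial degree described above can be illustrated with a minimal sketch of a local polynomial fit at a single point. The function names `gaussian_kernel` and `local_poly_fit` are illustrative assumptions, not names from the paper, and the Gaussian kernel is used because the paper states that choice in its Preliminaries section.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_poly_fit(v, y, v0, h, d):
    """Estimate g(v0) with a degree-d local polynomial and bandwidth h.

    Hypothetical helper for illustration: solves the kernel-weighted
    least squares problem around v0 and returns the fitted intercept.
    """
    v = np.asarray(v, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, (v - v0), ..., (v - v0)^d
    V = np.vander(v - v0, N=d + 1, increasing=True)
    # Kernel weights K_h(v_t - v0) = K((v_t - v0) / h) / h
    w = gaussian_kernel((v - v0) / h) / h
    # Normal equations of weighted least squares: (V' W V) beta = V' W y
    VtW = V.T * w
    beta = np.linalg.solve(VtW @ V, VtW @ y)
    # The intercept beta_0(v0) is the estimate of g(v0)
    return beta[0]
```

With `d=0` the function returns the kernel-weighted average of `y` (the kernel estimator), and with `d=1` it returns the local linear estimate, which matches the two special cases named in the text.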

PRELIMINARIES
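Given a nonparametric regression model as follows [54]:

$$y = g(v) + \varepsilon \tag{1}$$

where $y$ is the response variable, $v$ is the predictor variable, $g(v)$ is the unknown regression function, and $\varepsilon$ is the random error. Around a point $v_0$, the function $g(v)$ is approximated by a polynomial of degree $d$:

$$g(v) \approx \beta_0(v_0) + \beta_1(v_0)(v - v_0) + \cdots + \beta_d(v_0)(v - v_0)^d \tag{2}$$

where $\beta_j(v_0) = g^{(j)}(v_0)/j!$ for $j = 0, 1, \ldots, d$. If Equation (2) is written in matrix notation, we obtain

$$g(v) \approx \mathbf{v}^{\top}\boldsymbol{\beta}(v_0) \tag{3}$$

with $\mathbf{v} = \left(1, (v - v_0), \ldots, (v - v_0)^d\right)^{\top}$ and $\boldsymbol{\beta}(v_0) = \left(\beta_0(v_0), \ldots, \beta_d(v_0)\right)^{\top}$. Based on equation (3), we can write equation (1) as:

$$y = \mathbf{v}^{\top}\boldsymbol{\beta}(v_0) + \varepsilon \tag{4}$$

Given $T$ samples of paired data $\{(v_t, y_t)\}_{t=1}^{T}$, equation (4) can be written as follows:

$$y_t = \beta_0(v_0) + \beta_1(v_0)(v_t - v_0) + \cdots + \beta_d(v_0)(v_t - v_0)^d + \varepsilon_t, \quad t = 1, \ldots, T \tag{5}$$

Hence, equation (5) can be expressed in matrix notation:

$$\mathbf{y} = \mathbf{V}\boldsymbol{\beta}(v_0) + \boldsymbol{\varepsilon} \tag{6}$$

where $\mathbf{V}$ is the $T \times (d+1)$ matrix whose $t$-th row is $\left(1, (v_t - v_0), \ldots, (v_t - v_0)^d\right)$. To estimate $\boldsymbol{\beta}(v_0)$, we apply weighted least squares (WLS) optimization as follows:

$$\min_{\boldsymbol{\beta}(v_0)} \left(\mathbf{y} - \mathbf{V}\boldsymbol{\beta}(v_0)\right)^{\top}\mathbf{K}_h(v_0)\left(\mathbf{y} - \mathbf{V}\boldsymbol{\beta}(v_0)\right) \tag{7}$$

where $\mathbf{K}_h(v_0) = \mathrm{diag}\left(K_h(v_1 - v_0), \ldots, K_h(v_T - v_0)\right)$ and $K_h(\cdot)$ is the kernel function with bandwidth $h$ defined by [55] as follows:

$$K_h(u) = \frac{1}{h} K\!\left(\frac{u}{h}\right)$$

In this study, we use the Gaussian kernel defined as follows [56]:

$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right), \quad -\infty < u < \infty$$

Next, we take the derivative of equation (7) with respect to $\boldsymbol{\beta}(v_0)$ and set it to zero, which gives:

$$\hat{\boldsymbol{\beta}}(v_0) = \left(\mathbf{V}^{\top}\mathbf{K}_h(v_0)\mathbf{V}\right)^{-1}\mathbf{V}^{\top}\mathbf{K}_h(v_0)\,\mathbf{y} \tag{8}$$

Based on equations (3) and (8), the local polynomial estimator for $g(v)$ is:

$$\hat{g}(v) = \mathbf{v}^{\top}\hat{\boldsymbol{\beta}}(v_0) = \mathbf{v}^{\top}\left(\mathbf{V}^{\top}\mathbf{K}_h(v_0)\mathbf{V}\right)^{-1}\mathbf{V}^{\top}\mathbf{K}_h(v_0)\,\mathbf{y} \tag{9}$$

To get the best estimate, one of the most important steps is to choose an optimal bandwidth for the chosen kernel function. This can be done using the Generalized Cross-Validation (GCV) criterion with formula [56]:

$$\mathrm{GCV}(h) = \frac{T^{-1}\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2}{\left(T^{-1}\,\mathrm{tr}\left[\mathbf{I} - \mathbf{A}(h)\right]\right)^2} \tag{10}$$

where $\mathbf{A}(h)$ is the smoother matrix satisfying $\hat{\mathbf{y}} = \mathbf{A}(h)\mathbf{y}$. The error measure used to compare estimators is the Mean Absolute Percentage Error (MAPE):

$$\mathrm{MAPE} = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \times 100\% \tag{11}$$

The semiparametric regression model for time series data is written as:

$$y_t = \mathbf{x}_{t-1}^{\top}\boldsymbol{\alpha} + g(v_{t-1}) + \varepsilon_t \tag{13}$$

where $\mathbf{x}_{t-1}^{\top}\boldsymbol{\alpha}$ is the parametric component, $g(v_{t-1})$ is the nonparametric component, and $\varepsilon_t$ is the measurement error with mean $0$ and variance $\sigma^2$. The function $g$ in equation (13) is a smooth function assumed to have an unknown form and is estimated using a nonparametric approach based on the local polynomial estimator. The nonparametric function $g(v_{t-1})$ is a smooth function that is continuous and differentiable. A differentiable function can be approximated by a Taylor series expansion. The Taylor series for $g(v_{t-1})$ around $v_0$ can be expressed as follows:

$$g(v_{t-1}) \approx \sum_{j=0}^{d} \beta_j(v_0)\,(v_{t-1} - v_0)^j \tag{14}$$

where

$$\beta_j(v_0) = \frac{g^{(j)}(v_0)}{j!}, \quad j = 0, 1, \ldots, d \tag{15}$$

Equation (14) can be written in matrix form as follows:

$$g(v_{t-1}) \approx \mathbf{v}_{t-1}^{\top}\boldsymbol{\beta}(v_0) \tag{16}$$

with $\mathbf{v}_{t-1} = \left(1, (v_{t-1} - v_0), \ldots, (v_{t-1} - v_0)^d\right)^{\top}$. Based on equation (16), the model in equation (13) can be written as follows:

$$y_t = \mathbf{x}_{t-1}^{\top}\boldsymbol{\alpha} + \mathbf{v}_{t-1}^{\top}\boldsymbol{\beta}(v_0) + \varepsilon_t \tag{17}$$

Let $y_t^{*} = y_t - \mathbf{x}_{t-1}^{\top}\boldsymbol{\alpha}$, so that equation (17) can be written as follows:

$$y_t^{*} = \mathbf{v}_{t-1}^{\top}\boldsymbol{\beta}(v_0) + \varepsilon_t \tag{18}$$

Equation (18) can be expressed in matrix form as:

$$\mathbf{y}^{*} = \mathbf{V}\boldsymbol{\beta}(v_0) + \boldsymbol{\varepsilon} \tag{19}$$

The estimator of $\boldsymbol{\beta}(v_0)$ is obtained from the local polynomial estimator using the kernel function $K_h(\cdot)$ as weighting. The shape of the weights is determined by the kernel function, while the size of the weights is determined by the value of the parameter $h$, called the bandwidth. The estimate $\hat{\boldsymbol{\beta}}(v_0)$ in equation (19) using weighted least squares (WLS) is obtained by minimizing the function

$$Q\left(\boldsymbol{\beta}(v_0)\right) = \left(\mathbf{y}^{*} - \mathbf{V}\boldsymbol{\beta}(v_0)\right)^{\top}\mathbf{K}_h(v_0)\left(\mathbf{y}^{*} - \mathbf{V}\boldsymbol{\beta}(v_0)\right) \tag{20}$$

where $\mathbf{K}_h(v_0)$ is the matrix containing the weight function, and $\mathbf{y}^{*} = \mathbf{y} - \mathbf{X}\boldsymbol{\alpha}$. The estimated value for $\boldsymbol{\beta}(v_0)$ is $\hat{\boldsymbol{\beta}}(v_0)$. This estimated value can be obtained by differentiating equation (20) with respect to $\boldsymbol{\beta}(v_0)$ and setting the derivative to zero. Hence, we have:

$$\hat{\boldsymbol{\beta}}(v_0) = \left(\mathbf{V}^{\top}\mathbf{K}_h(v_0)\mathbf{V}\right)^{-1}\mathbf{V}^{\top}\mathbf{K}_h(v_0)\,\mathbf{y}^{*} \tag{21}$$

Based on equations (16) and (21), the local polynomial estimator for $g(v_{t-1})$ can be written as follows:

$$\hat{g}(v_{t-1}) = \mathbf{v}_{t-1}^{\top}\left(\mathbf{V}^{\top}\mathbf{K}_h(v_0)\mathbf{V}\right)^{-1}\mathbf{V}^{\top}\mathbf{K}_h(v_0)\,\mathbf{y}^{*} \tag{22}$$

Equation (22) can be written for all time points as follows:

$$\hat{\mathbf{g}} = \mathbf{A}(h)\,\mathbf{y}^{*} = \mathbf{A}(h)\left(\mathbf{y} - \mathbf{X}\boldsymbol{\alpha}\right) \tag{23}$$

where each row of $\mathbf{A}(h)$ contains the local polynomial weights for one time point. Based on the estimation result in equation (23), the semiparametric regression model becomes:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\alpha} + \mathbf{A}(h)\left(\mathbf{y} - \mathbf{X}\boldsymbol{\alpha}\right) + \boldsymbol{\varepsilon} \tag{24}$$

The semiparametric regression model in equation (24) still contains the parameter $\boldsymbol{\alpha}$, so we need to estimate it. Equation (24) can be written as follows:

$$\left(\mathbf{I} - \mathbf{A}(h)\right)\mathbf{y} = \left(\mathbf{I} - \mathbf{A}(h)\right)\mathbf{X}\boldsymbol{\alpha} + \boldsymbol{\varepsilon} \tag{25}$$

The estimator of $\boldsymbol{\alpha}$ is obtained by minimizing the sum of squares $S$ of the semiparametric regression model given in equation (25). Writing $\tilde{\mathbf{y}} = \left(\mathbf{I} - \mathbf{A}(h)\right)\mathbf{y}$ and $\tilde{\mathbf{X}} = \left(\mathbf{I} - \mathbf{A}(h)\right)\mathbf{X}$:

$$S = \boldsymbol{\varepsilon}^{\top}\boldsymbol{\varepsilon} = \left(\tilde{\mathbf{y}} - \tilde{\mathbf{X}}\boldsymbol{\alpha}\right)^{\top}\left(\tilde{\mathbf{y}} - \tilde{\mathbf{X}}\boldsymbol{\alpha}\right) \tag{26}$$

Hence, equation (26) can be written as follows:

$$S = \tilde{\mathbf{y}}^{\top}\tilde{\mathbf{y}} - 2\boldsymbol{\alpha}^{\top}\tilde{\mathbf{X}}^{\top}\tilde{\mathbf{y}} + \boldsymbol{\alpha}^{\top}\tilde{\mathbf{X}}^{\top}\tilde{\mathbf{X}}\boldsymbol{\alpha} \tag{27}$$

The estimated value of $\boldsymbol{\alpha}$ is $\hat{\boldsymbol{\alpha}}$. This estimated value can be obtained by differentiating equation (27) with respect to $\boldsymbol{\alpha}$. The minimum value of $S$ is reached when $\partial S / \partial \boldsymbol{\alpha} = \mathbf{0}$. The estimated value of $\boldsymbol{\alpha}$ can be described as follows:

$$\hat{\boldsymbol{\alpha}} = \left(\tilde{\mathbf{X}}^{\top}\tilde{\mathbf{X}}\right)^{-1}\tilde{\mathbf{X}}^{\top}\tilde{\mathbf{y}} \tag{28}$$

The estimation results of the semiparametric regression model based on the local polynomial estimator are obtained by substituting equation (28) into equation (24).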
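The two-stage estimation described in this section (local polynomial smoothing with the parametric part treated as known, followed by least squares for the parametric coefficient) can be sketched as below. This is a minimal sketch assuming the usual partial-residual form of the estimator; the names `smoother_matrix` and `fit_semiparametric`, and the symbol `alpha` for the parametric coefficient, are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def smoother_matrix(v, h, d=1):
    """T x T matrix A whose row t holds the local polynomial weights at v[t]."""
    v = np.asarray(v, dtype=float)
    T = v.size
    A = np.empty((T, T))
    for t in range(T):
        u = v - v[t]
        # Columns 1, u, ..., u^d of the local design matrix centred at v[t]
        V = np.vander(u, N=d + 1, increasing=True)
        # Gaussian weights; normalising constants cancel in the WLS solution
        w = np.exp(-0.5 * (u / h) ** 2)
        VtW = V.T * w
        # First row of (V'WV)^{-1} V'W maps a response vector to the fit at v[t]
        A[t] = np.linalg.solve(VtW @ V, VtW)[0]
    return A

def fit_semiparametric(X, v, y, h, d=1):
    """Estimate the parametric coefficients and the smooth component.

    Assumed model (illustrative): y = X @ alpha + g(v) + error.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(y.size, -1)
    A = smoother_matrix(v, h, d)
    R = np.eye(y.size) - A
    # Least squares on the smoothed-out data (I - A) y and (I - A) X
    alpha, *_ = np.linalg.lstsq(R @ X, R @ y, rcond=None)
    # Smooth the partial residual to estimate the nonparametric component
    g_hat = A @ (y - X @ alpha)
    return alpha, g_hat
```

The fitted values are `X @ alpha + g_hat`. One design note: a constant column should not be included in `X`, because the smoother reproduces constants exactly and an intercept would not be separately identifiable from the smooth component.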

Implementation on Case Fatality Rate of COVID-19 Data
In 2020, East Java had the second-highest number of confirmed COVID-19 cases in Indonesia. One area that contributed is Kota Pasuruan. Below, we model the CFR of COVID-19 in Kota Pasuruan.
The functional relationship between CFR and CGR in the previous period is depicted in Figure 1. It does not follow a recognizable parametric form: the pattern of the relationship between CFR and the previous day's CGR is unknown, so it can be modeled using a nonparametric approach. Figure 2 shows that the functional relationship between the CFR and the previous period's CFR is linear, so it can be modeled using a parametric approach. Combining the two in one model gives a semiparametric approach. From the information above, the semiparametric model is written as follows:

$$\mathrm{CFR}_t = f(\mathrm{CFR}_{t-1}) + g(\mathrm{CGR}_{t-1}) + \varepsilon_t \tag{31}$$

where $f$ is a linear (parametric) function, $g$ is an unknown smooth function estimated by the local polynomial estimator, and $\varepsilon_t$ is the random error. The case fatality rate is predicted using the semiparametric regression approach based on a local polynomial estimator. The semiparametric regression model used here has polynomial degree one, that is, a local linear model. According to Table 3, the semiparametric regression model based on the local polynomial estimator has a MAPE value of less than 10%, so the prediction using the local linear model is highly accurate.
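The lag structure used in this model (today's CFR as the response, with the previous period's CFR and CGR as predictors) can be set up as in the sketch below. The helper names `make_lagged` and `chronological_split` are hypothetical, and the 80/20 split fraction is an assumption for illustration only; the paper does not state how its training and testing periods were divided.

```python
import numpy as np

def make_lagged(cfr, cgr):
    """Build the response and one-period-lagged predictors.

    Returns (y, x, v) where y[t] is CFR at time t, x[t] is CFR at t-1
    (parametric predictor) and v[t] is CGR at t-1 (nonparametric predictor).
    """
    cfr = np.asarray(cfr, dtype=float)
    cgr = np.asarray(cgr, dtype=float)
    if cfr.ndim != 1 or cfr.shape != cgr.shape:
        raise ValueError("cfr and cgr must be 1-D series of equal length")
    return cfr[1:], cfr[:-1], cgr[:-1]

def chronological_split(arrays, train_fraction=0.8):
    """Split aligned series into training and testing parts, keeping time order.

    The default train_fraction is an assumed value, not taken from the paper.
    """
    n = len(arrays[0])
    cut = int(n * train_fraction)
    train = tuple(a[:cut] for a in arrays)
    test = tuple(a[cut:] for a in arrays)
    return train, test
```

Keeping the split chronological, rather than shuffling, means the testing period always follows the training period, which is the natural setting for evaluating a time series model.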
Figure 4 shows the plot of the actual daily CFR of COVID-19 against its predicted values. The predicted values from the semiparametric regression model (the red line, using the local linear model) nearly coincide with the actual data. This shows that the prediction error of the semiparametric regression approach is very small.
Based on the results of the analysis, the prediction of the CFR of COVID-19 using the semiparametric regression model has a small MAPE value, less than 10%. This means that the predicted values from this model are highly accurate. The model can be used as a reference by the government for modeling or predicting the CFR of a disease or outbreak, so that the government can determine which policies to make to reduce the CFR of a disease or extraordinary event.
Estimating a semiparametric regression model is equivalent to estimating a regression function. Furthermore, the obtained model estimate can be used to predict the CFR in Indonesia in support of one of the SDGs, namely controlling the pandemic.

To estimate the nonparametric regression model, the parametric component is assumed known, such that the following equation is obtained:

$$y_t^{*} = g(v_{t-1}) + \varepsilon_t$$

where $y_t^{*}$ is the response $y_t$ minus the parametric component. Next, we use a local polynomial estimator to estimate the nonparametric function in estimating the semiparametric regression model.

Figure 1. Scatter plot of CFR vs CGR in the previous day, Kota Pasuruan

Figure 3. Plot of bandwidth versus GCV for the Gaussian function type (d=1)

The regression function describes the functional relationship between the response variable and the predictor variables. The regression function of the semiparametric regression model consists of parametric components and nonparametric components. Theoretically, in estimating a semiparametric regression model for time series data using a local polynomial estimator, we first estimate the parameters of the parametric component by solving the weighted least squares (WLS) optimization problem. In other words, we determine the parameter values in the parametric component that minimize the WLS function, so that we obtain the estimator for these parameters as presented in equation (19). Next, based on equation (19), we estimate the regression function of the nonparametric component, so that we obtain the estimated local polynomial semiparametric regression model for time series data as presented in equation (22).
The estimated semiparametric regression model based on the local polynomial estimator can be used to model the CFR of COVID-19. In the future, the estimated model can be used to predict the CFR of COVID-19, which is influenced by the parametric component, such as the CFR in the previous period, and the nonparametric component, such as the CGR in the previous period.

Table 1. MAPE Value Criteria [57]

In the MAPE formula, $T$ is the size of the sample, $\hat{y}_t$ is the value predicted by the model for time point $t$, and $y_t$ is the value observed at time point $t$. The criteria for MAPE values are shown in Table 1 [57].
In this section, we provide the theoretical results on estimating the local polynomial semiparametric regression model for time series data and its implementation in modeling the case fatality rate (CFR) of COVID-19.
The following are the results of the descriptive analysis of the variables:

Table 2. Variable Descriptive Statistics

The first two cases of COVID-19 in Indonesia were announced on March 2, 2020. This announcement marked the start of major changes and adjustments. Measures to prevent virus transmission and social restrictions were put in place, something that had never existed before in the country. COVID-19 continued to spread until it had infected all 34 provinces in Indonesia and 510 districts/cities as of the end of 2020. Differences in the conditions of the COVID-19 pandemic in each province, district, and city have resulted in variations in the policies for handling it. At the end of 2020, several regions in Indonesia began to loosen the PSBB (Pembatasan Sosial Skala Besar, large-scale social restrictions) or to begin a transition period [58]. Based on Table 2, during October 2020 to June 2021, a transition period of living alongside the COVID-19 virus, the average CFR was 10.440%, with a comparatively large variability of 1.813%. The spread of COVID-19 in Kota Pasuruan during the transition period can be seen from the CGR value, with a mean of 0.549. On August 4, 2021, Kota Pasuruan had its lowest CFR, 6.407%, while the highest CFR occurred on October 2, 2020.

Figure 2. Scatterplot of CFR vs CFR-1 (CFR in the previous period)
With $p = 1$ as the polynomial degree in this study, the model is a local linear model. The optimal bandwidth for the local linear model is 0.20, with a minimum GCV of 0.0025. The MAPE criterion is then used to determine the prediction accuracy.
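The bandwidth selection step described above can be sketched as a grid search over candidate bandwidths using the GCV criterion. This is a minimal sketch under stated assumptions: the function names, the candidate grid, and any data passed in are illustrative, and the GCV form used is the standard one (mean squared residual divided by the squared mean of the diagonal of the identity minus the smoother matrix).

```python
import numpy as np

def smoother_matrix(v, h, d=1):
    """T x T matrix whose row t holds the local polynomial weights at v[t]."""
    v = np.asarray(v, dtype=float)
    T = v.size
    A = np.empty((T, T))
    for t in range(T):
        u = v - v[t]
        V = np.vander(u, N=d + 1, increasing=True)
        w = np.exp(-0.5 * (u / h) ** 2)  # Gaussian kernel weights
        VtW = V.T * w
        A[t] = np.linalg.solve(VtW @ V, VtW)[0]
    return A

def gcv_score(v, y, h, d=1):
    """GCV(h) = mean squared residual / (1 - trace(A) / T)^2."""
    y = np.asarray(y, dtype=float)
    A = smoother_matrix(v, h, d)
    resid = y - A @ y
    return np.mean(resid**2) / (1.0 - np.trace(A) / y.size) ** 2

def select_bandwidth(v, y, grid, d=1):
    """Return the bandwidth in `grid` with the smallest GCV, plus all scores."""
    scores = [gcv_score(v, y, h, d) for h in grid]
    return grid[int(np.argmin(scores))], scores
```

Plotting the returned scores against the grid gives the kind of bandwidth-versus-GCV curve the paper reports in Figure 3; the minimum of that curve is the selected bandwidth.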

Table 3. Semiparametric regression approach using MAPE

Table 3 shows that the MAPE value is 0.3407% for training and 0.2385% for testing. According to Moreno et al. [57], if the MAPE of a semiparametric regression technique is less than 10%, the model is highly accurate.
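The MAPE values reported above can be computed from observed and predicted series as in the following minimal sketch. The function name `mape` is an illustrative assumption; the formula is the usual mean absolute percentage error, and the 10% threshold in the usage note is the one the paper attributes to [57].

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    if np.any(y_true == 0):
        # The percentage error is undefined where the observed value is zero
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```

A model would then be described as highly accurate under the cited criterion when `mape(y_true, y_pred) < 10.0`, evaluated separately on the training and testing periods.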