Ordinary least square and maximum likelihood estimation of VAR(1) model’s parameters and its application on covid-19 in China 2020

Vector Autoregressive (VAR) is a multivariate time series model for examining systems of two or more variables that affect each other, under the stationarity assumption. This study compares the parameter estimation procedures of the VAR(1) model under the Ordinary Least Square (OLS) and Maximum Likelihood Estimation (MLE) methods. The comparison is carried out through both theoretical and empirical approaches. The study uses daily data on the number of positive cases and the number of deaths caused by covid-19 in China. The results show that, theoretically, OLS and MLE yield the same parameter estimator. Empirically, both methods provide the same estimates both in the presence and in the absence of an intercept. In the presence of an intercept, the number of positive cases is influenced by both the number of positive cases itself and the number of deaths from the preceding period, whereas the number of deaths is explained only by the number of deaths in the previous period. In the absence of an intercept, the number of positive cases from the previous period has a significant effect on the number of positive cases, but the effect of the number of deaths in the preceding period is not significant. Hence, there is no interaction effect between the number of positive cases and the number of deaths, in either direction.


Introduction
Time series analysis consists of univariate and multivariate analysis. One univariate time series model is the Autoregressive (AR) model, in which a dependent variable is influenced by its own value from the previous period. In the multivariate case, the AR model extends to the Vector Autoregressive (VAR) model. VAR describes an observation in period t as being influenced by the variable itself and by the other variables from the preceding period.
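As a point of reference, a bivariate VAR(1) model can be written out equation by equation. The symbols below (y_{1,t} and y_{2,t} for the two series, \phi_{ij} for the lag coefficients, \varepsilon_{i,t} for the errors) are notation assumed here for illustration; only the intercept µ is named elsewhere in this paper.

```latex
\begin{aligned}
y_{1,t} &= \mu_1 + \phi_{11}\, y_{1,t-1} + \phi_{12}\, y_{2,t-1} + \varepsilon_{1,t},\\
y_{2,t} &= \mu_2 + \phi_{21}\, y_{1,t-1} + \phi_{22}\, y_{2,t-1} + \varepsilon_{2,t}.
\end{aligned}
```

In this form, \phi_{12} and \phi_{21} carry the cross effects between the two variables, and setting \mu_1 = \mu_2 = 0 gives the model without intercept.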
Parameter estimation is essential in statistical modelling. The common methods for parameter estimation are Ordinary Least Square (OLS) and Maximum Likelihood Estimation (MLE). OLS minimizes the squared error function, while MLE maximizes the log-likelihood function. Parameter estimation of the VAR(1) model using OLS was performed previously by [1], while estimation employing MLE was discussed by [2]. This paper aims to investigate whether the parameter estimates obtained from the two methods agree. First, parameter estimation is investigated theoretically by deriving the estimator from the VAR(1) model. Second, we perform the estimation empirically using covid-19 data from China. No prediction of either series is performed in this study.
ICW-HDDA-X 2020, Journal of Physics: Conference Series 1722 (2021) 012082, IOP Publishing, doi: 10.1088/1742-6596/1722/1/012082
In early December 2019, the covid-19 epidemic spread from Wuhan throughout China. It was then exported to a growing number of countries [3]. The first case of covid-19 outside China confirmed by the World Health Organization (WHO) was diagnosed on 13th January 2020 in Bangkok (Thailand). The number of confirmed cases has been increasing constantly worldwide, after first affecting the Asian and European regions. One of the most important epidemiologic quantities to determine during an outbreak of a novel or emerging infectious agent is the case fatality ratio (CFR). The CFR is usually estimated from aggregate numbers of cases and deaths at a single time point [4]. The probability of death among cases diagnosed with a disease is often used as a measure of disease severity. This quantity is usually estimated within a specific period by direct follow-up of cases and ascertainment of their death or recovery [5]. Hence, this study focuses on analyzing the daily number of positive cases and the daily number of deaths caused by covid-19 in China.
Many studies have predicted the trend of covid-19 cases in China. Li [6] built models with a machine learning approach to predict the daily numbers of cumulative confirmed cases, new positive cases, and deaths from covid-19 in China, based on data from January 20th to March 1st, 2020. The predictions indicated that the outbreak would peak on February 22nd, 2020 in China and on May 22nd, 2020 worldwide. In this study, we are therefore interested in building a model for the daily numbers of positive cases and deaths in China based on data from March to April. In that period, both series had passed the peak and the trend phase of the outbreak and had become stationary.

Data Source
The data used in this research are covid-19 data obtained from the Worldometer page (https://www.worldometers.info/coronavirus/country/china/). This research focuses on the period of the outbreak from March 8th to April 16th, 2020, which gives 40 observations. The two variables are the daily number of positive cases and the daily number of deaths caused by covid-19.

Ordinary Least Square (OLS) and Maximum Likelihood Estimation (MLE) methods
This study compares the parameter estimation procedures of the VAR(1) model under the Ordinary Least Square (OLS) and Maximum Likelihood Estimation (MLE) methods. The principle of OLS is to minimize the sum of squared differences between the observed values of the dependent variable and the values predicted by the linear function; it has a simple closed-form solution and does not require any distributional assumption. MLE, in contrast, maximizes a likelihood function, for which the distribution of the data must be specified.
This paper applies both a literature study and a quantitative approach. The literature study provides a comprehensive understanding of the estimation procedures, so that the estimators can be compared systematically. The quantitative approach analyses empirical data in order to compare the estimates obtained from OLS and MLE.
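The OLS principle described above can be sketched for a bivariate VAR(1) model. This is a minimal illustration on simulated data, not the paper's data or code: the function names, the "true" parameter values, the unit-variance errors, and the use of NumPy are all assumptions made for the example.

```python
import numpy as np


def simulate_var1(mu, phi, n_obs, rng):
    """Simulate y_t = mu + phi @ y_{t-1} + e_t with e_t ~ N(0, I) (assumed errors)."""
    y = np.zeros((n_obs, len(mu)))
    for t in range(1, n_obs):
        y[t] = mu + phi @ y[t - 1] + rng.standard_normal(len(mu))
    return y


def fit_var1_ols(y):
    """Return (mu_hat, phi_hat) minimizing the sum of squared one-step residuals."""
    target = y[1:]                                            # y_t
    design = np.column_stack([np.ones(len(y) - 1), y[:-1]])   # [1, y_{t-1}]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)    # shape (3, 2)
    # Row 0 holds the intercepts; rows 1-2 hold the lag coefficients,
    # transposed so that phi_hat[i, j] is the effect of variable j on variable i.
    return coef[0], coef[1:].T


rng = np.random.default_rng(0)
true_mu = np.array([1.0, 0.5])                  # hypothetical intercepts
true_phi = np.array([[0.6, 0.1], [0.0, 0.3]])   # hypothetical lag coefficients
y = simulate_var1(true_mu, true_phi, 2000, rng)
mu_hat, phi_hat = fit_var1_ols(y)
print("intercepts:", mu_hat)
print("lag coefficients:\n", phi_hat)
```

To fit the model without intercept, the column of ones would simply be dropped from the design matrix.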

Parameter Estimation of VAR(1) Model with Maximum Likelihood Estimation
The MLE approach relies on the assumption that the random errors follow a multivariate normal distribution. For a VAR(1) model with two variables, the error component is assumed to follow a bivariate normal distribution with zero mean, which determines the joint probability density of Y and hence the likelihood as a function of the model parameters. The log-likelihood is differentiated with respect to the parameter and the first derivative is set equal to zero, which gives equation (19). The next step is to find the second derivative of the log-likelihood function: to ensure that the log-likelihood attains a maximum, the second derivative must be negative.
The second derivative of the log-likelihood function with respect to the parameter is indeed negative. Hence, the estimator obtained from the first derivative in equation (19) maximizes the likelihood function.
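The argument can be sketched in a compact matrix form. The notation here is an assumption for illustration and may differ from the paper's own equations: Y is the n × 2 matrix of observations y_t, Z is the n × 3 matrix with rows (1, y_{t-1}'), B is the 3 × 2 matrix of intercepts and lag coefficients, and the rows of the error matrix are independent N(0, \Sigma).

```latex
\begin{aligned}
\ln L(B,\Sigma) &= -\frac{2n}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma|
  - \frac{1}{2}\,\mathrm{tr}\!\left[\Sigma^{-1}(Y - ZB)'(Y - ZB)\right],\\[4pt]
\frac{\partial \ln L}{\partial B} &= Z'(Y - ZB)\,\Sigma^{-1} = 0
  \;\Longrightarrow\; \hat{B}_{\mathrm{MLE}} = (Z'Z)^{-1}Z'Y,\\[4pt]
\frac{\partial^2 \ln L}{\partial\,\mathrm{vec}(B)\,\partial\,\mathrm{vec}(B)'} &= -\left(\Sigma^{-1}\otimes Z'Z\right)
  \quad\text{(negative definite when } Z'Z \text{ and } \Sigma \text{ are positive definite)},\\[4pt]
S(B) &= \mathrm{tr}\!\left[(Y - ZB)'(Y - ZB)\right],\qquad
\frac{\partial S}{\partial B} = -2\,Z'(Y - ZB) = 0
  \;\Longrightarrow\; \hat{B}_{\mathrm{OLS}} = (Z'Z)^{-1}Z'Y.
\end{aligned}
```

Because \Sigma^{-1} is invertible, the likelihood equation reduces to Z'(Y - ZB) = 0, which is exactly the OLS normal equation.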
Based on the explanation above, it can be concluded that the parameter estimators obtained from the OLS and MLE methods are the same.
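The equivalence can also be checked numerically. The following sketch, on simulated data, verifies that the gradient of a Gaussian log-likelihood vanishes at the OLS estimate and that random perturbations of the estimate lower the log-likelihood. The simulated series, the error covariance matrix `sigma`, and all function names are assumptions for illustration, not the paper's data or code.

```python
import numpy as np


def gaussian_loglik(coef, sigma, target, design):
    """Gaussian log-likelihood of the residuals target - design @ coef."""
    resid = target - design @ coef
    n_obs, k = resid.shape
    sigma_inv = np.linalg.inv(sigma)
    _, logdet = np.linalg.slogdet(sigma)
    quad = np.einsum("ti,ij,tj->", resid, sigma_inv, resid)
    return -0.5 * (n_obs * k * np.log(2 * np.pi) + n_obs * logdet + quad)


rng = np.random.default_rng(1)
phi = np.array([[0.5, 0.2], [0.1, 0.4]])   # hypothetical lag coefficients
y = np.zeros((200, 2))
for t in range(1, 200):
    y[t] = phi @ y[t - 1] + rng.standard_normal(2)

target = y[1:]
design = np.column_stack([np.ones(len(y) - 1), y[:-1]])
coef_ols, *_ = np.linalg.lstsq(design, target, rcond=None)

# Any positive definite error covariance gives the same maximizer in coef.
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

# Score (gradient of the log-likelihood with respect to coef) at the OLS estimate.
score = design.T @ (target - design @ coef_ols) @ np.linalg.inv(sigma)
print("largest absolute score entry:", np.abs(score).max())

# Perturbing the OLS estimate should only decrease the log-likelihood.
best = gaussian_loglik(coef_ols, sigma, target, design)
all_lower = all(
    gaussian_loglik(coef_ols + 0.05 * rng.standard_normal(coef_ols.shape),
                    sigma, target, design) < best
    for _ in range(5)
)
print("all perturbations lower the log-likelihood:", all_lower)
```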

Parameter Significance of VAR(1) Model
In order to investigate whether an estimator makes a significant contribution to the model, a statistical test of the following hypotheses is carried out [7]:
H0: the parameter is equal to zero (the parameter of the VAR model is not significant)
H1: the parameter is not equal to zero (the parameter of the VAR model is significant)
Test statistic: the t statistic. If the absolute value of the t statistic exceeds the critical value of the t distribution, or the p-value of the t test is ≤ α, the parameter is deemed statistically significant; otherwise it is not.
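A minimal sketch of such a test for the OLS estimates of a bivariate VAR(1) model follows. It uses the usual regression formula, t = estimate / standard error, which is assumed here because the paper's own test-statistic formula is not reproduced above. The data are simulated, and the function name, the parameter values, and the critical value 2.028 (approximately the two-sided 5% point of the t distribution with 36 degrees of freedom) are assumptions for the example.

```python
import numpy as np


def var1_t_statistics(y):
    """OLS coefficients, standard errors and t statistics for a VAR(1) with intercept."""
    target = y[1:]
    design = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    dof = design.shape[0] - design.shape[1]                 # usable observations minus regressors
    resid_var = (resid ** 2).sum(axis=0) / dof              # one error variance per equation
    zz_inv_diag = np.diag(np.linalg.inv(design.T @ design))
    std_err = np.sqrt(np.outer(zz_inv_diag, resid_var))     # shape (3, 2), matches coef
    return coef, std_err, coef / std_err, dof


rng = np.random.default_rng(2)
mu = np.array([2.0, 1.0])                    # hypothetical intercepts
phi = np.array([[0.8, 0.0], [0.0, 0.3]])     # hypothetical lag coefficients
y = np.zeros((40, 2))                        # 40 simulated observations
for t in range(1, 40):
    y[t] = mu + phi @ y[t - 1] + rng.standard_normal(2)

coef, std_err, t_stat, dof = var1_t_statistics(y)
critical_value = 2.028                       # assumed: approx. t(0.975, 36)
significant = np.abs(t_stat) > critical_value
print("degrees of freedom:", dof)
print("t statistics (rows: intercept, lag 1 of y1, lag 1 of y2):\n", t_stat)
print("significant at the 5% level:\n", significant)
```

Each column of the output corresponds to one equation of the model, so the off-diagonal lag entries indicate whether one variable helps explain the other.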

The Empirical Approach
The result shows that the number of positive cases and the number of deaths are correlated, with a correlation coefficient of 0.4104. In the univariate case, the number of deaths satisfies the mean stationarity assumption (P<0.05), but the number of positive cases does not. In the multivariate case, however, the two eigenvalues, 0.792 and 0.250, are both less than 1 in absolute value, which satisfies the stationarity condition |λ| < 1 [10]. In addition, Figure 1 illustrates that both series fluctuate around a constant mean. Therefore, both series are assumed to satisfy the stationarity assumption in both mean and variance.
Based on equation (24), the number of positive cases of covid-19 and the number of deaths are each explained only by their own past values. There is not enough evidence to say that the positive cases of covid-19 are explained by the deaths, and vice versa.
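The eigenvalue condition used in the stationarity check above can be sketched as follows. The coefficient matrix in the example is hypothetical and chosen only for illustration; it is not the matrix estimated in this paper, and the function names are assumptions.

```python
import numpy as np


def var1_eigen_moduli(phi):
    """Moduli of the eigenvalues of a VAR(1) coefficient matrix, largest first."""
    eigenvalues = np.linalg.eigvals(np.asarray(phi, dtype=float))
    return np.sort(np.abs(eigenvalues))[::-1]


def is_stationary(phi):
    """A VAR(1) process is stationary when every eigenvalue modulus is below 1."""
    return bool(np.all(var1_eigen_moduli(phi) < 1.0))


phi_example = np.array([[0.7, 0.1],
                        [0.2, 0.3]])   # hypothetical coefficient matrix
print("eigenvalue moduli:", var1_eigen_moduli(phi_example))
print("stationary:", is_stationary(phi_example))
```

The check depends only on the lag coefficient matrix, so the intercept plays no role in it.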

Conclusion
Based on the previous discussion, the parameter estimators of the VAR(1) model derived from the OLS and MLE procedures theoretically have the same formula. The VAR(1) equation with intercept is the equation in which the value of µ is not equal to zero, while the VAR(1) equation without intercept is the equation in which the value of µ is equal to zero. Empirically, the parameter estimates obtained from OLS and MLE, applied to the number of positive cases and the number of deaths from covid-19 in China, give the same coefficient values. In the presence of an intercept, the number of positive cases is influenced by both the number of positive cases itself and the number of deaths from the preceding period. In the absence of an intercept, however, the number of positive cases from the previous period has a significant effect on the number of positive cases, but the effect of the number of deaths in the preceding period is not significant. Hence, there is no interaction effect between the number of positive cases and the number of deaths, in either direction. This result may be influenced by the small sample size. For future research, it is therefore important to extend the analysis with a sufficiently large sample.