DEVELOPMENT LIFE EXPECTANCY MODEL IN CENTRAL JAVA USING ROBUST SPATIAL REGRESSION WITH M-ESTIMATORS

A spatial regression model is used to determine the relationship between dependent and independent variables under spatial influence. When only the independent variables are spatially affected, the Spatial Cross Regressive (SCR) model is formed. The Spatial Autoregressive (SAR) model arises when the dependent variable is affected, while the Spatial Durbin Model (SDM) arises when both exhibit spatial effects. Outlier observations can make a spatial regression model inaccurate, yet removing outliers from the analysis changes the composition of the spatial effects in the data. Robust spatial regression is one way of handling outliers in the model, with parameter coefficients that are robust against outliers estimated using the M-estimator. This research develops a life expectancy model for Central Java Province through Robust-SCR, Robust-SAR, and Robust-SDM to reduce the effect of spatial outliers. The model is developed based on educational, health, and economic factors. According to the results, the M-estimator accommodates the existence of outliers in the spatial regression model. This is indicated by an increase in the R² value and a decrease in MSE caused by the change in the estimated parameter coefficients. Robust-SDM is the best model since it has the largest R² value and the smallest MSE.


INTRODUCTION
According to Indonesia's Central Bureau of Statistics (BPS), Life Expectancy (LE) is an indicator for evaluating the government's performance in improving the population's welfare and health. BPS data show that Indonesian LE increases annually. For instance, the country's LE in 2017 was 71.06 years. Yogyakarta and West Sulawesi provinces have the highest and lowest LE, respectively.
Central Java Province holds the second position, with LE increasing from 74.02 years in 2016 to 74.08 years in 2017 [1]. The improvement of LE in Central Java Province is linked to educational, health, and economic factors. Regression analysis is a statistical method that can be used to determine the factors influencing LE.
Regression analysis examines the relationship between dependent and independent variables [2]. Essentially, the methods used to deal with spatial data are determined by the existence of location effects. If spatial data are nevertheless analyzed using classical linear regression, the assumptions of homogeneity and independence of errors are violated. For this reason, spatial regression analysis is often used in studies [3], [4]. Different models, such as Spatial Cross Regressive (SCR), Spatial Autoregressive (SAR), and Spatial Durbin Model (SDM), are formed depending on the spatial effects [5].
In some cases, outliers bias the parameter estimates. Outliers fall into two categories: global outliers, whose values differ significantly from the others, and spatial outliers, which are spatially referenced objects whose non-spatial attributes differ markedly from those of their neighbors [6]. A robust regression method is therefore used to analyze data contaminated by outliers [7]. The M-estimator, the most common robust regression estimator both theoretically and computationally, is one of the estimation methods in robust regression [8], and Moran's scatterplot is used to detect outliers [9]. Accordingly, robust spatial regression is used to model life expectancy in Central Java. The contribution of this study is to identify the factors affecting life expectancy in Central Java Province based on robust spatial regression models.

A. Life Expectancy in Central Java Province
The study was conducted in Central Java Province in Indonesia. There are several aspects with significant effects on Life Expectancy (LE), including educational, health, and economic factors.
Educational factors are explained using the "Average Length of School (ALS)" variable. The "Percentage of Households with Clean and Healthy Living Behavior (PCHLB)" and the "Number [...]
Figure 1 shows an overview of the spatial distribution of each variable, while Table 1 describes the data. The regions have different characteristics and form groups. Therefore, spatial regression is needed to develop the life expectancy model for Central Java Province.

B. Moran's I Test to detect the spatial effect
Spatial autocorrelation is the correlation among observations of a variable that arises from their locations, assessed from both the locations and the attribute values of the observed points. Moran's I was therefore used to determine whether there is autocorrelation, or spatial dependence, between locations [11]-[13].
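To make the statistic concrete, the following is a minimal sketch of the global Moran's I index in plain Python. It assumes a binary contiguity matrix for four hypothetical regions arranged in a chain; the data values are illustrative only, and the significance test that accompanies the index is not reproduced here.

```python
def morans_i(values, weights):
    """Global Moran's I of `values` under the spatial weights matrix `weights`.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    total = sum(d * d for d in dev)
    return (n / s0) * (cross / total)


# Hypothetical layout: four regions in a row, each touching only its
# immediate neighbors (binary contiguity weights).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = morans_i([1, 2, 3, 4], chain)       # similar values are adjacent
alternating = morans_i([1, 4, 1, 4], chain)  # dissimilar values are adjacent
print(round(smooth, 4), round(alternating, 4))
```

A positive index indicates that neighboring regions tend to have similar values, while a negative index indicates that neighbors tend to differ.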

C. Spatial Regressions
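The spatial regression method is used for spatial data that exhibit a location effect (spatial effect). There are two types of spatial effects: spatial dependency and spatial heterogeneity. Under spatial dependency, observations at one location depend on observations at other locations. The spatial regression method extends the classical linear regression method by accounting for the influence of location on the analyzed data. Generally, the spatial regression model can be written as follows [14], [15]:

$$\mathbf{y} = \rho \mathbf{W}\mathbf{y} + \alpha \mathbf{1}_n + \mathbf{X}\boldsymbol{\beta} + \mathbf{W}\mathbf{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}$$

where $\mathbf{y}$ is the vector of the dependent variable, $\rho$ is the spatial lag coefficient of the dependent variable, $\mathbf{W}$ is the spatial weights matrix arranged based on contiguity, specifically queen and rook contiguity [16], $\mathbf{X}$ is a matrix of independent variables, $\alpha$ is a constant coefficient, $\boldsymbol{\beta}$ is the vector of regression parameters, $\boldsymbol{\theta}$ is the vector of spatial lag coefficients of the independent variables, and $\boldsymbol{\varepsilon}$ is the error vector [15]. Setting $\rho = 0$ gives the SCR model, setting $\boldsymbol{\theta} = \mathbf{0}$ gives the SAR model, and keeping both spatial terms gives the SDM.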
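The difference between the SCR, SAR, and SDM specifications comes down to which spatially lagged terms enter the model. The sketch below illustrates this in plain Python under stated assumptions: a row-standardized contiguity matrix, three hypothetical regions, and made-up values. The helper names (`row_standardize`, `spatial_lag`, `regressor_columns`) are illustrative and do not come from any package.

```python
def row_standardize(w):
    """Scale each row of a contiguity matrix so that it sums to 1."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def spatial_lag(w, x):
    """Spatial lag W x: for each region, the weighted average of its neighbors."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def regressor_columns(model, w, y, x_cols):
    """Right-hand-side columns for each specification (illustration only).

    SCR: intercept, X, W X
    SAR: intercept, X, W y
    SDM: intercept, X, W X, W y
    Note: W y is endogenous, so SAR and SDM are not fitted by plain least
    squares on these columns; this only shows which terms appear.
    """
    if model not in ("SCR", "SAR", "SDM"):
        raise ValueError("model must be 'SCR', 'SAR', or 'SDM'")
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in x_cols]
    if model in ("SCR", "SDM"):
        cols += [spatial_lag(w, c) for c in x_cols]
    if model in ("SAR", "SDM"):
        cols.append(spatial_lag(w, y))
    return cols


# Hypothetical example: region 1 borders regions 0 and 2.
W = row_standardize([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
le = [70.0, 74.0, 72.0]   # made-up dependent variable
als = [6.0, 8.0, 7.0]     # made-up independent variable

print(spatial_lag(W, le))                            # lag of the dependent variable
print(len(regressor_columns("SDM", W, le, [als])))   # number of columns in SDM
```

With row standardization, each lagged value is simply the mean of the neighboring regions' values, which is why the lag of region 1 is the average of regions 0 and 2.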

D. Robust Spatial Regressions
Robust regression reduces the impact of outliers on parameter estimation in the analysis [17]. This approach is also applied in spatial regression models to analyze contaminated data and provide outlier-resistant results. In this study, the robust M-estimator method is applied to the SCR, SAR, and SDM models to overcome outliers using Least Square Estimation methods [18], [19]. The Tukey bisquare weighting function is used, as shown below:

$$w(u_i) = \begin{cases} \left[1 - \left(\dfrac{u_i}{c}\right)^2\right]^2, & |u_i| \le c \\ 0, & |u_i| > c \end{cases}$$

where $u_i$ is the scaled residual of observation $i$ and $c$ is the tuning constant of the Tukey bisquare weighting (or biweight) estimator, $c = 4.685$ [19]. The details of the algorithms used to estimate the parameters of Robust SCR, Robust SAR, and Robust SDM are shown in Tables 2-4.
[...]
5. Statistical inference of the spatial regression model [20], [21].
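As an illustration of how bisquare weighting with c = 4.685 suppresses outliers, the sketch below runs a generic iteratively reweighted least squares (IRLS) loop for a single-predictor line. It is not a reproduction of the algorithms in Tables 2-4: the MAD-based scale estimate (MAD / 0.6745), the stopping rule, and the toy data are all assumptions made for this example.

```python
import statistics

C = 4.685  # tuning constant of the Tukey bisquare weight


def tukey_weight(u, c=C):
    """Bisquare weight: (1 - (u/c)^2)^2 for |u| <= c, and 0 beyond c."""
    if abs(u) > c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def m_estimate_line(x, y, c=C, max_iter=50, tol=1e-8):
    """Generic IRLS M-estimation of a line with bisquare weights."""
    w = [1.0] * len(x)
    a, b = weighted_line_fit(x, y, w)  # start from ordinary least squares
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        mad = statistics.median([abs(r - med) for r in resid])
        scale = mad / 0.6745  # assumed robust scale estimate
        if scale == 0:
            break  # the bulk of the data is fitted exactly
        w = [tukey_weight(r / scale, c) for r in resid]
        a_new, b_new = weighted_line_fit(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, w


# Toy data: points on y = 2 + 0.5 x, with the last observation contaminated.
xs = [float(i) for i in range(10)]
ys = [2.0 + 0.5 * xi for xi in xs]
ys[9] = 30.0

a_ols, b_ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
a_rob, b_rob, weights = m_estimate_line(xs, ys)
print("OLS:   ", round(a_ols, 3), round(b_ols, 3))
print("Robust:", round(a_rob, 3), round(b_rob, 3))
```

In this toy case the contaminated observation ends up with zero weight, so the robust fit follows the remaining points while the ordinary least squares slope is pulled toward the outlier.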

A. Detection of Spatial Dependence with Moran's I Test
Moran's I test is used to detect spatial dependence. The test was conducted with the spdep package in R [22]. The queen contiguity method was used to form the spatial weights matrix, as shown in Table 5.

B. Life Expectancy model using Spatial Regression Model
In this research, three spatial regression models are used: SCR, SAR, and SDM. The models were fitted with the spatialreg R package [23], and the results are shown in Table 6. According to the parameter estimates (Table 6), the "Average Length of School (ALS)" and "Percentage of Poor Population (PP)" variables affect life expectancy in the SCR, SAR, and SDM models. Moreover, the spatially lagged independent variable that significantly affects life expectancy in the SCR and SDM models is the "lag of Number of Integrated Health Post (W_IHP)." Based on Table 6, by the criteria of smallest MSE and largest R², SDM is the best spatial regression model to describe life expectancy in Central Java Province.
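The two selection criteria can be sketched as follows. This is a minimal illustration, assuming MSE is the mean of squared residuals over all n observations (the paper's exact divisor is not stated) and R² is one minus the ratio of the residual to the total sum of squares; the numbers are made up.

```python
def mse(observed, fitted):
    """Mean squared error, dividing by n (an assumption for this sketch)."""
    n = len(observed)
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / n


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up observed values and fitted values from two hypothetical models.
y_obs = [1.0, 2.0, 3.0]
fit_a = [1.0, 2.0, 3.0]   # reproduces the data exactly
fit_b = [2.0, 2.0, 2.0]   # predicts the mean everywhere

for name, fit in (("A", fit_a), ("B", fit_b)):
    print(name, mse(y_obs, fit), r_squared(y_obs, fit))
```

The preferred model is the one with the smaller MSE and the larger R², which here is the hypothetical model A.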

C. Detection of outliers
Outliers in spatial data are detected by examining the Moran scatterplot of the model residuals [6]. The Moran scatterplots of the residuals of the SCR, SAR, and SDM models are shown in Figure 2.
Several observations indicate the existence of outliers (influence measures) in the SCR, SAR, and SDM models. Therefore, a robust spatial regression model is needed to reduce the influence of outliers and increase accuracy.
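One common way to read a Moran scatterplot is to standardize the residuals, plot each against the average of its neighbors, and treat points in the off-diagonal quadrants (high value with low neighbors, or low value with high neighbors) as candidate spatial outliers. The sketch below follows that reading; the quadrant rule, the row-standardized weights, and the residual values are assumptions for illustration, not the exact criterion applied to Figure 2.

```python
import statistics


def moran_scatter_flags(residuals, w_std):
    """Indices whose standardized residual and spatial lag have opposite signs.

    `w_std` is assumed to be a row-standardized spatial weights matrix.
    """
    mean = sum(residuals) / len(residuals)
    sd = statistics.pstdev(residuals)
    z = [(r - mean) / sd for r in residuals]
    lag = [sum(wij * zj for wij, zj in zip(row, z)) for row in w_std]
    return [i for i in range(len(z)) if z[i] * lag[i] < 0]


# Hypothetical chain of five regions with row-standardized weights.
W_chain = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
resid = [1.0, 1.2, -3.0, 1.1, 0.9]  # made-up residuals; region 2 is aberrant

flagged = moran_scatter_flags(resid, W_chain)
print(flagged)
```

Note that the neighbors of an aberrant region can also fall in the off-diagonal quadrants, so the flagged indices are candidates for closer inspection rather than confirmed outliers.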

D. Robust Spatial Regression to model life expectancy
Robust spatial regression model parameters are estimated based on Algorithms 1, 2, and 3, as detailed in Tables 2-4. The results are shown in Table 7. According to Table 7, the robust method increases the significance of the model parameters, which in turn increases the number of significant variables. Compared to Table 6, the number of significant variables in the SCR and SDM models rises from 3 to 6 and 8, respectively. Moreover, the method increases model accuracy and decreases MSE.
The method increases R² in the SCR model from 72.90% to 85.41%, in the SAR model from 65.95% to 75.57%, and in the SDM model from 80.24% to 90.25%. From Tables 6 and 7, by the criteria of smallest MSE and largest R², Robust SDM is the best spatial regression model to describe life expectancy in Central Java Province.

E. Model Interpretation
Based on the tests carried out, the best spatial regression method for modeling 2017 life expectancy in Central Java Province is the Robust Spatial Durbin Model (Robust SDM). The model formed is as follows: