EXAMINING REGIONAL FACTORS ON MALNUTRITION RATE IN INDONESIA USING SPATIAL AUTOREGRESSIVE APPROACH

The most frequent issues in modeling malnutrition rates are skewed distributions and spatial autocorrelation. Previous studies generally focused either on spatial autocorrelation between neighboring regions or on the relationships between malnutrition rates and significant factors across different quantiles of the malnutrition rate distribution, but rarely on both. This study aims to estimate how contributing factors influence the malnutrition rate. The estimation is carried out with spatial autoregressive (SAR) approaches, including ordinary SAR, Robust SAR, and SAR Quantile (SARQ), using 2021 data from the Health Ministry of Indonesia. The results show that SARQ outperforms SAR and Robust SAR in data fitness and prediction accuracy. SARQ is also insensitive to outliers and skewed distributions. SARQ estimation shows how the effects of explanatory variables vary with the quantiles, which SAR and Robust SAR cannot do.


INTRODUCTION
Nutritional issues are a public health problem that can affect all age groups. The most affected group is children aged 0-5 years. Children under five who are underweight are at higher risk of stunting. Growth obstacles at an early age adversely affect the quality of future generations [1], [2]. Indonesia is a prime example of the burden of malnutrition. About 1 in 3 children under five years is stunted [3]. In September 2021, UNICEF Indonesia screened 84 million children under the age of five and identified 500,000 new cases of children suffering from wasting, which carries an increased risk of death. Based on data from the Indonesian Toddler Nutrition Status Survey in 2021, the prevalence of stunting among toddlers is 24.4 percent (5.33 million). This prevalence is lower than in previous years; it was 26.92 percent in 2020. The Covid-19 pandemic, which has entered its third year, has led to an increase in malnourishment leading to stunting among children under five. By 2024, Indonesia is targeting to lower the stunting rate to 14 percent. To achieve this target, a situational analysis is required to determine the malnutrition rate among children under the age of five and to assess the key determinants of malnutrition in specific social and geographical locations [4]. This analysis will provide grounds for intervention so that the actions can be tailored to the contextual needs in Indonesia. Previous studies have analyzed childhood stunting and wasting in developing countries, including Indonesia. Unfortunately, most of these analyses modeled the mean regression instead of the quantile regression and did not consider the spatial effect.
Malnutrition data on stunting and wasting are typically highly left-skewed, contain outliers (e.g., extreme minimum values), and commonly show significant spatiotemporal differences [5], [6].
Spatial dependence may be present in the data, meaning that observations in one area can influence observations in other areas. As Tobler states, everything is related to everything else, but near things are more related than distant things [7]. Linear regression analysis ignores this spatial dependence. It is also highly sensitive to outliers and cannot be extended to the full distribution of malnutrition, so such models may produce biased results and side effects [8]. Moreover, researchers are more interested in understanding the effects of risk factors in locations with high malnutrition cases than in other locations. Since stunting is a serious problem in these locations, it is necessary to implement the spatial autoregressive (SAR) method [9]. A hybrid of SAR and quantile regression (QR) is also applied here, on the assumption that it will produce a better model in this case [10]-[14].
Research on estimating and testing spatial autoregressive models has recently increased. For instance, Dai et al. [15] explored the quantile regression approach for partially linear spatial autoregressive models with possibly varying coefficients using B-splines. Dai et al. [16] investigated fixed effects quantile regression for general spatial panel data models with both individual fixed effects and time period effects based on the instrumental variable method. Zhang et al. [17] studied a penalized quantile regression for a spatial panel model with fixed effects. Dai and Jin [9] employed the minimum distance quantile regression (MDQR) methodology for estimating the SAR panel data model with individual fixed effects.
This study applies the hybrid of the spatial autoregressive and quantile approaches (SARQ) to obtain the best model of the malnutrition data. The proposed method is compared with the spatial autoregressive (SAR) and Robust SAR methods. As far as we know, the SARQ model has not been applied in any existing reference on this subject. The model is important for investigating the factors that affect malnutrition incidence in Indonesia and for advising decision-makers on reducing malnutrition.
The rest of the paper is organized as follows. Section 2 describes the methods and models, including the variables involved in the model and the SAR, Robust SAR, and SARQ methods. In Section 3, we implement the methods to construct an acceptable model of malnutrition factors in Indonesia.

Data
This study assumed that the percentage of low birth weight, percentage of vitamin A, percentage of health insurance, percentage of access to water hygiene, and percentage of breastfeeding are factors influencing child malnutrition, based on studies by many researchers [18]-[21]. The hypothesized model was then fitted to the data set on malnutrition obtained from the results of the study on the nutritional status (stunted) of children under five in thirty-four provinces in Indonesia in 2021 [22].
F. YANUAR, T. ABRARI, A. ZETRA, I. RAHMI HG, D. DEVIANTO, S. AHDA
Three estimation methods are applied in this study to achieve the best model of malnutrition rate in all provinces in Indonesia. The detailed descriptive statistics about these candidate variables are recorded in Table 1.

Methods and Preliminary Analysis
A preliminary diagnostic test for multicollinearity was applied to the hypothesized model. The variance inflation factor (VIF) was calculated to identify the statistical correlation among the explanatory variables using the following formula:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2},$$

where $R_j^2$ is the coefficient of determination of the corresponding model, that is, the regression of the $j$-th explanatory variable on the remaining ones. Variables with VIF values greater than five would be removed from the analysis [8], [23]. Such a value indicates that the corresponding variable is significantly correlated with the others, that is, a multicollinearity problem is present.

The Moran's I test was applied subsequently to identify whether there is a significant spatial correlation of variables among neighbors. Moran's I is defined as follows:

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$

where $n$ is the number of areas (spatial units), $y_i$ is the malnutrition rate of the $i$-th region, $\bar{y}$ is the mean malnutrition rate, and $w_{ij}$ is the element of the spatial weight matrix $W$, which represents the adjacent relationship between the $i$-th region and the $j$-th region. It is defined as follows:

$$w_{ij} = \begin{cases} 1, & \text{if regions } i \text{ and } j \text{ are adjacent}, \\ 0, & \text{otherwise}. \end{cases}$$

In this study, the queen contiguity method is applied to design the spatial weight matrix $W$. That is, if there is a common boundary between two regions, they are considered adjacent. A spatial weight matrix can then be constructed using this method. In this study, the spatial units are the provinces of Indonesia. The adjacent provinces in Indonesia are presented in Table 3. Four provinces, i.e., Bangka Belitung Island, Bali, West Nusa Tenggara, and East Nusa Tenggara, are not included in the analysis since all four are islands and have no adjacent provinces. Figure 1 provides the map of the malnutrition percentage in each province in Indonesia. Regions have different characteristics and form groups. Therefore, a spatial regression is needed in modeling the malnutrition case in Indonesia.

The Moran's I value commonly ranges from -1 to +1. Positive and negative values of the Moran index indicate spatial clustering and spatial dispersion of the variable, respectively [24].
If the value is close to zero or the p-value is greater than 0.1, the variable does not have an easily observed spatial pattern [8]. Thus, spatial dispersion would not be considered in this study. The Moran's I value obtained in this study is 0.3622 (p-value 0.014), which indicates that there is a spatial correlation of variables among neighbors.
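The two diagnostics above can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the computation used in this study: the function names (`vif`, `contiguity_matrix`, `morans_i`), the four-region toy layout, and the toy values are all hypothetical, and the significance test for Moran's I is not included.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j of X on an intercept and the remaining columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def contiguity_matrix(neighbors, n):
    """Binary weights: w_ij = 1 if regions i and j are adjacent, else 0."""
    W = np.zeros((n, n))
    for i, js in neighbors.items():
        for j in js:
            W[i, j] = W[j, i] = 1.0
    return W

def morans_i(y, W):
    """Moran's I for values y and spatial weight matrix W."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    return len(y) / W.sum() * (z @ W @ z) / (z @ z)

# Hypothetical toy layout: four regions on a line, 0-1-2-3.
W = contiguity_matrix({0: [1], 1: [2], 2: [3]}, 4)
print(morans_i([1.0, 2.0, 3.0, 4.0], W))    # smooth gradient: positive I
print(morans_i([1.0, -1.0, 1.0, -1.0], W))  # alternating values: negative I
```

In this toy layout a smooth gradient gives a positive index (clustering) and an alternating pattern gives a negative one (dispersion), which matches the interpretation given above.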
The Lagrange Multiplier (LM) test is also applied to test for the existence of spatial correlation in the response or the indicators, based on the error term $u$ of the hypothesized model. The Lagrange Multiplier value for this study is 3.388 (p-value 0.056). At the 10% significance level, this result indicates spatial dependence in the response variable under the hypothesized model. It means that the malnutrition rate in this study is spatially correlated among neighboring provinces in Indonesia. Additionally, heteroscedasticity in the hypothesized model is tested using the Breusch-Pagan test. The Breusch-Pagan value for the malnutrition data based on the hypothesized model is 3.499 (p-value 0.061). At the same level, this indicates that heteroscedasticity is present among provinces in Indonesia regarding the malnutrition rate.
Based on the tests above, it was concluded that spatial correlation is present among regions in the malnutrition data. Therefore, the classical regression method using ordinary least squares cannot be applied in this study. Several researchers suggested using the spatial autoregressive (SAR) method [25] to overcome these problems and obtain an acceptable model. Several researchers also applied the Robust spatial autoregressive (Robust SAR) method to construct a model for data with spatial autocorrelation and outlier problems [26], [27]. The spatial autoregressive quantile (SARQ) method could also be implemented to solve the spatial autoregressive problem [8], [9], [16].

Spatial Autoregressive Model (SAR)
The spatial autoregressive (SAR) model is one of the area-based spatial models. The spatial regression model can describe the relationship between independent and dependent variables by considering the effect of the data location. The presence of location effects on the data is represented by weights. The general model of the spatial autoregressive framework is defined as follows:

$$y = \rho W y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n),$$

where $y$ is a vector of the dependent variable, $\rho$ is the spatial lag coefficient of $y$ and indicates the degree of spatial autocorrelation, $W$ is an $n \times n$ spatial weight matrix specifying the spatial correlations between the $y$ of different areas, with $n$ being the number of observations, $X$ denotes a matrix of independent variables, $\beta$ is a vector of regression coefficients of the explanatory variables, and $\sigma^2$ is the variance of $\varepsilon$, the vector of spatial errors.
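The parameter $\beta$, which is needed to obtain the spatial regression model, is estimated using the following formula:

$$\hat{\beta} = (X^\top X)^{-1} X^\top (I_n - \hat{\rho} W)\, y.$$

The parameter $\sigma^2$ is estimated by

$$\hat{\sigma}^2 = \frac{1}{n}\,(y - \hat{\rho} W y - X\hat{\beta})^\top (y - \hat{\rho} W y - X\hat{\beta}),$$

while the parameter $\rho$ is estimated using a numerical approach.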
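One possible numerical approach for the spatial lag coefficient is a grid search over the concentrated (profile) log-likelihood, with the regression coefficients and the error variance computed in closed form for each candidate value. The sketch below is a minimal illustration of that idea, not the routine used in this study. The function names `fit_sar` and `ring_weights`, the row-standardized ring layout, the grid on (-1, 1), and the simulated data are all assumptions made for the example; a binary contiguity matrix would need a grid adapted to its eigenvalues.

```python
import numpy as np

def fit_sar(y, X, W, grid=None):
    """Concentrated maximum likelihood for y = rho*W*y + X*beta + eps.
    rho is chosen by grid search; beta and sigma^2 follow in closed form."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 199)   # assumed search range
    n = len(y)
    I = np.eye(n)
    Wy = W @ y
    proj = np.linalg.solve(X.T @ X, X.T)       # (X'X)^{-1} X'

    def profile_loglik(rho):
        beta = proj @ (y - rho * Wy)
        e = y - rho * Wy - X @ beta
        sigma2 = e @ e / n
        sign, logdet = np.linalg.slogdet(I - rho * W)
        if sign <= 0:
            return -np.inf
        # log|I - rho*W| - (n/2) log(sigma^2), constants dropped
        return logdet - 0.5 * n * np.log(sigma2)

    rho_hat = max(grid, key=profile_loglik)
    beta_hat = proj @ (y - rho_hat * Wy)
    e = y - rho_hat * Wy - X @ beta_hat
    return rho_hat, beta_hat, e @ e / n

def ring_weights(n):
    """Hypothetical layout: n regions on a ring, row-standardized weights."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

# Simulated illustration (not the malnutrition data).
rng = np.random.default_rng(0)
n = 60
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, rho_true = np.array([1.0, 2.0]), 0.5
eps = rng.normal(scale=0.1, size=n)
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + eps)

rho_hat, beta_hat, sigma2_hat = fit_sar(y, X, W)
print(rho_hat, beta_hat, sigma2_hat)
```

A grid search keeps the sketch short and transparent; a one-dimensional optimizer over the same profile log-likelihood would be a natural refinement.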

Robust Spatial Autoregressive (Robust SAR)
The Robust spatial autoregressive approach uses the same general model of the spatial autoregressive framework as above, which is defined as follows:

$$y = \rho W y + X\beta + \varepsilon.$$

Robust SAR applies the Robust estimator M to estimate the parameter $\beta$ and the spatial coefficient ($\rho$).
The Robust estimator S is used to minimize the corresponding objective function through an iterative procedure. Iteration stops when the value of $|\hat{\beta}_j^{(m+1)} - \hat{\beta}_j^{(m)}|$ is close to zero.
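The stopping rule can be illustrated with iteratively reweighted least squares, a common way to compute a robust M-type estimate. The sketch below is a simplified illustration under assumptions that the text does not confirm: it uses Huber weights with tuning constant 1.345, a MAD-based residual scale, and a spatial coefficient `rho` that is held fixed rather than estimated jointly. The function name `robust_sar_beta` and the toy data are hypothetical.

```python
import numpy as np

def robust_sar_beta(y, X, W, rho, c=1.345, tol=1e-6, max_iter=100):
    """Huber-type IRLS for beta in y - rho*W*y = X*beta + eps (rho fixed).
    Stops when max_j |beta_j^(m+1) - beta_j^(m)| < tol."""
    y_star = y - rho * (W @ y)                         # spatially filtered response
    beta = np.linalg.lstsq(X, y_star, rcond=None)[0]   # OLS starting value
    for it in range(1, max_iter + 1):
        r = y_star - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD scale
        scale = max(scale, 1e-12)
        u = np.maximum(np.abs(r) / scale, 1e-12)
        w = np.minimum(1.0, c / u)                     # Huber weights
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y_star)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, it
        beta = beta_new
    return beta, max_iter

# Toy data: a straight line with one gross outlier and no spatial lag.
n = 30
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + 0.1 * np.sin(np.arange(n))
y[-1] += 50.0
beta_rob, iters = robust_sar_beta(y, X, np.zeros((n, n)), rho=0.0)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_rob, iters, beta_ols)
```

In this toy setting the outlier pulls the least-squares slope far from 2, while the reweighted estimate stays close to it, which is the behavior a robust estimator is meant to deliver.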

Spatial Autoregressive Quantile Model (SARQ)
The quantile regression model was developed to address skewed distributions caused by outliers. It enables us to examine the influence of explanatory variables at any quantile of the malnutrition distribution. The effects of factors on malnutrition not only change with the quantiles but are also affected by potential spatial autocorrelation. The spatial autoregressive quantile regression (SARQ) model can be used to account for spatial correlation among neighbors at both central and non-central locations [28]. The regression model of SARQ is described as:

$$y = \rho(\tau)\, W y + X\beta(\tau) + \varepsilon(\tau),$$

where $\rho(\tau)$ and $\beta(\tau)$ represent the spatial lag coefficient and the coefficients of the explanatory variables at the $\tau$-th quantile, and $W$ is the spatial weight matrix. The parameters of the SARQ model are estimated by the instrumental variable method proposed by Chernozhukov and Hansen [7], [29]. In this procedure, the estimated weighting matrix is taken to be an identity matrix, as in Chernozhukov and Hansen [7], [29]. The third step of the procedure gets the estimator of $\beta(\tau)$.
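Quantile regression rests on the check (pinball) loss, which penalizes positive and negative residuals asymmetrically. The sketch below illustrates only this building block, not the instrumental variable procedure: it shows that the constant minimizing the total check loss is a sample quantile. The function names `check_loss` and `best_constant` and the toy values are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check function: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    u = np.asarray(u, dtype=float)
    return np.where(u < 0, tau - 1.0, tau) * u

def best_constant(y, tau):
    """Observation that minimizes the total check loss over the sample;
    it is a tau-th sample quantile."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - q, tau).sum() for q in y]
    return y[int(np.argmin(losses))]

# Toy sample with one extreme value.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(sample, 0.5))   # the median, unaffected by the extreme value
print(best_constant(sample, 0.9))   # an upper quantile
```

Replacing the constant by a linear predictor and minimizing the same loss gives quantile regression, which is why the fitted effects can differ between the center and the tails of the distribution.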

Model Comparison
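Three common measures are used in this study [8] to compare the models: the mean absolute error (MAE), the root mean squared error (RMSE), and R-squared. The lower the values of MAE and RMSE are, the higher the accuracy of the corresponding model. In contrast, the higher the value of R-squared, the better the corresponding model is.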
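These measures can be computed as in the following sketch. It uses the usual textbook definitions, which are assumed here and may differ in detail from those used in this study (for example, a quantile-specific R-squared); the function name `goodness_of_fit` and the toy values are hypothetical.

```python
import numpy as np

def goodness_of_fit(y, y_hat):
    """Return (MAE, RMSE, R-squared) for observed y and fitted y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    e = y - y_hat
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    r2 = 1.0 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)
    return mae, rmse, r2

# Toy values: three exact fits and one miss of size 2.
mae, rmse, r2 = goodness_of_fit([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(mae, rmse, r2)
```

Because RMSE squares the errors before averaging, a single large miss raises it more than it raises MAE, so reporting both gives a fuller picture when outliers are present.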

MAIN RESULTS
The preliminary diagnostic tests indicate a positive spatial correlation between malnutrition rates in the provinces of Indonesia. In this section, we apply the above three models to analyze the spatial autocorrelation and the influencing factors of malnutrition in Indonesia in 2021. The data cover the 34 provinces of Indonesia; the four island provinces without adjacent provinces are excluded as described above, leaving 30 provinces in the analysis.
The spatial autoregressive (SAR) approach is used first. Table 4 presents the estimated parameters of the SAR model. From Table 4, it can be seen that the coefficient of spatial correlation ($\rho$) of malnutrition in this regression model is 0.324. This value is significant since the p-value is 0.031, less than 0.05. We then check for the existence of residual outliers using Moran's scatterplot, as presented in Figure 2.
The plot shows three outliers in the SAR residuals, at observation numbers 2, 7, and 19. This result indicates that the error distribution is not homogeneous, that is, the assumption of homogeneity of variance is violated. Based on the tests above, the performance of the SAR model is not good and the model could not be accepted. This study then applies the Robust SAR model to construct an acceptable model of malnutrition. The iteration process of the parameter estimation is presented in Table 5. Seven iterations are carried out, since $|\hat{\beta}_j^{(7)} - \hat{\beta}_j^{(6)}|$ is close to zero for $j = 1, 2, 3$.
The parameter estimates obtained from the 7th iteration of the Robust SAR model are provided in Table 6 (see also Figure 3). The SARQ estimates at the selected quantiles are provided in Table 7.
From Table 7, it can be seen that the spatial autoregressive coefficient is significantly positive at the lower quantiles ($\tau$ = 0.10, 0.15) and at the upper quantiles ($\tau \geq 0.70$), except at $\tau$ = 0.95, at the 5% significance level. This shows that, where malnutrition cases are low, a region is positively influenced by its neighboring provinces. Provinces with higher cases of malnutrition tend to have a positive influence on the surrounding provinces, which contributes to a high-high clustering distribution pattern. In the SAR and Robust SAR models, the spatial effect coefficient is higher than in the SARQ model at all quantiles. The estimates of the spatial effect in SAR and Robust SAR are coarser and somewhat unreasonable, since both methods estimate only the conditional mean of the response variable across values of the predictor variables, while SARQ can estimate any conditional quantile of the response variable. Table 7 also provides the AIC values for each quantile. The lower the value of AIC, the better the performance of the corresponding model. Table 8 presents the goodness-of-fit values for SAR, Robust SAR, and SARQ at the selected quantile.
From both tables, it can be seen that the SARQ models perform best at the high tail ($\tau \geq 0.75$), at the non-central location. In these quantile intervals, the SARQ model also has a greater value of R-squared than the others. Moran's scatter plot at the 90th quantile, as presented in Figure 4, identifies no outliers in the residuals anymore. This result reveals the necessity of using the quantile approach in spatial autoregressive models to handle the outliers and skewed distribution of the malnutrition data. These results indicate that the SARQ model outperforms the SAR and Robust SAR models in terms of fitting performance and prediction. Thus, the proposed model at the 90th quantile could be accepted, and we conclude that it is the best model for the malnutrition data. The model of malnutrition in Indonesia is proposed as follows:

$$\hat{y} = 0.306 + 0.109\,x_1 - 0.005\,x_2 + 0.306\,x_3.$$

CONCLUSIONS
By comparing the ordinary Spatial Autoregressive (SAR) model, the Robust SAR model, and the Spatial Autoregressive Quantile (SARQ) model, this study shows that SARQ is more resistant to outliers and performs better than the others. SAR and Robust SAR only produce the model at the average response value, and neither method yielded an acceptable model in this data analysis. Meanwhile, SARQ could estimate the model not only at the average of the response but also at different quantiles of the response. This study found that the proposed model at selected quantiles could be accepted based on the values of MAE, MSE, and R-squared as goodness-of-fit criteria in the data analysis. Using SARQ, we could observe how the three measures vary with the quantiles; specifically, the MAE and MSE are smaller at higher quantiles, while R-squared is greater. These results show that as the quantile moves from the low tail to the high tail, the model performance improves, reaches an optimal point at quantile 0.90, and then decreases over the rest of the distribution.
Through the empirical analysis of the malnutrition rate in 30 provinces in Indonesia (4 provinces are excluded from the analysis) at different quantiles, we conclude that only the percentage of health insurance ($x_1$) was found to be a significant indicator variable in the malnutrition data, as shown in Table 7.