Spatial Error Model to Analyze Morbidity Rate in Indonesia

Morbidity is a diseased state or symptom in which a person is unable to carry out daily activities. Morbidity rates can differ by person and place. This paper investigates the factors affecting morbidity rates in each province of Indonesia. Using a spatial error model (SEM), this study analyzes the spatial effect on the factors being investigated and demonstrates that SEM fits the Indonesian morbidity rate data well. The results show that in 2020, provinces on Java Island and some provinces on Sulawesi Island had health problems that need to be addressed because they fall in the high and moderately high morbidity categories. The value of the Moran index is positive, indicating that adjacent provinces have similar morbidity rates. Morbidity rates in most provinces in Indonesia are affected by the average duration of school attendance, health insurance, and population density. With a coefficient of determination (R²) of 77.60%, the SEM can be considered a fairly good model.


Introduction
The level of public health can be described by morbidity, mortality, and nutritional status (Ministry of Health RI, 2010). According to Tulchinsky and Varavikova (2014), morbidity (a measure of disease occurrence) can be measured based on the number of people with disease, period and duration of illness, frequency of deaths, disease, disability, and risk factors related to health outcomes. On the other hand, mortality or death rate is the measure of death frequency over a given period (Nolte & McKee, 2012), while nutritional status refers to the physiological state of the population in reference to nutrient intake (Food Agric Organ, 2007). One of many indicators used to assess successful development in public health is the level of morbidity in communities. The lower the morbidity, the healthier the people are.
Morbidity rate is the incidence of a certain disease, formulated as the number of people suffering from the disease per 1000 people exposed to it (Kardjati et al., 1985; Singh-Manoux et al., 2008). The morbidity rate is more important to tackle than the mortality rate, because a high morbidity rate can also trigger a high mortality rate.
Morbidity rate can be used to measure health conditions in general, to assess the success of disease eradication and environmental sanitation programs, and to understand people's knowledge of health services.
Based on research conducted by health experts (Okoroiwu, 2020; Moise, 2018; Wunsch & Gourbin, 2018; Zylke & Bauchner, 2020; Amini et al., 2021; Chang et al., 2022), morbidity is caused by neonatal respiratory distress syndrome, tuberculosis and diarrhea. In most cases, asthma, tuberculosis and diarrhea have a negative impact on the patient's life, causing children to miss school often, limiting personal and family activities, and decreasing work productivity. The higher the morbidity rate, the worse the level of public health. The morbidity rate can reflect the real state of health because it has a close relationship with environmental factors such as poverty, malnutrition, infectious diseases, housing, proper drinking water, environmental hygiene and health services (Kardjati et al., 1985; Krieger & Higgins, 2002; Boyles et al., 2021; Jian et al., 2017; Kumari et al., 2023). Based on data from Badan Pusat Statistik (BPS), Indonesia's morbidity rate was 14.31% in 2017, decreased to 13.19% in 2018, and increased again to 15.38% in 2019. According to Soleman (2020), every five children in preterm birth in the neonatal phase carried co-morbid factors that increase the morbidity rate. The study of Hussain et al. (2015) found that one-third of the Indonesian adult population has multimorbidity. Aside from women being particularly affected, the study showed a high prevalence of multimorbidity among younger individuals. Based on these data, the government must take steps so that the morbidity rate does not increase in the following years.
Morbidity rates differ by geographical area and country, depending on the quality of life of the people within the area. The factors affecting morbidity rates also differ by geographical area. Accordingly, the method used to analyze such a situation should incorporate a spatial term in its model. One model that can handle this condition is the spatial regression model.
The spatial effect can show clustering among adjacent areas. With this method, the spatial effects can be identified and applied to analyze the morbidity rate in Indonesia using a dataset obtained in 2020.

Spatial Regression Models
According to Rey et al. (2020), regression and prediction examine how spatial structure can be used to analyze data. Through spatial structure, regression models generate explicitly spatial data. If the model systematically mispredicts, a better model can be developed. For example, mapping classification or prediction errors can help reveal clusters of errors in the data. Hence, "regardless of whether or not the true process is explicitly geographic, additional information about the spatial relationships between observations can make predictions better" (Rey et al., 2020).
Spatial Dependence Test. According to Anselin (1988), the tests used to investigate spatial dependence in the error term of a model are Moran's I and the Lagrange Multiplier (LM). Moran's index is a kind of correlation that measures the relationship among adjacent observations. Under the null hypothesis, the observations are assumed to have no spatial correlation (H0: I = 0). The test statistic used is as follows.
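$$Z(I) = \frac{I - E(I)}{\sqrt{\operatorname{var}(I)}}$$

Accordingly, Moran's index (Moran's I) can be calculated as:

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left( \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \right) \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Where:
n : number of observations or locations
x_i : value of the variable at location i
\bar{x} : mean of the variable over all locations
w_{ij} : element of the spatial weight matrix W for locations i and j

The formula for the Breusch-Pagan test is:

$$BP = \frac{1}{2} f^{T} Z (Z^{T} Z)^{-1} Z^{T} f, \qquad f_i = \frac{e_i^2}{\sigma^2} - 1$$

Where e_i is the error of the i-th observation and Z is a matrix of independent variables of size n x (k+1).

Lagrange Multiplier (LM) Test. The LM test is applied to determine the existence of a spatial effect in the model. The procedures of the LM test are as follows.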

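a) Lagrange Multiplier Lag
The hypotheses are:
H0 : ρ = 0 (no spatial lag dependence)
H1 : ρ ≠ 0 (spatial lag dependence exists)

b) Lagrange Multiplier Error
The hypotheses are:
H0 : λ = 0 (no spatial error dependence)
H1 : λ ≠ 0 (spatial error dependence exists)
H0 is rejected if the LM statistic exceeds the chi-square critical value χ²(α,1) or p-value < α.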
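To make the Moran's I statistic and its standardized Z value from the spatial dependence test more concrete, the following is a minimal Python sketch. It assumes NumPy is available and uses the variance of Moran's I under the normality assumption. The function name `morans_i` and the four-region chain used as input are hypothetical illustrations and are not taken from the study data.

```python
import numpy as np


def morans_i(x, w):
    """Return Moran's I, its expectation, variance and Z score.

    x : values observed at n locations
    w : n x n spatial weight matrix (zero diagonal)
    The variance uses the normality assumption.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    z = x - x.mean()                      # deviations from the mean
    s0 = w.sum()                          # sum of all weights
    i_stat = (n / s0) * (z @ w @ z) / (z @ z)
    expected = -1.0 / (n - 1)             # E(I) under no autocorrelation
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=1) + w.sum(axis=0)) ** 2).sum()
    variance = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - expected**2
    z_score = (i_stat - expected) / np.sqrt(variance)
    return i_stat, expected, variance, z_score


# Hypothetical example: four regions in a row, neighbours share a border.
values = [1.0, 2.0, 3.0, 4.0]
weights = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
i_stat, expected, variance, z_score = morans_i(values, weights)
print(i_stat, expected, variance, z_score)
```

A positive I that is far above E(I) relative to its standard deviation points to clustering of similar values in adjacent areas, which is the pattern the test is designed to detect.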

Spatial Regression
Spatial regression is a regression that incorporates spatial effects, which are represented by a spatial weight matrix based on areal adjacency.

Spatial Model.
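The model for spatial regression according to Anselin is as follows:

$$y = \rho W y + X \beta + u \tag{6}$$

$$u = \lambda W u + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I) \tag{7}$$

Substituting equation (7) into equation (6) gives:

$$y = \rho W y + X \beta + (I - \lambda W)^{-1} \varepsilon$$

When ρ = 0, the general model reduces to the Spatial Error Model (SEM):

$$y = X \beta + u, \qquad u = \lambda W u + \varepsilon$$

Where λWu represents the spatial structure in the spatially dependent error (ε).
Parameter estimates are obtained with the maximum likelihood method, that is, by maximizing the log-likelihood function. The SEM estimator of β is given by the following equation:

$$\hat{\beta} = \left[ (X - \lambda W X)^{T} (X - \lambda W X) \right]^{-1} (X - \lambda W X)^{T} (y - \lambda W y)$$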
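As an illustration of maximum likelihood estimation for a spatial error model, the sketch below maximizes the concentrated log-likelihood over a grid of candidate values for the spatial error coefficient. It is a simplified sketch under stated assumptions, not the estimation routine used in this study: it assumes NumPy, a row-standardized weight matrix, and a simple grid search. The names `fit_sem` and `demo`, the ring-shaped neighbourhood, and the simulated data are all hypothetical.

```python
import numpy as np


def fit_sem(y, X, W, lam_grid=None):
    """Grid-search ML fit of y = X beta + u, u = lam W u + eps."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    n = y.size
    if lam_grid is None:
        lam_grid = np.linspace(-0.95, 0.95, 191)
    identity = np.eye(n)
    best = None
    for lam in lam_grid:
        A = identity - lam * W
        sign, logdet = np.linalg.slogdet(A)   # log|I - lam W|
        if sign <= 0:
            continue
        y_f, X_f = A @ y, A @ X               # spatially filtered data
        beta, *_ = np.linalg.lstsq(X_f, y_f, rcond=None)
        resid = y_f - X_f @ beta
        sigma2 = float(resid @ resid) / n
        # Concentrated log-likelihood at this value of lam
        loglik = logdet - 0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
        if best is None or loglik > best["loglik"]:
            best = {"lam": float(lam), "beta": beta,
                    "sigma2": sigma2, "loglik": float(loglik)}
    return best


def demo(n=200, lam_true=0.6, seed=0):
    """Simulate hypothetical data on a ring of n areas and fit the SEM."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for i in range(n):                        # two neighbours, row-standardized
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    eps = rng.normal(size=n)
    u = np.linalg.solve(np.eye(n) - lam_true * W, eps)
    y = X @ np.array([2.0, -1.0]) + u
    return fit_sem(y, X, W)


fit = demo()
print(fit["lam"], fit["beta"])
```

For each candidate coefficient the data are filtered by (I - λW), β is obtained by least squares on the filtered data, and the candidate with the highest log-likelihood is kept. Dedicated spatial econometrics software typically replaces the grid with a numerical optimizer and also reports standard errors.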

Parameter Significance of Spatial Regression
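Model Fitness Test. To test the fitness of the SAR model, the following procedure is employed. The hypotheses are:
H0 : β1 = β2 = ... = βk = 0
H1 : at least one βj ≠ 0, j = 1, 2, ..., k
The test statistic used is:

$$F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}$$

Where:
k : number of predictor variables
n : number of observations
R² : coefficient of determination
Decision criterion: reject H0 if F_calculated > F(α, k, n-k-1).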
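The model fitness statistic can be computed directly from the coefficient of determination. The short Python sketch below does this for R² = 0.776 and n = 34 provinces, as reported in this paper. The number of predictors, k = 5, is an assumption made only for illustration, and the function name `f_statistic` is hypothetical.

```python
def f_statistic(r_squared, k, n):
    """F statistic for overall model fit computed from R^2.

    k : number of predictor variables
    n : number of observations
    """
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))


# R^2 and n are taken from the paper; k = 5 is an assumed value.
f_value = f_statistic(0.776, k=5, n=34)
print(round(f_value, 2))
```

The resulting value is then compared with the tabulated F(α, k, n-k-1) to decide whether H0 is rejected.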
Partial Test. The hypotheses for the partial test are as follows (Anselin, 2003):
H0 : θ = 0
H1 : θ ≠ 0
The test statistic for parameter significance is:

$$Z = \frac{\hat{\theta}}{s_b(\hat{\theta})}$$

Where s_b(\hat{\theta}) is the asymptotic standard error and θ is a parameter of the spatial regression (namely β, ρ, and λ). If |Z_calculated| > Zα/2 or p-value < α (α = 0.05), then the decision is to reject H0, which means the regression coefficient can be used in the model.
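The partial test can be sketched in a few lines of Python using only the standard library. The two-sided p-value is obtained from the standard normal distribution through the complementary error function. The function name `wald_z_test` is hypothetical, and the standard error of 0.5 in the example is an assumed value for illustration only, since the estimated standard errors are not reproduced in this section.

```python
import math


def wald_z_test(estimate, std_error, alpha=0.05):
    """Two-sided Z test for H0: parameter = 0.

    Returns the Z statistic, the p-value and the reject decision.
    """
    z = estimate / std_error
    # P(|N(0,1)| > |z|) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, p_value < alpha


# Coefficient from the paper, with an assumed (hypothetical) standard error.
z, p_value, reject = wald_z_test(-1.8299, 0.5)
print(z, p_value, reject)
```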

Selecting the Best Model
The model selection criterion used in this paper is the Akaike Information Criterion (AIC). The model with the lowest AIC score is selected. As presented by Hu (2007), the formula to calculate the AIC score is as follows:

$$AIC = -2 \ln L(\hat{\theta} \mid y) + 2k$$

Where:
L(\hat{\theta} \mid y) : likelihood function of the parameters being estimated
k : number of estimated parameters
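A small Python sketch of AIC-based model selection follows. The log-likelihood values and parameter counts in the example are hypothetical placeholders, not results from this study, and the function names `aic` and `select_best` are illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def select_best(models):
    """Return the name of the model with the lowest AIC.

    models maps a model name to (log_likelihood, n_params).
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical log-likelihoods and parameter counts for two candidates.
candidates = {"OLS": (-95.0, 6), "SEM": (-90.0, 7)}
for name, (loglik, k) in candidates.items():
    print(name, aic(loglik, k))
print("selected:", select_best(candidates))
```

The extra parameter in the second candidate is penalized by 2, so it is preferred only if its log-likelihood improves by more than 1.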

Morbidity Rate
A person is said to be sick (in a state of morbidity) if the health complaints felt disrupt daily activities, namely being unable to work, take care of the household, or carry out normal activities as usual. According to Hernandez and Kim (2022), morbidity is one of the two commonly used measures of epidemiological surveillance and describes the progression of a given health occurrence. Morbidity is usually represented or estimated using prevalence (proportion of population) or incidence (frequency within a population). It can be presented as a ratio or percentage (Mont, 2014).
The following is the morbidity rate formula.

$$\text{Morbidity rate} = \frac{\text{number of people who experience health complaints and disruption of activities}}{JP} \times 100\%$$

where JP is the total population.
Hernandez and Kim (2022) define incidence as the frequency at which individuals within a specific population develop a given symptom or quality. It accounts for the number of people with newly developed disease (Choi et al., 2019). Meanwhile, the attack rate is the proportion of incident cases of a disease among the population exposed to the source of the disease.
There are three dimensions that serve as indicators of morbidity, namely the long and healthy life dimension, the knowledge dimension, and the decent life dimension. The long and healthy life dimension is measured by life expectancy and the percentage of the population seeking medical treatment at a health worker's practice. The knowledge dimension is measured by the illiteracy rate and the average years of schooling of the population aged 15 years and above. The decent life dimension is measured by the percentage of the population with access to decent drinking water and the percentage of the poor population.
The Pan American Health Organization (n.d.) enumerates the factors affecting the accuracy of the measurement of morbid events such as data quality, validity of measurement instruments, disease severity, cultural norms, confidentiality and health information system.
It is particularly important that the analysis takes into consideration the diversity and volume of data. Since data differ by area within a country, it is imperative that the model used accounts for the disaggregation of data. Similarly, the probability of errors in the data analysis must be handled with the utmost care.

Social Factors
Social factors are factors that are born, grow, and develop in a common life (Salim, 2002).According to Anderson (1995), social factors include education and ethnicity.
Increasing education is an effort to prevent morbidity. In research conducted by Ardhiyanti (2013), the level of education in an area is reflected in the illiteracy rate because it shows the inability of the population in an area to absorb information from various media and to communicate orally and in writing. In addition to the illiteracy rate, in 2014 BPS used the average years of schooling as an indicator of education.

Environmental and Behavioral Factors
Morbidity reflects the actual state of health because it has a close relationship with environmental factors such as malnutrition, infectious diseases, housing, healthy drinking water, environmental hygiene, and health services (Wulandari, 2017). According to the Pan American Health Organization (n.d.), some environmental factors are difficult to measure, such as exposure to air pollution, exposure to sunlight, and exposure to some diseases.
The research was conducted based on the following steps.
Previous studies that have modeled morbidity include:
1. Factors Affecting Morbidity of East Java Residents with Multivariate Geographically Weighted Regression (MGWR) (Hanum, 2013).
2. Modeling Factors Affecting Morbidity in Central Java Using Spline Truncated Nonparametric Regression (Rosanti, 2020).

Description of Morbidity Rate
To simplify interpretation of the spread of morbidity across the provinces of Indonesia, the rates are mapped in Figure 1, where a brighter color represents a high morbidity rate and a lighter color represents a low one.

Map of Morbidity Spread in Indonesia 2020
Based on Figure 1, there are 8 provinces with very high morbidity rates, all of which are located in Java. Another 8 provinces have high morbidity rates. This means that provinces on Java Island and some provinces on Sulawesi Island have health problems that need to be addressed because they are in the high and moderately high morbidity categories. There are 9 provinces with moderate morbidity rates, distributed over the Sumatera, Kalimantan, and Sulawesi islands, and 8 other provinces with low morbidity rates.

Spatial Effect Tests
In the SEM model, there are two spatial effect tests to be satisfied, namely the spatial dependence and spatial heterogeneity tests. The results of the tests are presented in Tables 1 and 2.

Spatial Error Model (SEM)
In model detection, it was concluded that the appropriate model is SEM, so SEM is used. The following is the SEM output based on parameter estimation at the 5% level of significance. Based on the model with its significant variables, the interpretations are as follows:
1. The coefficient for average duration of school attendance is -1.8299, which means that if the average duration of school attendance increases by one year and the other variables remain constant, the morbidity rate in Indonesia decreases by about 1.83 percentage points.
2. The coefficient for health insurance is -0.0754, which means that if this variable increases by ten percent, the morbidity rate tends to decrease by about 0.75 percentage points.
3. The coefficient for population density is 0.0004, which means that if population density increases by 100 units, the morbidity rate increases by 0.04 percentage points.
The coefficient of determination (R²) of the SEM model is 77.60%, which indicates that the model is good enough to explain the morbidity rate in Indonesia.
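The interpretations above follow from multiplying each coefficient by the assumed change in its variable while holding the other variables constant. The short Python sketch below reproduces that arithmetic. The coefficients are those reported in this paper, while the dictionary keys and the function name `predicted_change` are hypothetical labels introduced for illustration.

```python
# Coefficients reported for the SEM model in this paper.
coefficients = {
    "school_years": -1.8299,      # average duration of school attendance
    "health_insurance": -0.0754,  # health insurance
    "population_density": 0.0004, # population density
}


def predicted_change(variable, delta):
    """Change in the morbidity rate (percentage points) when one
    variable changes by delta units and the others stay constant."""
    return coefficients[variable] * delta


print(predicted_change("school_years", 1))          # one more year of schooling
print(predicted_change("health_insurance", 10))     # ten units more coverage
print(predicted_change("population_density", 100))  # 100 units denser
```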

Conclusion
Based on the Moran's index test, it can be concluded that there is spatial autocorrelation in morbidity rates, which means that similarities among adjacent provinces exist. The variables that affect morbidity rates in Indonesia are the average duration of school attendance (X1), health insurance (X4) and population density (X5), and the SEM model can be written as:

$$\hat{y}_i = 32.8322 - 1.8299 X_{1i} - 0.0754 X_{4i} + 0.0004 X_{5i} + u_i$$

where

$$u_i = 0.6601 \sum_{j=1, j \neq i}^{34} w_{ij} u_j + \varepsilon_i$$

From the model obtained, it can be seen that these factors significantly affect morbidity cases in Indonesia. The results imply the need for the government to formulate further measures to reduce the percentage of morbidity, focusing on the factors that have the greatest influence, namely the average length of schooling and health insurance. The results indicate that strengthening the education of the population is associated with a substantially lower morbidity rate.
Taken in this context, the challenge for the government is to further improve the quality of education and health insurance in the country. The model suggests that a population with a healthy mind and a healthy body reduces the morbidity rate. Given the limits of the study, further research is recommended that considers the number and types of predictor variables in order to obtain a better model.

Acknowledgement
The authors would like to thank the Ministry of Research and Technology and Halu Oleo University for funding this research project in 2021.

Reject H0 if the calculated Z value is greater than the tabulated Zα/2.
Spatial Heterogeneity Test. The effect of spatial heterogeneity can be tested by applying the Breusch-Pagan test (BP test). The hypotheses are:
H0 : No heterogeneity among areas
H1 : Heterogeneity exists among areas

number of people who experience health complaints and disruption of activities
JP : total population

According to Choi et al. (2019), morbidity indicators include prevalence, incidence and attack rate. Prevalence measures the proportion of the population with a specific illness in a specific time period and can be point or period prevalence. Point prevalence measures the number of cases in the population at a point in time, while period prevalence estimates the proportion of individuals with a certain condition at any time during a specified time period.


- Summarizing descriptive statistics of variables
- Analyzing the SEM model as follows:
  o Performing spatial effect tests and model identification
  o Conducting regression assumption tests for the SEM model
  o Interpreting the model and drawing conclusions

There are several studies that model morbidity cases, as follows:

X : matrix of predictor variables (n x (k+1))
β : coefficient vector of regression parameters
ρ : spatial lag coefficient parameter of the dependent variable
λ : spatial lag coefficient parameter on errors
u : vector of error terms on Y (n x 1)
n : number of observations or locations
I : identity matrix (n x n)
ε : vector of error terms in the equation for u, of size n x 1, which has a normal distribution with mean zero and variance σ²I

Spatial regression with the area approach consists of several models, one of which is the Spatial Error Model (SEM). The general model for SEM can be seen as follows:

Table 1
The Result of Moran Index Test

Based on Table 1, the p-value is 0.017459 < α, which means H0 is rejected and there is spatial autocorrelation among adjacent provinces. The Moran's index is 0.357063, which shows positive autocorrelation. This means that morbidity rates are similar among adjacent locations and tend to cluster.

Table 4
Parameter estimation and testing of SEM model