BAYESIAN CONDITIONAL NEGATIVE BINOMIAL AUTOREGRESSIVE MODEL: A CASE STUDY OF STUNTING ON JAVA ISLAND IN 2021

,


INTRODUCTION
Poisson regression analysis is used to examine the relationship between a dependent variable representing count data that follows a Poisson distribution.In Poisson regression analysis, there is an assumption that must be met, which is equidispersion.In real-world cases, count data often exhibit variability that is greater than its mean (overdispersion) or variability that is less than its mean (underdispersion) [1].To handle overdispersion in Poisson regression, a negative binomial regression is employed, which can be analyzed using the Generalized Linear Model (GLM) method.Negative binomial regression assumes that observations are independent.In reality, geographic, sociocultural, and economic conditions will vary between different regions.This implies the presence of spatial effects among regions.A common approach to addressing this is to use Generalized Linear Mixed Models (GLMM), which incorporate structured random spatial effects and are often modeled using conditional autoregressive methods.
The conditional autoregressive (CAR) model is a spatially conditional model where observations at one location are influenced by other locations [2].CAR operates on spatial autocorrelation by adding spatially correlated random components.Spatial correlation in the CAR model is generated through neighborhood weighting [3].The CAR-Besag and Bessag York Mollie (CAR-BYM) model is used for negative binomial observations [4].Bayesian spatial models, such as the Besag, York, and Mollié Poisson models (BYM), have been implemented for mapping and identifying areas with a high risk of tuberculosis (TB) cases in Portugal [5].In the study by Djuraidah et.al.[6], the Bessag York Mollie (BYM) model was found to be the best in predicting TB on the island of Java.BAYESIAN CONDITIONAL NEGATIVE BINOMIAL AUTOREGRESSIVE MODEL In CAR model estimation, the Bayesian method is commonly employed.A method that can provide a better and more efficient posterior distribution using Bayesian is the Integrated Nested Laplace Approximation (INLA) introduced by Rue et.al.[7].INLA focuses on marginal inference using Laplace and numerical integration approaches.Bivand et.al.[8] employed Bayesian inference with the INLA approach, which offers faster computation compared to Markov Chain Monte Carlo (MCMC) and can accommodate large datasets.
The negative binomial conditional autoregressive (CAR) model with the integrated nested laplace approximation (INLA) approach was applied to analyze the prevalence of stunting cases, involving count data, and is thus considered discrete.Stunting is a significant issue faced by Indonesia and has serious implications for human resource quality (HRQ).According to the 2021 Indonesia Nutritional Status Survey (SSGBI), the prevalence of stunting in Indonesia is 24.4%.
This means that approximately one out of four children under five years old in Indonesia is experiencing stunting.This figure is high compared to the threshold set by the World Health Organization (WHO), which is 20%.The Sustainable Development Goals (SDGs) include stunting as one of its worldwide objectives.With the right policies, the government aims to reduce stunting in children by 40% by 2025 and eradicate all forms of malnutrition by 2030.To achieve this, the prevalence of stunting in Indonesian children has to decrease from 37.2% in 2013 to 14.9% by 2025, which requires an average annual reduction of 2.7%.Therefore, mapping stunting cases and the underlying risk factors is crucial in Indonesia, especially on the island of Java, where approximately 56.10% of Indonesia's population lives.Apart from that, infrastructure in Java, particularly in terms of clean water and sanitation, is better than any other island in Indonesia.This has been reflected in the relatively low prevalence of stunting among children in Java's provinces [9].However, the local government still has to take stunting into account.While its prevalence may not be high, due to its large population, there will probably be a sizable number of cases of stunting to deal with.
Therefore, this research aims to analyze cases of stunting and determine the influencing factors.Moreover, it is aimed at mapping stunting cases on the island of Java using the best-fitting models from the Generalised Linear Model (GLM), Generalised Linear Mixed Models (GLMM), Intrinsic Conditional Autoregressive (ICAR), and Conditional Autoregressive Bessag York Mollie (CAR BYM) with a negative binomial distribution.

Poisson Regression
Poisson regression is one of the generalized linear models (GLMs) used to simulate the relationship between a response variable following a Poisson distribution and predictor variables.
The Poisson distribution is employed to handle data that represents the count of events in a random process with equal mean and variance [10].The general equation for GLM with a Poisson distribution is as follows: In the equation above, where   is the i-th response variable following a Poisson distribution,   is the mean of the Poisson distribution,    is the matrix of predictors,  is the parameter vector to be estimated, and  (. ) to bridge the Poisson distribution.When the variance of the response variable (  ) is greater than the mean (  ), overdispersion occurs in GLM with a Poisson distribution [10].

Overdispersion Test
Overdispersion can be detected partly by using the Deviance (D) value.The deviance is a likelihood ratio test that compares the current model to its saturated model, divided by its degrees of freedom, and can be expressed as follows [11] : is the value of the response variable, and î is the estimated result from the Poisson BAYESIAN CONDITIONAL NEGATIVE BINOMIAL AUTOREGRESSIVE MODEL distribution.The value of  is calculated as  minus , where  is the number of observations, and  is the number of parameters.If the dispersion test statistic is greater than one, then overdispersion is present in the response variable [10].To address the issue of overdispersion, several techniques can be used, such as employing more flexible distributions like the negative binomial distribution.

Negative Binomial Regression
GLM (Generalized Linear Model) with a negative binomial distribution is a statistical method used to mimic the relationship between a response variable following a negative binomial distribution and predictor variables.The general equation for GLM with a negative binomial distribution is as follows [12] : with log(.)functions as the link of the negative binomial distribution.

Negative Binomial Conditional Autoregressive Model
Generalized Linear Mixed Models (GLMM) are an extension of GLM (Generalized Linear Models) with the addition of random components in the predictor part of the model.The general equation for GLMM can be formulated by modifying the GLM model and adding random effects.
represents independent random effects (unstructured random effect).Another development in GLMM is the consideration of structured random spatial effects in the predictors through the Conditional Autoregressive (CAR) prior, hence often referred to as the Conditional Autoregressive (CAR) model.
If the response variable  = ( 1 , … ,   ) is a univariate vector representing the count of cases at location  originating from a negative binomial distribution, the predictor vector at location  is    = (1,  1 ,  2 , … ,   ) with  = 1,2,3, … ,  and there are  independent variables.The structured spatial random effect components are represented by  = ( 1 ,  2 , … ,   ).The CAR model can be formulated as follows: (  ) =     +   (6) The random effect   is modeled using the class of conditional autoregressive (CAR) prior distributions.Various CAR prior distributions have been introduced by various experts, especially in the context of disease mapping.These include the Intrinsic CAR (ICAR), Besag York Mollié (BYM), Stern-Cressie, and Leroux priors, among others.However, this study will only discuss the ICAR and BYM priors.

ICAR
Currently, the simplest CAR prior is the Intrinsic Autoregressive (IAR) prior, which was proposed by Besag et.al.[13] and has a full conditional distribution, namely: The conditional expectation of   is equal to the average of the random effects in neighboring areas, while the conditional variance is inversely proportional to the number of neighbors.Thus, the variance structure of   will have a strong spatial correlation when an area has more neighbors.The variance parameter  2 is used to control the magnitude of variation among random effects, typically following an inverse gamma distribution process.

BYM
The Besag-York-Mollie (BYM) model was first described by Besag et.al.[13] and incorporates random effects from two components, namely, combining structured random spatial effects with independent random effects.The model takes the following form: The random effects  = ( 1 ,  2 , . . .,   ) are independent with a mean of zero and constant variance, while the spatial autocorrelation random effects are modeled through .This is the ubiquitous CAR model in practice.

Integrated Nested Laplace Approximation (INLA)
The Integrated Nested Laplace Approximation (INLA) method developed by Rue et.al.[7], is more effective than MCMC simulations since it uses direct numerical computations on the latent Gaussian Markov random fields (GMRF) density [14].
For each component of the parameter vector, the INLA technique aims to estimate the marginal posterior distribution.The marginal posterior distribution for the parameter, according to Blangiardo and Cameletti [15], is as follows: Then, for the hyperparameter , the marginal posterior distribution is collected as follows: Once the marginal posterior distributions are obtained, the calculation process for (|) and (  |, ) through the Laplace approximation, as described by Blangiardo and Cameletti [15], is as follows: where  ̃(|) is an approximation of (|).

Data
The 2021 Health Profile of all the provinces on the island of Java, specifically Banten, DKI Jakarta, West Java, Yogyakarta Special Region, Central Java, and East Java, provided the data for this study.The 119 districts and cities that make up the entire island of Java served as the study's observational units.

Analysis Method
The method used to analyze stunting data is the Negative Binomial Conditional Autoregressive (CAR) method with the INLA approach.Data processing was performed using R Studio software.The data analysis steps conducted as follows: 1. Conducting data exploration for the research a.For queen contiguity, the weighting is done with the following formula: For exponential weight, the weighting is done using the following formula:   =  (−  ) , where   is the distance between area  and area .
c.For inverse distance weight, the weighting is done as :   =   −1 , where   is the distance between area  and area .
d.For KNN weighting : i. Calculate the distance from the center of unit  to all other units  ≠ .
iv.For each , the weight matrix  had elements   equal to 1 if area  was adjacent to area , while the main diagonal elements would always be zero.

Conducting spatial autocorrelation tests by calculating the Moran's index value on the response variable [20], with the hypothesis below :
0 ∶ There is no spatial autocorrelation in the data where: The testing criterion:  0 is rejected if the || value is greater than  /2 6. Conducting estimation of the Negative Binomial CAR model using the INLA approach, 6.1 Creating four models to be examined [21]

𝑁
where   is the response data at location ,  ̂ is the prediction result at location , and  represents the number of observation locations.The smaller the MAD value, the smaller the prediction error will be.
8. Utilizing the goodness-of-fit criteria to determine the optimal model's relative risk.The relative risk was determined by exponentiating the structured random effect component   from the best model using the following formula [15] : Interpreting the research results and drawing conclusions

Data Exploration
The distribution of stunting cases on Java Island varied significantly, with a range of 1-368 cases per 100.000population and an average of 56 cases per 100.000population.The areas with the lowest stunting cases were in Blitar and Mojokerto in East Java Province, with 4 and 1 case per 100.000population, respectively.Meanwhile, Bogor, Garut, and Bandung in West Java Province had the highest stunting cases, with 368, 292, and 227 cases per 100.000population, respectively.stunting cases were located in East Java Province, while the areas with the highest stunting cases were mostly in West Java Province.The spatial reliance in the statistics on cases of stunting on Java Island is indicated by this color grouping.
Reducing the stunting rate can involve identifying factors that influence it.One of these factors is nutritional intake.To measure the intake, data on the percentage of infants with exclusive breastfeeding and the percentage of infants with complete basic immunization are used.Poor maternal factors and inadequate childcare practices, especially in terms of proper sanitation within the family, can also contribute to child stunting.To assess good childcare practices, data on the percentage of families with access to proper sanitation is implemented.Stunting is not only influenced by factors directly related to health but also by socioeconomic issues such as poverty levels [19].The testing of overdispersion was done by examining the Pearson chi-squared value and the deviance divided by its degrees of freedom.Table 3 shows that the deviation divided by degrees of freedom is 41.101 and the Pearson chi-squared value divided by degrees of freedom is 57.668.
Since both of these numbers are more than 1, the data is overdispersed.The negative binomial regression model is one approach to dealing with overdispersion in Poisson regression.
Consequently, the data was analyzed using the negative binomial regression model.The proportion of spatial variability in the CAR BYM model can be calculated by dividing the variance of the spatial structured component by the total variance of the random components.
In this case, the proportion was 99.66%, indicating that the proportion of variability in the structured spatial component (  ) is larger than the unstructured spatial component (  ).

Mapping Relative Risk Using The CAR BYM Model
The spatial patterns of diseases are visualized by disease mapping, which also identifies regions with a relative risk greater or lower than one.Figure 3 shows that 31.1% of the regions had a high relative risk, including 11.76% in Central Java, 9.24% in West Java, 7.56% in East Java, 1.68% in Banten, and 0.84% in DKI Jakarta provinces.In Figure 1, it is shown that Cianjur and Bandung Barat districts had 78 and 106 cases per 100,000 population, respectively, which are classified as low and moderate areas.However, in Figure 3, Bandung Barat and Cianjur districts had high relative risk values of 1.32 and 1.79, respectively.This is because of their proximity to Bogor, Sukabumi, Bandung, and Garut districts, which are classified as high-risk areas.Meanwhile, districts and cities in the Special Region of Yogyakarta had low-risk areas.

2 . 3 .
Utilizing the Variance Inflation Factor (VIF) values to identify multicollinearity among explanatory factors Checking for overdispersion by examining the deviance and Pearson chi-squared values divided by their degrees of freedom.If these values are greater than one, it indicates overdispersion in the data, and more flexible distribution like the negative binomial distribution can be used.4. Calculating the Spatial Weight Matrix (), with the details of each weighting performed [6] :

Figure 1
Figure 1 Number of Stunting Cases per 100.000 on Java Island in 2021The distribution map of stunting cases in Figure1, shows that the areas with the lowest

Figure 2 Figure 2
Figure2shows that there are still areas with low percentages of infants with exclusive breastfeeding (below 50%), including Blitar, Bangkalan, and Sumenep Regencies in East Java Province.Areas with low percentages of infants with complete basic immunization, which all were below 20%, included Tasikmalaya Regency in West Java Province, as well as Pandeglang Regency and South Tangerang City in Banten Province.Regions with low percentages (below 30%) of proper sanitation were Serang and Tangerang Regencies in Banten Province.Areas with a poverty rate above 19% were Bangkalan, Sumenep, and Sumpang Regencies in East Java Province.

Figure 3
Figure 3 Relative Risk Map of Stunting Cases on Java Island 2021 Table 1 lists the variables used in this investigation.BAYESIAN CONDITIONAL NEGATIVE BINOMIAL AUTOREGRESSIVE MODEL

Table 3
Overdispersion Testing Results