GEOGRAPHICALLY WEIGHTED NEGATIVE BIVARIATE BINOMIAL REGRESSION FOR MODELLING THE NUMBER OF DENGUE DISEASES AND THEIR MORTALITY IN EAST NUSA TENGGARA, INDONESIA

ALAN PRAHUTAMA, ARIEF RACHMAN HAKIM, DWI ISPRIYANTI, HASBI YASIN, BUDI WARSITO
Statistics Department, Faculty of Science and Mathematics, Diponegoro University, Semarang, Indonesia
Environment Science, Graduate School, Diponegoro University, Semarang, Indonesia
Copyright © 2021 the author(s). This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


INTRODUCTION
Dengue is a tropical disease transmitted by the Aedes aegypti mosquito. Its spread depends on various factors such as weather, climate, the household environment, population density, and health facilities. The government has made many attempts to control it, such as public campaigns for a clean and healthy environment and increasing the number of health workers in both urban and remote areas. In 2016 there were 204,171 dengue cases and 1,598 deaths. In 2017 the numbers decreased significantly, to 68,407 cases and 493 deaths [1].
The region that contributed the most dengue cases was Java Island. In 2020 the highest number of dengue cases was in Lampung province, with 3,423 cases and 11 deaths, while East Nusa Tenggara (NTT) province had 2,711 cases and 32 deaths, the highest dengue mortality in 2020 [2]. By contrast, in 2017 NTT had 210 dengue cases with one death [1]. Because of this sharp increase in 2020, the spread of dengue in NTT should be analyzed. The spread in one region also depends on the spread in other regions. One way to analyze the spread of a disease under such regional dependency is spatial analysis, a branch of statistics that models and maps variables over space [3].
One spatial method for modelling the relationship between independent and dependent variables across regions is Geographically Weighted Regression (GWR). It is also used to map the spread among regions. GWR produces a regression model for each region, so heterogeneity among areas can be handled. A global regression applied to spatial data ignores this heterogeneity, so its estimates can be misleading. Estimating spatial data with GWR yields parameters whose significance differs from region to region [4]. GWR estimates the parameters with weights that are determined by a kernel function and its bandwidth. The optimal bandwidth is chosen by Cross-Validation (CV) [4].
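The bandwidth selection by cross-validation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a single predictor, a fixed Gaussian kernel, and toy data, and the names `gaussian_weights`, `local_fit`, and `cv_score` are hypothetical.

```python
import math

def gaussian_weights(coords, target, bandwidth):
    """Gaussian kernel weight of every location relative to `target`."""
    return [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]

def local_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def cv_score(coords, x, y, bandwidth):
    """Leave-one-out CV: predict each location from a local fit that excludes it."""
    total = 0.0
    for i, target in enumerate(coords):
        w = gaussian_weights(coords, target, bandwidth)
        w[i] = 0.0  # drop the target observation itself
        b0, b1 = local_fit(x, y, w)
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total

# Toy data (hypothetical): five locations, one predictor, one response.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.2, 7.9, 11.4, 13.8, 17.5]

candidates = [0.5, 1.0, 2.0]
best_bandwidth = min(candidates, key=lambda h: cv_score(coords, x, y, h))
```

The candidate with the smallest CV score is kept as the bandwidth. A real GWR fit would repeat the local regression with all predictors at every location.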
Applications of the GWR method include an analysis of the spread of the average number of smear-positive Tuberculosis (TB) cases in Xinjiang, China. It used five socioeconomic variables: population density, the proportion of minorities, the number of infectious disease network reporting agencies, the proportion of the agricultural population, and per capita gross annual domestic product. The study found a relationship between the average number of smear-positive TB cases and the socioeconomic factors. In addition, the GWR model better exploited the geographical variation in the average number of smear-positive TB cases [5]. Similarly, the spread of dengue in Taiwan has been studied with the GWR model, with population count and population density as predictor variables [6]. Further research on the spread of dengue using the GWR method is listed in Table 1. In those analyses, spatial methods, especially GWR, could model the spread of dengue and its factors and could map the spread by region.
The GWR method has also been extended in various ways. One extension is Geographically Weighted Poisson Regression (GWPR), which models count data that follow a Poisson distribution [7]. For example, GWPR was used to model the spread of HIV/AIDS in Sub-Saharan Africa based on socioeconomic factors, where the model obtained an R-square value of 0.6139 [8]. GWPR was also used to model the spread of COVID-19 in Hong Kong based on the built environment. That model found a close and spatially heterogeneous relationship between the factors and the risk of COVID-19 transmission, and the study provided valuable insights that support policymakers in responding to the COVID-19 pandemic and future epidemics [9].
Like Poisson regression, the GWPR model requires the variance to equal the mean, that is, the data must show neither overdispersion nor underdispersion. A method that handles overdispersion in a geographical context is Geographically Weighted Negative Binomial Regression (GWNBR) [10]. Research using the GWNBR model includes predicting wildfire occurrence in the Great Xing'an Mountains. There, the GWNBR model fitted and predicted better than the negative binomial (NB) model, produced more precise and stable parameter estimates, yielded a more realistic spatial distribution of model predictions, and detected the impact hotspots of the predictor variables [11]. Another study analyzed the factors that influence maternal mortality in Central Java Province. It found that clean and healthy household behavior and the number of community health centers have a significant impact on maternal mortality, and it grouped the regions into three classes based on the GWNBR results [12]. The NB model has also been extended to multivariate analysis with several response variables, for example the bivariate negative binomial (BNB) model [13]-[15]. A trivariate NB model was applied to insurance claim counts in Singapore to accommodate the dependency among three types of claims: third-party bodily injury, own damage, and third-party property damage [16]. The BNB model has been applied in many areas, such as modelling the number of hospital stays and non-physician hospital outpatient visits in America, where sixteen variables, including socio-economic, insurance, and health status variables, were used as independent variables [13]. The BNB model has also been used to model third-party liability claims and comprehensive cover claims, using data from 14,000 automobile policies of a major insurance company in Spain. That study found that the bivariate negative binomial regression (BNBR) model adequately captures the relationship between the two claim counts and the set of explanatory variables [17].
In health, the BNB model has been applied to HIV and AIDS cases in Indonesia. The estimated BNB model was appropriate for modelling HIV and AIDS, with an R-square of 64.9% [18]. The BNB model has also been extended to spatial analysis as the Geographically Weighted Bivariate Negative Binomial (GWBNB) model [19]. It was applied to infant mortality in East, Central, and West Java, where the GWBNB model performed better than the BNB model. Research on dengue is summarized in Table 1. According to Table 1, dengue fever has been analyzed with various spatial methods, especially the GWR model, and most of these studies found that a spatial model is able to capture heterogeneity among regions. The GWBNB model has not previously been applied to model the spread of dengue.
The GWBNB model is able to handle the heterogeneity caused by spatial data. This paper proposes the GWBNB model to analyze the spread of dengue in NTT. Its novelty is the application of the GWBNB model to the number of dengue cases and the number of dengue deaths in NTT, based on population density, environment, and health facilities. The GWBNB model can describe the spread of dengue and identify the factors that significantly affect it. It is also helpful for mapping the spread across regions.

NB model
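The NB model can be used to handle overdispersion. It is based on a Poisson-gamma mixture model. Negative binomial regression follows the negative binomial distribution [26]:

$$P(Y = y \mid \mu, \theta) = \frac{\Gamma(y + 1/\theta)}{\Gamma(1/\theta)\, y!} \left(\frac{1}{1 + \theta\mu}\right)^{1/\theta} \left(\frac{\theta\mu}{1 + \theta\mu}\right)^{y} \qquad (1)$$

for $y = 0, 1, 2, \ldots$, where $Y$ is a random variable and $\mu$ and $\theta$ are the parameters of the distribution. If $\theta = 0$ (the limiting case), the negative binomial distribution has variance $V[Y] = \mu$. The NB model can therefore be written as:

$$\mu_i = \exp\left(\mathbf{x}_i^{T} \boldsymbol{\beta}\right) \qquad (2)$$

Based on Eq. (2), the estimated NB model can be written as:

$$\hat{\mu}_i = \exp\left(\mathbf{x}_i^{T} \hat{\boldsymbol{\beta}}\right) \qquad (3)$$

where $\hat{\boldsymbol{\beta}}$ is the vector of estimated parameters. The GWNBR model is as follows [27]:

$$y_i \sim NB\left(\exp\left(\sum_{j} \beta_j(u_i, v_i)\, x_{ij}\right),\ \theta(u_i, v_i)\right) \qquad (4)$$

where $y_i$ is the dependent variable, $x_{ij}$ is an independent variable, $(u_i, v_i)$ is the location, $\beta_j(u_i, v_i)$ is the coefficient parameter, and $\theta(u_i, v_i)$ is the dispersion parameter.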
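The way the NB distribution accommodates overdispersion can be checked numerically. The sketch below assumes the common mean-dispersion parametrization, with mean `mu` and dispersion `theta` (both names are assumptions made for this illustration), under which the variance is `mu + theta * mu**2` and shrinks towards the Poisson variance `mu` as `theta` approaches 0.

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and dispersion theta > 0."""
    r = 1.0 / theta
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def nb_moments(mu, theta, upper=2000):
    """Mean and variance from direct summation of the pmf up to `upper`."""
    probs = [nb_pmf(y, mu, theta) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Hypothetical parameter values, chosen only for illustration.
mean, var = nb_moments(mu=4.0, theta=0.5)
overdispersed = var > mean  # the variance exceeds the mean whenever theta > 0
```

With a very small `theta`, the same function returns a variance close to the mean, which is the equidispersion that Poisson regression assumes.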

BNB model
The BNB model is built on the bivariate negative binomial distribution. Let Y1 and Y2 be random variables that follow a negative binomial distribution. When they are regressed on the independent variables $x_1, x_2, \ldots, x_p$, the BNB model of Eq. (5) is obtained [15]. Eq. (5) follows the NB distribution in Eq. (1), where Y1 and Y2 are the dependent variables, $\mu_1$ and $\mu_2$ are their means as modelled by the independent variables, a further pair of parameters belongs to the BNB distribution, and $\Psi$ is the dispersion parameter. Based on Eq. (5) and Eq. (3), the BNB model can be written as:

$$\hat{\mu}_{ki} = \exp\left(\hat{\beta}_{k0} + \sum_{j=1}^{p} \hat{\beta}_{kj}\, x_{ij}\right), \qquad k = 1, 2$$

where $i$ indexes the observations, $j$ indexes the independent variables, and $p$ is the number of independent variables.

The parameters of the BNB model are estimated by Maximum Likelihood Estimation (MLE) with Newton-Raphson iteration, applied to the likelihood function of the BNB model. The BNB model is tested with the Maximum Likelihood Ratio Test (MLRT). In the GWBNB model the parameters of the BNB model vary with the location $(u_i, v_i)$, and the GWBNB model is tested with a corresponding test statistic.
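To illustrate how Newton-Raphson iteration maximizes a count-data likelihood, the sketch below uses a Poisson regression with one predictor. The simpler Poisson likelihood is swapped in for the BNB likelihood purely for brevity. The function name and the toy data are hypothetical.

```python
import math

def poisson_newton_raphson(x, y, iterations=25):
    """MLE of (b0, b1) in log(mu) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Negative Hessian (information matrix), a symmetric 2x2 matrix
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (information matrix)^(-1) * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy counts (hypothetical)
x_toy = [0.0, 1.0, 2.0, 3.0, 4.0]
y_toy = [1, 3, 4, 8, 12]
b0_hat, b1_hat = poisson_newton_raphson(x_toy, y_toy)
```

The BNB case follows the same pattern, with a longer parameter vector and the score and Hessian derived from the BNB likelihood.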

Methods
The study area was NTT province, which consists of 22 regions. The data were collected from the Statistics Bureau of NTT, Indonesia, and from the Ministry of Health of Indonesia in 2018.
The dependent variables were the number of dengue cases (Y1) and the number of dengue deaths (Y2) in each region. The independent variables cover health facilities, population density, and environmental aspects: population density (X1), percentage of poverty (X2), the number of doctors (X3), the number of health facilities (X4), and the percentage of the area with good sanitation (X5). The steps of the analysis are as follows:
1 Check the correlation between the two dependent variables. This is needed before modelling the two dependent variables, because the model is simultaneous and the responses are related to each other. If the two dependent variables are not significantly correlated, they cannot be modelled simultaneously.
2 Check multicollinearity among the independent variables. Multicollinearity biases the model: it produces a high correlation but few significant variables [29]. Independent variables that show multicollinearity should be excluded or handled with other methods such as principal component analysis.
A fixed kernel suits locations at regular distances, while an adaptive kernel suits locations at dispersed distances [4]. There are two kinds of kernel function, bisquare and Gaussian. Because the number of observations is small, we use the bisquare function, so the weights come from the Adaptive Bisquare Kernel function. Based on the optimal bandwidth, we obtain the parameter estimates of the GWBNB model.
6 Interpret the GWBNB model by forming clusters from its estimation results. In this research we build three clusters: small, medium, and large. They are used to map the spread of the number of dengue cases and their mortality.
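The adaptive bisquare weighting named in the steps above can be sketched as follows. This is an illustration under a stated assumption: the bandwidth of each location is taken as the distance to its `k`-th nearest neighbour, which is one common way to make a bisquare kernel adaptive. The parameter `k`, the function name, and the coordinates are hypothetical. In the paper the bandwidths come from the CV procedure.

```python
import math

def adaptive_bisquare_weights(coords, i, k):
    """Bisquare weights of all locations relative to location i.

    The bandwidth h is the Euclidean distance from location i to its k-th
    nearest neighbour, so it widens where locations are sparse.
    """
    d = [math.dist(coords[i], c) for c in coords]
    h = sorted(d)[k]  # index 0 is location i itself (distance 0)
    return [(1 - (dj / h) ** 2) ** 2 if dj < h else 0.0 for dj in d]

# Hypothetical coordinates of four regions
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
weights = adaptive_bisquare_weights(coords, i=0, k=2)
```

Locations at or beyond the bandwidth receive zero weight, so each local model is fitted only on nearby regions.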

RESULTS AND DISCUSSIONS
To begin the analysis, we describe the data, which consist of the number of dengue cases, their mortality, and the associated factors in East Nusa Tenggara, Indonesia. This first analysis shows the distribution of the data and is used to check them. The descriptive statistics are given in Table 2. According to Table 2, the variance of Y1 is larger than that of Y2, as can be seen from the minimum and maximum values of each variable. Population density also has a high variance. The highest population density was in Kupang city, the capital of NTT province.
The data also indicate that a high number of dengue cases in an area comes with an increased risk of mortality. The significance of this correlation must be checked before fitting the bivariate model, with the following hypotheses:
H0: there is no correlation between the number of dengue cases (Y1) and their mortality (Y2)
H1: the number of dengue cases (Y1) and their mortality (Y2) are correlated
The t-value was 7.3162 and the t-table value was 2.086, so H0 was rejected. There was a correlation between dengue cases (Y1) and their mortality (Y2).
The next step checked multicollinearity among the predictor variables. Table 3 shows that the Variance Inflation Factor (VIF) values were less than 5, so there was no multicollinearity. The following step checked for overdispersion in the Poisson regression model. Table 2 shows that the variance is larger than the mean for both Y1 and Y2, which indicates overdispersion in Y1 and Y2. In addition, the Pearson Chi-square value divided by its degrees of freedom is greater than 1.
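The correlation test used in this section can be sketched as below. The sketch assumes the usual t statistic for a Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²). The case and death counts are hypothetical toy values, and only the reported t-value and t-table value are taken from the text.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def correlation_t(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical toy data, not the NTT data
cases = [12, 30, 7, 45, 22, 3]
deaths = [1, 2, 0, 4, 1, 0]
r_toy = pearson_r(cases, deaths)
t_toy = correlation_t(r_toy, len(cases))

# Decision with the values reported in the text
T_REPORTED = 7.3162
T_TABLE = 2.086
reject_h0 = abs(T_REPORTED) > T_TABLE
```

Because the reported t-value exceeds the t-table value, H0 is rejected, which matches the conclusion drawn in the text.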

3.1.Modelling the number of dengue cases (Y1) and their mortality (Y2) using the BNB model
As shown previously, both the number of dengue cases and their mortality are overdispersed. We therefore need another model that handles overdispersion, one of which is the NB model. Because there are two dependent variables that are correlated with each other, we use the BNB model. The first step is to estimate the model. Table 5 shows the estimated parameters of the BNB model for the number of dengue cases and their mortality in East Nusa Tenggara. Based on Table 5, the number of dengue cases was significantly affected by the number of doctors (X3) at the 10% level and by the number of health facilities (X4) at the 5% level.
Meanwhile, the number of dengue deaths was affected by the percentage of poverty (X2), the number of health facilities (X4), and the percentage of areas with good sanitation (X5). The BNB model was tested as follows. The value of the test statistic was 83.4472, which is less than (...). The number of dengue deaths can be interpreted as follows: if each variable had one unit, the estimated dengue mortality was exp(-1.48990) = 0.225 units. If the percentage of poverty, the number of health facilities, and the percentage of areas with good sanitation increased by two units, the estimated number of dengue deaths was 0.239 units.
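The arithmetic behind this interpretation follows from the log link of the model. The short sketch below reproduces only the exponentiation of the quoted intercept. The helper names are hypothetical, and the slope coefficients of Table 5 are not reproduced here.

```python
import math

INTERCEPT_Y2 = -1.48990  # intercept of the mortality equation, as quoted in the text

def expected_count(linear_predictor):
    """Inverse of the log link: mu = exp(eta)."""
    return math.exp(linear_predictor)

def rate_ratio(coefficient, delta=1.0):
    """Multiplicative change in the expected count when a predictor rises by delta."""
    return math.exp(coefficient * delta)

baseline = expected_count(INTERCEPT_Y2)
```

Under a log link every coefficient acts multiplicatively, so a coefficient of zero corresponds to a rate ratio of 1, meaning no change in the expected count.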

3.2.Spatial heterogeneity testing
Based on the descriptive statistics in Table 2, the variance of each variable is high, which indicates heterogeneity among the observations. Heterogeneity is tested with the Glejser test. Because the observations carry a location, the test is a test of spatial heterogeneity. The hypotheses are:
H0: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2$ (there is no spatial heterogeneity)
H1: at least one $\sigma_i^2 \neq \sigma^2$, i = 1, 2, ..., n (there is spatial heterogeneity)
The test statistic is G, and H0 is rejected for $G > \chi^2_{(\alpha;\, df)}$. The result was G = 28.56709. At the 5% significance level, $\chi^2_{(0.05;\,10)} = 18.31$, so H0 was rejected. It can be concluded that there was spatial heterogeneity in the number of dengue cases and their mortality.
It is therefore better to model the data with a spatial method. This article proposes the GWBNB model to estimate the bivariate response, which consists of dengue cases and dengue deaths. The BNB model fitted above has an Akaike Information Criterion (AIC) value of 364.4725.
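The decision rule of the heterogeneity test can be verified with a small chi-square calculation. The sketch below relies on the closed form of the chi-square survival function that holds for even degrees of freedom. The function name is hypothetical, while the statistic, the critical value, and the 10 degrees of freedom are the values quoted in the text.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with a positive even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

G = 28.56709      # test statistic reported in the text
CRITICAL = 18.31  # chi-square table value at the 5% level with 10 df
DF = 10

p_value = chi2_sf_even_df(G, DF)
reject_h0 = G > CRITICAL
```

The tail probability at the table value 18.31 is close to 0.05, and the reported statistic lies well beyond it, consistent with rejecting H0.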

3.3.Modelling the number of dengue cases (Y1) and their mortality (Y2) using the GWBNB model
The GWBNB model is developed from the BNB model by adding spatial weights. The first step computes the Euclidean distances among areas. The next step computes the optimal bandwidth using the kernel function, based on the CV value. The optimum bandwidth of the GWBNB model for the number of dengue cases and their mortality is given in Table 6. Both distance and bandwidth are used to determine the spatial weights. Table 6 shows the bandwidth of the GWBNB model, which was used for both Y1 and Y2. The bandwidth takes a different value at each location. It is used to determine the weights through the kernel function.
The kernel function used in this research was the Adaptive Bisquare Kernel. (...) area which has good sanitation. In addition, the number of dengue deaths is less than or equal to the number of dengue cases in each region. Table 8 shows the estimated number of dengue cases and their mortality.
The model was invalid for the number of dengue deaths in some regions, including Alor, Belu, Lembata, Rote Ndao, Sabu Raijua, South Timor Tengah, and North Timor Tengah, because the number of dengue deaths must be equal to or less than the number of dengue cases.
Judged by this comparison of dengue deaths with dengue cases, the model was valid in 73% of cases. The map of the predicted number of dengue cases and their mortality from the GWBNB model is shown in Figure 1. Based on Figure 1, there were 4 clusters: cluster 1 was 1-10 cases, cluster 2 was 12-23 cases (11 and 12 excluded, because they were unavailable in the result), cluster 3 was 24-30 cases, and cluster 4 was 31-40 cases. The members of cluster 2 neighbor each other, i.e., East Manggarai, Ngada, and Nagekeo. Likewise, the neighboring members of cluster 3 were South Timor Tengah, North Timor Tengah, and Malaka. The GWBNB model produced an AIC value of 234.932, which is less than that of the BNB model. The GWBNB model therefore performed better than the BNB model for modelling the number of dengue cases and their mortality. This result agrees with [12] and [11], which found that geographically weighted analysis (in those cases the GWNBR model) is more effective than global regression (the NB model).

CONCLUSION
Based on the analysis, it was known that the factors that impacted the number

CONFLICT OF INTERESTS
The author(s) declare that there is no conflict of interests.
Appendix 1. Table 9