SPLINE AND KERNEL MIXED ESTIMATORS IN MULTIVARIABLE NONPARAMETRIC REGRESSION FOR DENGUE HEMORRHAGIC FEVER MODEL

Abstract: This article discusses a statistical innovation applied in the health sector. The research concerns the treatment and prevention of Dengue Hemorrhagic Fever (DHF), focusing on the factors that contribute to the increase in DHF. The study builds a nonparametric regression model with a mixed estimator of truncated spline and Gaussian Kernel.


INTRODUCTION
The relationship between a response variable and a predictor variable whose functional form is unknown can be identified statistically using nonparametric regression [1], [2]. Nonparametric regression is not rigid in defining the regression function [3]: unlike linear regression, it does not require assumptions such as normally distributed errors, and it does not force the regression curve to be linear. This is in contrast to parametric regression, which forces the regression curve to follow a specific model, such as a linear model. The advantage of nonparametric regression is that the observational data determine the regression curve, without being forced to fit a specific function [4], [5]. In nonparametric regression, the data themselves are expected to find the form of the regression curve estimate, without being influenced by the researcher's subjectivity [6]. As a result, the nonparametric regression approach is both flexible and objective [2], [3].
Estimator approaches used in nonparametric regression include the truncated spline and the Kernel.
A truncated spline is a piecewise polynomial that is segmented and continuous [7], [8]. One advantage of the truncated spline is that the model tends to find its own estimate wherever the data pattern moves [9], [10]. This advantage arises because the truncated spline has knot points, which mark changes in the behavior pattern of the data [4], [5]. The Kernel estimator, in turn, is flexible [11], has a simple mathematical form, and can achieve a relatively fast rate of convergence [12]. The Kernel approach depends on the bandwidth, which controls the smoothness of the estimated curve [5], [12]-[14]. A regression curve estimated with the Kernel estimator therefore adjusts to the value of this smoothing parameter.
According to Budiantara et al. [15], the nonparametric and semiparametric regression models developed so far rest, on closer inspection, on a very strong basic assumption: each predictor in a multi-predictor nonparametric regression is considered to have the same pattern, so researchers force a single form of estimator onto all predictor variables. Using only one form of estimator for different data relationship patterns yields an estimator that does not match the data pattern. As a result, the estimated regression model fits poorly and produces large errors. To overcome this problem, several researchers have developed mixed nonparametric regression curve estimators, in which each data pattern in the model is approximated by an appropriate curve estimator. Studies that have developed and reviewed mixed estimator models include [1], [16]-[19].
Dengue Hemorrhagic Fever (DHF) is one of the problems in Indonesia's health sector. DHF is caused by the dengue virus, which is transmitted through the bite of the Aedes aegypti mosquito [20], and usually occurs in tropical and subtropical areas of the world [21], [22], including Indonesia. Based on data from the World Health Organization (WHO), Indonesia ranks second in the number of DHF cases among 30 endemic areas (Ministry of Health, 2018). In 2020, there were 108,303 DHF cases and 747 deaths. In 2021, there were 73,518 cases and 705 deaths. The number of cases decreased by 32.12% compared to the previous year, but DHF still needs special attention.
Based on this background, the purpose of this study is to examine the nonparametric regression Mixed Estimator of Truncated Spline and Gaussian Kernel (MTs-GK) in the additive multi-predictor nonparametric model, and to apply the model to a case study of Dengue Hemorrhagic Fever (DHF), with a focus on the factors that influence the increase in DHF.

A. Mixed Estimators of Truncated Spline and Gaussian Kernel
A mixed estimator is a multi-predictor nonparametric regression approach that uses two or more types of estimators to approximate the regression curve [15], [16], [18], [23], [24]. Budiantara et al. [15] were the first to develop a mixed truncated spline and Kernel nonparametric regression model. The regression curve for the relationship between each predictor and the response variable is approximated by the estimator that suits the characteristics of that relationship [25].
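Given paired data $(x_i, v_i, y_i)$, $i = 1, 2, \ldots, n$, the relationship between the predictors and the response is assumed to follow the nonparametric regression model
$$y_i = \mu(x_i, v_i) + \varepsilon_i \tag{1}$$
where $\mu(x_i, v_i)$ is the regression curve, assumed to be unknown, smooth, and additive, so that $\mu(x_i, v_i)$ can be written in the form in Equation (2).
$$\mu(x_i, v_i) = m(x_i) + h(v_i) \tag{2}$$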
Based on Equation (2), the regression curve $m(x_i)$ will be estimated with a truncated spline estimator, while the regression curve $h(v_i)$ will be estimated with a Kernel estimator.
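The truncated spline estimator is a segmented polynomial model [26], [27]. For example, given paired data $(x_i, y_i)$, the regression curve $m(x_i)$ is approximated by a truncated spline of degree $p$ with knot points $K_1, K_2, \ldots, K_r$:
$$m(x_i) = \sum_{j=0}^{p} \beta_j x_i^{j} + \sum_{k=1}^{r} \beta_{p+k} (x_i - K_k)_+^{p}$$
The truncated function is:
$$(x_i - K_k)_+^{p} = \begin{cases} (x_i - K_k)^{p}, & x_i \geq K_k \\ 0, & x_i < K_k \end{cases}$$
As a result, the regression curve estimation using the truncated spline estimator can be written in matrix form as
$$\hat{\mathbf{m}}(x) = \mathbf{X}(K)\hat{\boldsymbol{\beta}} \tag{5}$$
where $\mathbf{X}(K)$ is the $n \times (p + r + 1)$ matrix whose $i$-th row is $\left(1, x_i, \ldots, x_i^{p}, (x_i - K_1)_+^{p}, \ldots, (x_i - K_r)_+^{p}\right)$ and $K = (K_1, \ldots, K_r)$ collects the knot points.
The Kernel estimator has a good ability to model data that do not follow a particular pattern [11], [29], [30]. For example, given paired data $(v_i, y_i)$, the relationship between the predictor $v_i$ and the response $y_i$ is assumed to follow the Kernel nonparametric regression model
$$y_i = h(v_i) + \varepsilon_i$$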
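For the truncated spline part described above, the truncated function and the rows of the spline design matrix can be illustrated with a minimal Python sketch. The function names, the predictor values, and the knot positions are hypothetical and chosen only for illustration; a linear spline (degree 1) is assumed as the default.

```python
def truncated_power(x, knot, degree=1):
    """Truncated function (x - knot)_+^degree: zero to the left of the knot."""
    return (x - knot) ** degree if x >= knot else 0.0


def spline_design_row(x, knots, degree=1):
    """One design-matrix row: 1, x, ..., x^degree, then one truncated term per knot."""
    polynomial = [x ** j for j in range(degree + 1)]
    truncated = [truncated_power(x, k, degree) for k in knots]
    return polynomial + truncated


# Hypothetical predictor values and knot points, for illustration only.
x_values = [1.0, 2.0, 3.0, 4.0]
knots = [2.0, 3.5]
design = [spline_design_row(x, knots) for x in x_values]
for row in design:
    print(row)
```

Each truncated column stays at zero until the predictor passes its knot, which is how the fitted curve is allowed to change slope at that point.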
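The regression curve $h(v_i)$ will be estimated with the Kernel estimator in Equation (7):
$$\hat{h}_{\phi}(v_i) = n^{-1} \sum_{j=1}^{n} W_{\phi j}(v_i)\, y_j \tag{7}$$
where $W_{\phi j}(v_i) = K_{\phi}(v_i - v_j) \Big/ \left[ n^{-1} \sum_{l=1}^{n} K_{\phi}(v_i - v_l) \right]$, $K_{\phi}(z) = \phi^{-1} K(z / \phi)$, and $\phi$ is the bandwidth.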
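The Kernel estimator can be read as a weighted average of the responses, with weights that decay as observations move away from the point of estimation. The following Python sketch illustrates this under the assumption of a Gaussian Kernel with normalized weights; the function names and the data are hypothetical, and plain Python is used only to keep the sketch dependency-free.

```python
import math


def gaussian_kernel(z):
    """Standard Gaussian Kernel K(z) = exp(-z^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kernel_weights(v_values, bandwidth):
    """n x n matrix; row i holds the normalized weights used to estimate h(v_i)."""
    matrix = []
    for v_i in v_values:
        raw = [gaussian_kernel((v_i - v_j) / bandwidth) for v_j in v_values]
        total = sum(raw)
        matrix.append([w / total for w in raw])
    return matrix


def kernel_estimate(v_values, y_values, bandwidth):
    """Kernel estimate of h at every observed v_i: a weighted average of the responses."""
    weights = kernel_weights(v_values, bandwidth)
    return [sum(w * y for w, y in zip(row, y_values)) for row in weights]


# Hypothetical data, for illustration only.
v_values = [0.0, 1.0, 2.0, 3.0, 4.0]
y_values = [1.0, 3.0, 2.0, 5.0, 4.0]
rough = kernel_estimate(v_values, y_values, bandwidth=0.05)  # nearly reproduces y
smooth = kernel_estimate(v_values, y_values, bandwidth=5.0)  # pulled toward the mean of y
print(rough)
print(smooth)
```

A very small bandwidth gives an estimate that follows the data almost exactly, while a large bandwidth flattens it toward the overall mean, which is why the bandwidth has to be selected rather than fixed arbitrarily.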
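$K$ denotes the Kernel function. The Gaussian Kernel function is used in this research:
$$K(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right), \quad -\infty < z < \infty \tag{8}$$
Based on Equations (7) and (8), the Kernel estimator can be written in matrix form as:
$$\hat{\mathbf{h}}(v) = \mathbf{V}(\phi)\,\mathbf{y} \tag{9}$$
where $\phi$ is the bandwidth and $\mathbf{V}(\phi)$ is the $n \times n$ matrix whose $(i, j)$ entry is the Kernel weight that the estimator in Equation (7) assigns to $y_j$ when estimating $h(v_i)$. Furthermore, based on Equation (2) and the form of each estimator in Equations (5) and (9), we can write:
$$\mathbf{y} = \mathbf{X}(K)\boldsymbol{\beta} + \mathbf{V}(\phi)\,\mathbf{y} + \boldsymbol{\varepsilon} \tag{10}$$
where $\mathbf{X}(K)$ is the truncated spline design matrix for the knot points $K$. Vector $\boldsymbol{\varepsilon}$ has size $(n \times 1)$, so that based on Equation (10):
$$\boldsymbol{\varepsilon} = \left(\mathbf{I} - \mathbf{V}(\phi)\right)\mathbf{y} - \mathbf{X}(K)\boldsymbol{\beta} \tag{11}$$
Minimizing $\boldsymbol{\varepsilon}^{T}\boldsymbol{\varepsilon}$ with respect to $\boldsymbol{\beta}$ gives
$$\hat{\boldsymbol{\beta}} = \left[\mathbf{X}(K)^{T}\mathbf{X}(K)\right]^{-1}\mathbf{X}(K)^{T}\left(\mathbf{I} - \mathbf{V}(\phi)\right)\mathbf{y} \tag{12}$$
Equation (12) can be summarized as:
$$\hat{\boldsymbol{\beta}} = \mathbf{B}(K, \phi)\,\mathbf{y}, \quad \mathbf{B}(K, \phi) = \left[\mathbf{X}(K)^{T}\mathbf{X}(K)\right]^{-1}\mathbf{X}(K)^{T}\left(\mathbf{I} - \mathbf{V}(\phi)\right) \tag{13}$$
In Equation (5), the regression curve estimation using the truncated spline estimator is $\hat{\mathbf{m}}(x) = \mathbf{X}(K)\hat{\boldsymbol{\beta}}$, then based on Equation (12), we can write:
$$\hat{\mathbf{m}}(x) = \mathbf{X}(K)\left[\mathbf{X}(K)^{T}\mathbf{X}(K)\right]^{-1}\mathbf{X}(K)^{T}\left(\mathbf{I} - \mathbf{V}(\phi)\right)\mathbf{y} \tag{14}$$
A brief summary of Equation (14) is:
$$\hat{\mathbf{m}}(x) = \mathbf{S}(K, \phi)\,\mathbf{y}, \quad \mathbf{S}(K, \phi) = \mathbf{X}(K)\,\mathbf{B}(K, \phi) \tag{15}$$
According to the form of the estimator for each component in Equations (9) and (15), the mixed estimator of truncated spline and Gaussian Kernel is obtained as follows:
$$\hat{\boldsymbol{\mu}}(x, v) = \hat{\mathbf{m}}(x) + \hat{\mathbf{h}}(v) = \mathbf{Z}(K, \phi)\,\mathbf{y}, \quad \mathbf{Z}(K, \phi) = \mathbf{S}(K, \phi) + \mathbf{V}(\phi) \tag{16}$$
In this study, the method used to select the optimal knot points and bandwidth is the Unbiased Risk (UBR) method [4], [17], [18], with the formula in Equation (17).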
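The derivation above amounts to a two-part fit: the Kernel component smooths the response, and the spline coefficients are obtained by least squares on what the Kernel component leaves unexplained. The Python sketch below follows that logic for one spline predictor and one Kernel predictor. It is an illustration under stated assumptions, not the authors' implementation: the function names, the linear spline degree, the row-normalized Gaussian weights, the small Gaussian elimination solver, and the synthetic data are all hypothetical.

```python
import math


def spline_design(x, knots):
    """Linear truncated spline design matrix: columns 1, x, (x - K_k)_+."""
    return [[1.0, xi] + [max(xi - k, 0.0) for k in knots] for xi in x]


def kernel_matrix(v, bandwidth):
    """Row-normalized Gaussian Kernel weights (n x n)."""
    rows = []
    for vi in v:
        raw = [math.exp(-0.5 * ((vi - vj) / bandwidth) ** 2) for vj in v]
        total = sum(raw)
        rows.append([w / total for w in raw])
    return rows


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_mixed(x, v, y, knots, bandwidth):
    """Return (beta_hat, fitted) where fitted = X beta_hat + V y."""
    n = len(y)
    X = spline_design(x, knots)
    V = kernel_matrix(v, bandwidth)
    q = len(X[0])
    kernel_part = [sum(V[i][j] * y[j] for j in range(n)) for i in range(n)]  # V y
    partial = [y[i] - kernel_part[i] for i in range(n)]                      # (I - V) y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(q)] for a in range(q)]
    xty = [sum(X[i][a] * partial[i] for i in range(n)) for a in range(q)]
    beta_hat = solve(xtx, xty)  # least squares of (I - V) y on the spline columns
    spline_part = [sum(X[i][a] * beta_hat[a] for a in range(q)) for i in range(n)]
    fitted = [spline_part[i] + kernel_part[i] for i in range(n)]
    return beta_hat, fitted


# Synthetic data, for illustration only: a kink in x at 5 plus a smooth effect of v.
x = [float(i) for i in range(10)]
v = [0.3, 1.7, 0.9, 2.4, 1.1, 0.2, 2.9, 1.5, 0.6, 2.1]
y = [abs(xi - 5.0) + math.sin(vi) for xi, vi in zip(x, v)]
beta_hat, fitted = fit_mixed(x, v, y, knots=[5.0], bandwidth=0.5)
print(beta_hat)
print(fitted)
```

Because both components are linear in the response, the fitted values are a linear function of y, which is the property that selection criteria based on the trace of the smoother matrix rely on.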

A. Data Sources
The research used secondary data from Wahab Syahrani General Hospital (AWS Hospital) in Samarinda. The variables of this study are described in Table 2.
4. Select the optimal knot points and bandwidth based on the minimum UBR value, using the formula in Equation (17). Each predictor variable in this study has the same number of knot points (1 to 3). The bandwidth values tested lie in the interval 0.05 to 5.
5. Determine the best mixed truncated spline and Kernel estimator model based on the minimum UBR value, then calculate the coefficient of determination ($R^2$).
6. Perform simultaneous hypothesis testing for the best model based on the ANOVA in Table 1.
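The selection in steps 4 and 5 can be sketched as a grid search over candidate knot sets and bandwidths. The sketch below is an assumption-laden illustration rather than the procedure actually coded by the authors: since Equation (17) is not reproduced here, it uses the standard unbiased risk form RSS/n + 2σ² tr(Z)/n − σ² for a linear smoother with matrix Z, treats the noise variance `sigma2` as a known, hypothetical input, and runs on synthetic data. All function names are hypothetical.

```python
import math


def spline_design(x, knots):
    """Linear truncated spline design matrix: columns 1, x, (x - K_k)_+."""
    return [[1.0, xi] + [max(xi - k, 0.0) for k in knots] for xi in x]


def kernel_matrix(v, bandwidth):
    """Row-normalized Gaussian Kernel weights (n x n)."""
    rows = []
    for vi in v:
        raw = [math.exp(-0.5 * ((vi - vj) / bandwidth) ** 2) for vj in v]
        total = sum(raw)
        rows.append([w / total for w in raw])
    return rows


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ubr(x, v, y, knots, bandwidth, sigma2):
    """Assumed unbiased risk RSS/n + 2*sigma2*tr(Z)/n - sigma2 for the fit Z y."""
    n = len(y)
    X, V = spline_design(x, knots), kernel_matrix(v, bandwidth)
    q = len(X[0])
    Vy = [sum(V[i][j] * y[j] for j in range(n)) for i in range(n)]
    VX = [[sum(V[i][j] * X[j][a] for j in range(n)) for a in range(q)] for i in range(n)]
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(q)] for a in range(q)]
    beta = solve(xtx, [sum(X[i][a] * (y[i] - Vy[i]) for i in range(n)) for a in range(q)])
    rss = sum((y[i] - Vy[i] - sum(X[i][a] * beta[a] for a in range(q))) ** 2
              for i in range(n))
    # tr(Z) = tr(V) + tr((X'X)^{-1} X'(I - V) X), by the cyclic property of the trace.
    m = [[sum(X[i][a] * (X[i][b] - VX[i][b]) for i in range(n)) for b in range(q)]
         for a in range(q)]
    trace = sum(V[i][i] for i in range(n))
    trace += sum(solve(xtx, [m[a][c] for a in range(q)])[c] for c in range(q))
    return rss / n + 2.0 * sigma2 * trace / n - sigma2


def select_model(x, v, y, knot_sets, bandwidths, sigma2):
    """Return (score, knots, bandwidth) for the candidate with the smallest UBR."""
    scored = [(ubr(x, v, y, knots, bw, sigma2), knots, bw)
              for knots in knot_sets for bw in bandwidths]
    return min(scored, key=lambda item: item[0])


# Synthetic data, for illustration only.
x = [float(i) for i in range(12)]
v = [0.4, 2.6, 1.3, 0.1, 2.2, 1.8, 0.7, 2.9, 1.0, 0.3, 2.4, 1.6]
noise = [0.05, -0.08, 0.02, 0.07, -0.03, -0.06, 0.04, 0.01, -0.05, 0.08, -0.02, 0.03]
y = [0.5 * xi + 1.5 * max(xi - 6.0, 0.0) + math.sin(2.0 * vi) + e
     for xi, vi, e in zip(x, v, noise)]
knot_sets = [[6.0], [3.0, 6.0], [3.0, 6.0, 9.0]]  # one to three knot points
bandwidths = [0.05, 0.25, 1.0, 5.0]               # values inside the interval 0.05 to 5
best_score, best_knots, best_bandwidth = select_model(
    x, v, y, knot_sets, bandwidths, sigma2=0.003)  # sigma2 is a hypothetical value
print(best_score, best_knots, best_bandwidth)
```

In practice the noise variance is unknown and has to be estimated, and the grid of knot positions would be built from the observed range of each predictor; both choices are left open here.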

MAIN RESULTS
This section presents the results of applying the mixed estimator of truncated spline and Gaussian Kernel to data on the platelet count of DHF patients.

A. Scatter Plot
The first step in modeling with a mixed estimator is to create a scatter plot of the response variable against each predictor variable, as shown in Figure 1. Based on Figure 1, the type of estimator to be used for each predictor variable can be determined. A more detailed summary of the chosen estimators is presented in Table 3.

Variable Notation Description Estimator
Predictor 1

The best model's coefficient of determination ($R^2$) is 88.46%. Based on the results of simultaneous hypothesis testing, it can be concluded that at least one predictor variable in the model has a significant effect on the response.
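The coefficient of determination reported above can be computed from the observed and fitted responses. The sketch below assumes the common definition $R^2 = 1 - SSE/SST$, which may differ from the exact formula used in the study; the function name and the numbers are hypothetical and are not the study's data.

```python
def r_squared(observed, fitted):
    """Coefficient of determination, assumed here to be 1 - SSE / SST."""
    mean = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # unexplained variation
    sst = sum((o - mean) ** 2 for o in observed)               # total variation
    return 1.0 - sse / sst


# Hypothetical observed and fitted values, for illustration only.
observed = [10.0, 12.0, 15.0, 11.0, 18.0]
fitted = [11.0, 12.0, 14.0, 12.0, 17.0]
print(round(100.0 * r_squared(observed, fitted), 2), "%")
```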