THE USE OF THE BINARY SPLINE LOGISTIC REGRESSION MODEL ON THE NUTRITIONAL STATUS DATA OF CHILDREN



INTRODUCTION
Logistic regression is a regression method developed for categorical responses. When the response has two categories, the model is binary logistic regression, in which the response variable is assumed to follow the Bernoulli distribution [1]. Extensions include binary logistic regression with mixed effects [2], lasso binary logistic regression for data with multicollinearity [3], and weighted binary logistic regression [4]. Logistic regression has also been applied widely across fields of science, including medical data [5], cable television user survey data [6], and data on the credit risk of consumer loans in banking institutions [7].
Data are often imbalanced between classes, in which case ordinary logistic regression is less accurate: predicted values tend to fall in the majority class, so the minority class is rarely predicted. Subsequent work has therefore built logistic regression models on nonparametric regression function estimators. These estimators include the truncated spline [8], smoothing spline [9], penalized spline [10]-[12], local polynomial [13], kernel [14], and Fourier series [15]. In this study, we use a truncated spline estimator, whose estimation criterion involves knot points. A knot point is a point at which the pattern of the data changes, so a model can contain several segments over a continuous predictor; the knot points are selected by the minimum generalized cross-validation (GCV) value [16]. This is one of the advantages of the truncated spline and a reason it is widely used by researchers. For example, studies of health data found four patterns of change in the blood sugar of diabetic patients based on the time of hospitalization [17] and carbohydrate diet [18]. These studies showed that describing the segments of the pattern of change with truncated splines led to more accurate identification of problems in a health case. We therefore apply this approach to data on the nutritional status of children under five. The nutritional status of children under five is measured by several indicators, one of which is weight. Nutritional status can be divided into two categories, normal and abnormal, so the data can be analyzed using binary logistic regression.
Many studies of the nutritional status of children under five have considered factors such as maternal education and income [19], number of family dependents [20], and body weight [21], all of which affect nutritional status. However, these studies did not show how the probability of a given nutritional status varies over particular weight intervals. Therefore, in this article, we analyze the data on the nutritional status of these toddlers using a spline logistic regression model at several optimal knot points. The rest of this article is organized as follows. The second section describes the data and the analysis methods. The third section presents the results and discussion of the nutritional status model based on body weight through binary spline logistic regression. The last section contains the conclusions of our article.

PRELIMINARIES
Data on the nutritional status of children under five were obtained from the Community Health Center, known in Indonesia as Puskesmas. These secondary data came from the Puskesmas in Barru Regency, Indonesia, with a total sample of 432 children. The response variable was the nutritional status of children under five, with two categories: abnormal, coded 0, and normal, coded 1. The predictor variable was the toddler's weight.
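If the response variable $y_i$ has two categories, then the regression model used is the binary logistic regression model. The model assumes that the observations are independent and Bernoulli distributed, with the probability distribution function

$$f(y_i) = \pi(x_i)^{y_i}\,[1 - \pi(x_i)]^{1 - y_i}, \quad y_i = 0, 1, \tag{1}$$

where $\pi(x_i)$ is the probability of success. If $y_i = 1$, then $f(y_i) = \pi(x_i)$, and if $y_i = 0$, then $f(y_i) = 1 - \pi(x_i)$.

The model used in this study is a binary logistic nonparametric regression model with a truncated spline estimator. Let $q$ be the order of the spline and $m$ the number of knot points $\tau$. The spline logistic regression model can then be expressed as follows [22]:

$$\pi(x_i) = \frac{\exp\left(\beta_0 + \sum_{j=1}^{q} \beta_j x_i^{j} + \sum_{k=1}^{m} \beta_{q+k} (x_i - \tau_k)_+^{q}\right)}{1 + \exp\left(\beta_0 + \sum_{j=1}^{q} \beta_j x_i^{j} + \sum_{k=1}^{m} \beta_{q+k} (x_i - \tau_k)_+^{q}\right)}. \tag{2}$$

Through the logit transformation, Equation (2) can be expressed in the form

$$\ln\left(\frac{\pi(x_i)}{1 - \pi(x_i)}\right) = \beta_0 + \sum_{j=1}^{q} \beta_j x_i^{j} + \sum_{k=1}^{m} \beta_{q+k} (x_i - \tau_k)_+^{q}, \tag{3}$$

where $x_i$ is the $i$th observation of the predictor variable, $\beta_0$ is the intercept, $\beta_j$ and $\beta_{q+k}$ are the coefficients of the nonparametric truncated spline logistic regression, $\tau_k$ is the knot point with $k = 1, 2, \ldots, m$, and $(x_i - \tau_k)_+^{q}$ is a truncated polynomial function, which can be expressed as follows:

$$(x_i - \tau_k)_+^{q} = \begin{cases} (x_i - \tau_k)^{q}, & x_i \ge \tau_k, \\ 0, & x_i < \tau_k. \end{cases}$$

Parameter estimation in the truncated spline binary logistic regression model in Equation (2) is done using the maximum likelihood method, that is, by maximizing the likelihood function. With the probability function in Equation (1), the likelihood function for $n$ independent observations is

$$L(\beta) = \prod_{i=1}^{n} \pi(x_i)^{y_i}\,[1 - \pi(x_i)]^{1 - y_i}.$$

Taking the natural logarithm of the likelihood gives

$$\ln L(\beta) = \sum_{i=1}^{n} \left\{ y_i \ln \pi(x_i) + (1 - y_i) \ln[1 - \pi(x_i)] \right\}.$$

In the next step, the ln likelihood function is differentiated with respect to the parameter $\beta$. The resulting equations are implicit in $\beta$, so the estimates are obtained by Newton-Raphson iteration.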
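To make the model concrete, the following is a minimal Python sketch of the truncated power basis and the ln likelihood of a binary logistic model built on it. It assumes NumPy; the function names `truncated_basis` and `log_likelihood` are illustrative and do not come from the study, and the three weights in the usage lines are made-up values, not observations from the Barru data.

```python
import numpy as np

def truncated_basis(x, knots, q=1):
    """Design matrix with columns 1, x, ..., x^q, (x - tau_1)_+^q, ..., (x - tau_m)_+^q."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]                                   # intercept column
    cols += [x ** j for j in range(1, q + 1)]                  # polynomial terms
    cols += [np.maximum(x - tau, 0.0) ** q for tau in knots]   # truncated terms
    return np.column_stack(cols)

def log_likelihood(beta, X, y):
    """Bernoulli ln likelihood: sum of y*ln(pi) + (1 - y)*ln(1 - pi)."""
    eta = X @ beta
    # With pi = exp(eta) / (1 + exp(eta)), the summand equals
    # y*eta - ln(1 + exp(eta)); logaddexp evaluates the last term stably.
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

# Hypothetical weights (kg) and responses, for illustration only.
X_example = truncated_basis([3.0, 4.7, 6.0], knots=[4.7], q=1)
y_example = np.array([0.0, 1.0, 1.0])
ll_at_zero = log_likelihood(np.zeros(X_example.shape[1]), X_example, y_example)
```

With all coefficients set to zero every fitted probability is 0.5, so `ll_at_zero` is three times ln 0.5; this gives a simple check that the basis and the likelihood agree with the definitions given for the model.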
The estimation results of binary logistic regression parameters with a truncated spline estimator can be expressed as follows.
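As an illustration of the iteration, here is a minimal Python sketch of a textbook Newton-Raphson update for a logistic likelihood, which may differ in detail from the expression used in the study. It assumes NumPy; the function name `fit_newton_raphson`, the coefficient values, and the data are all hypothetical, with the data simulated rather than taken from the Puskesmas records. The knot at 4.7 is borrowed only to build an example design matrix.

```python
import numpy as np

def fit_newton_raphson(X, y, max_iter=50, tol=1e-8):
    """Maximize the Bernoulli ln likelihood by Newton-Raphson iteration."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        score = X.T @ (y - pi)                   # first derivative of ln L
        W = pi * (1.0 - pi)                      # Bernoulli variances
        hessian = -(X.T * W) @ X                 # second derivative of ln L
        step = np.linalg.solve(hessian, score)
        beta = beta - step                       # beta_new = beta - H^{-1} g
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (an assumption for illustration, not the study data).
rng = np.random.default_rng(0)
x_demo = rng.uniform(2.0, 12.0, size=400)
X_demo = np.column_stack([np.ones_like(x_demo), x_demo,
                          np.maximum(x_demo - 4.7, 0.0)])
beta_true = np.array([-3.0, 1.0, -1.2])          # hypothetical coefficients
p_demo = 1.0 / (1.0 + np.exp(-(X_demo @ beta_true)))
y_demo = rng.binomial(1, p_demo).astype(float)

beta_hat = fit_newton_raphson(X_demo, y_demo)
```

At convergence the score vector `X_demo.T @ (y_demo - pi)` is numerically zero, which is the implicit estimating equation that the iteration solves.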

RESULTS AND DISCUSSION
Data analysis begins with one knot point, so that the linear spline binary logistic regression model in Equation (3) can be written in probability form as

$$\pi(x) = \frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 (x - \tau_1)_+\right)}{1 + \exp\left(\beta_0 + \beta_1 x + \beta_2 (x - \tau_1)_+\right)}.$$

A knot point can be placed anywhere along the range of the predictor variable, so the optimal knot point for the data must be chosen. For one knot point, the GCV values obtained for the binary spline logistic regression model are shown in Table 1. The optimal knot point is the one whose model gives the minimum GCV value. Among the candidate knot points, the knot at 4.7 gives the minimum GCV value of 0.109. Next, we compare this with the GCV values obtained using two knot points, which are shown in Table 2.

... tend to fall into the category of abnormal nutrition. Relating this to age, toddlers who weigh around 4.7 kg are infants aged 1-12 months. This means that the chance of normal nutrition is greater when the baby is up to 1 year old. These results are in line with research by Onis and Branca (2016), which found that normal nutrition is seen in children aged one to two years, after which there is a tendency toward slowed growth [23].

The results of classifying the nutritional status of children under five with the linear spline logistic regression model with one knot point are shown in Table 3. The model classifies the data with an accuracy of 87.5%. On this basis, the linear spline logistic regression model with one knot point models the nutritional status of toddlers accurately based on weight.
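The knot search and the accuracy calculation can be sketched in Python as follows. This is a sketch under stated assumptions, not the procedure of the study: the GCV formula used here (mean squared error divided by the squared factor 1 - p/n, with p the number of coefficients) is one common form and the study's exact definition is not given in the text, the 0.5 classification cutoff is assumed, the candidate knot grid is arbitrary, and the data are simulated rather than taken from the 432 Barru records. All function names are illustrative.

```python
import numpy as np

def design(x, tau):
    """Linear spline basis with one knot: columns 1, x, (x - tau)_+."""
    return np.column_stack([np.ones_like(x), x, np.maximum(x - tau, 0.0)])

def fit(X, y, n_iter=25):
    """Newton-Raphson fit of the logistic model (fixed number of iterations)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = pi * (1.0 - pi)
        beta = beta + np.linalg.solve((X.T * W) @ X, X.T @ (y - pi))
    return beta

def gcv(X, y, beta):
    """Assumed GCV form: MSE / (1 - p/n)^2, with p = number of coefficients."""
    n, p = X.shape
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return float(np.mean((y - pi) ** 2) / (1.0 - p / n) ** 2)

def select_knot(x, y, candidates):
    """Return the candidate knot with the smallest GCV, and that GCV value."""
    scores = [gcv(design(x, t), y, fit(design(x, t), y)) for t in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores[best]

def accuracy(X, y, beta, cutoff=0.5):
    """Share of observations whose predicted class matches the observed class."""
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return float(np.mean((pi >= cutoff) == (y == 1)))

# Simulated weights and responses (an assumption for illustration only).
rng = np.random.default_rng(1)
x = rng.uniform(2.0, 12.0, size=432)
p_true = 1.0 / (1.0 + np.exp(-(design(x, 4.7) @ np.array([-3.0, 1.0, -1.2]))))
y = rng.binomial(1, p_true).astype(float)

# Candidate knots are kept strictly inside the data range, since a knot below
# the smallest weight would make the truncated column collinear with x.
candidates = np.arange(3.0, 11.0, 0.5)
best_tau, best_gcv = select_knot(x, y, candidates)
acc = accuracy(design(x, best_tau), y, fit(design(x, best_tau), y))
```

Because every one-knot model here has the same number of coefficients, this form of GCV ranks the candidate knots in the same order as the mean squared error; the penalty factor matters when models with different numbers of knots are compared.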