ABILITY OF ORDINAL SPLINE LOGISTIC REGRESSION MODEL IN THE CLASSIFICATION OF NUTRITIONAL STATUS DATA

In this study, an ordinal spline logistic regression model was developed and used to classify data on the nutritional status of children under five in the Gowa district, Indonesia. The nutritional status of toddlers has three ordered categories (malnutrition, good nutrition, and excess nutrition), so it can be modeled by ordinal spline logistic regression. The results indicate that the ordinal spline logistic regression model fits the nutritional status data best with 2 knot points, giving a GCV value of 0.2158. The estimated model shows that toddlers aged 18 months and 24 months tend to have a good chance of good nutrition, whereas toddlers aged between 18 and 24 months tend to have a minimal chance of good nutrition. The classification accuracy of the ordinal spline logistic regression model for toddler nutritional status is 92.25%.


INTRODUCTION
Ordinal logistic regression is a form of logistic regression used for response variables that have more than two categories [1], have a natural order [2], and are assumed to follow a multinomial distribution [3]. Logistic regression has been extended using robust methods [4], principal components [5], and mixed models [6].
It has also been applied widely across fields of science, including health [7], education [8], and socioeconomic data [9]. In some cases the data are imbalanced, so ordinary logistic regression is less accurate [10], because classification tends to neglect the minority classes. For this reason, several nonparametric logistic regressions have been developed, including spline binary logistic regression [11] and local polynomial logistic regression [12]. This development parallels several well-known regression estimators, including the truncated spline [13], smoothing spline [14], penalized spline [15], and local polynomial [16].
In this study, we used a truncated spline estimator, which involves knots in the estimation. Truncated splines are used because they can handle data whose behavior changes over certain intervals; with the help of knot points, the estimate follows the data pattern wherever it moves [13].
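The role of the knots can be illustrated with a minimal sketch of a truncated power basis. The function name `truncated_power_basis`, the example ages, and the knot values 18 and 24 are assumptions chosen for illustration only; they are not taken from the fitted model.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=1):
    """Columns [x, ..., x^q, (x-k_1)_+^q, ..., (x-k_m)_+^q], without an intercept."""
    x = np.asarray(x, dtype=float)
    # Ordinary polynomial terms x^1 ... x^q.
    poly = [x ** j for j in range(1, degree + 1)]
    # Truncated terms: zero to the left of each knot, (x - k)^q to the right.
    trunc = [np.maximum(x - k, 0.0) ** degree for k in knots]
    return np.column_stack(poly + trunc)

# Hypothetical ages in months and illustrative knots.
ages = np.array([6.0, 18.0, 21.0, 24.0, 36.0])
B = truncated_power_basis(ages, knots=[18.0, 24.0], degree=1)
print(B)
```

Each truncated column is zero until the age passes its knot, which is what lets the fitted curve change slope from one age interval to the next.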
A knot point is a point at which the behavior of the function changes between intervals; the optimal knot points are chosen by the minimum GCV value [17]. Research on health data using spline binary logistic regression found two patterns of change in children's weight and obtained an accuracy of 87.5% [11]. However, that research did not consider response variables with more than two ordered categories. Therefore, we develop an ordinal spline logistic regression model and apply it to the nutritional status data of toddlers in the Gowa district, Indonesia.
The nutritional status of toddlers in Indonesia can be measured by indicators of age, weight, and body length [18]. In this study, however, we used age as the predictor variable. Many studies have examined the nutritional status of toddlers in relation to factors such as breastfeeding [19], nutritional intake [20], formula feeding [21], and mother's knowledge [22].
However, those studies did not show how the probability of each nutritional status varies across age intervals. Therefore, we analyze nutritional status data consisting of three categories: malnutrition, good nutrition, and excess nutrition. We use ordinal spline logistic regression with several knot points to see the patterns of change that might occur.

PRELIMINARIES
In this study, we used secondary data from the Gowa District Health Office, covering 17600 toddlers who had been weighed at a Posyandu in the Gowa District, Indonesia. The response variable has three categories: malnutrition, good nutrition, and excess nutrition. Toddler nutritional status was analyzed with age as the predictor variable.
If the response variable $y_i$ has three ordered categories, the regression model used is an ordinal logistic regression model. The model is assumed to follow a multinomial distribution, with observations independent of one another.
The model used in this study is an ordinal logistic nonparametric regression model with a truncated spline estimator, and its parameters are estimated after applying a logit transformation. The function $(x - k_h)_+^q$ is a truncated polynomial: it equals $(x - k_h)^q$ when $x \ge k_h$ and 0 when $x < k_h$, where $k_h$ is the $h$-th knot point. Substituting q = 1, 2, 3 gives the linear, quadratic, and cubic spline functions, respectively.
Next, the likelihood function is transformed into natural logarithmic form to obtain the ln-likelihood function. The cumulative probability function is specified for each response category, namely for the first category and for the second category.
To find the maximum of the ln-likelihood function, we differentiate it with respect to each parameter and set the derivatives equal to zero. The resulting equations are nonlinear in the parameters, so a numerical method is needed to obtain the parameter estimates, one of which is Newton-Raphson iteration.
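The quantities described above can be written out in their commonly used forms. This is a sketch under assumptions rather than the notation of this study: the intercepts $\alpha_r$, the coefficients $\beta_j$, the linear predictor $\eta_{ir}$, the category indicators $y_{ir}$, the parameter vector $\theta$, and the sign convention of the cumulative logit are all assumed here.

```latex
\begin{align}
P(Y_i = y_i) &= \pi_1(x_i)^{y_{i1}}\,\pi_2(x_i)^{y_{i2}}\,\pi_3(x_i)^{y_{i3}},
  \qquad y_{i1} + y_{i2} + y_{i3} = 1, \\
\eta_{ir} &= \alpha_r + \sum_{j=1}^{q} \beta_j x_i^{j}
  + \sum_{h=1}^{m} \beta_{q+h}\,(x_i - k_h)_+^{q}, \qquad r = 1, 2, \\
(x_i - k_h)_+^{q} &=
  \begin{cases}
    (x_i - k_h)^{q}, & x_i \ge k_h, \\
    0, & x_i < k_h,
  \end{cases} \\
P(Y_i \le r \mid x_i) &= \frac{\exp(\eta_{ir})}{1 + \exp(\eta_{ir})}, \qquad r = 1, 2, \\
\pi_1(x_i) &= P(Y_i \le 1 \mid x_i), \quad
  \pi_2(x_i) = P(Y_i \le 2 \mid x_i) - P(Y_i \le 1 \mid x_i), \quad
  \pi_3(x_i) = 1 - P(Y_i \le 2 \mid x_i), \\
\ell(\theta) &= \sum_{i=1}^{n} \sum_{r=1}^{3} y_{ir} \ln \pi_r(x_i), \\
\theta^{(t+1)} &= \theta^{(t)} - H\!\left(\theta^{(t)}\right)^{-1} g\!\left(\theta^{(t)}\right).
\end{align}
```

Here $g$ and $H$ denote the gradient and Hessian of $\ell$ with respect to $\theta = (\alpha_1, \alpha_2, \beta_1, \dots, \beta_{q+m})$, and the ordering $\alpha_1 < \alpha_2$ keeps $\pi_2(x_i)$ positive.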
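Newton-Raphson iteration yields the parameter estimates of the ordinal truncated spline logistic regression. A suitable method for selecting the optimal knot points is Generalized Cross Validation (GCV), which can be written as $GCV(k) = \dfrac{MSE(k)}{\left[n^{-1}\,\mathrm{trace}(I - A(k))\right]^{2}}$, where $k$ is the knot point, $MSE(k)$ is the mean squared error of the fit with knot point $k$, $n$ is the number of observations, $A(k) = X(X^{T}X)^{-1}X^{T}$ is the hat matrix built from the spline design matrix $X$, and $I$ is the identity matrix. The minimum GCV value gives the optimal knot point.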
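Knot selection by minimum GCV can be sketched as follows. This is a simplified stand-in, assuming an ordinary least-squares fit on synthetic data with a single linear truncated term; it is not the ordinal logistic fit used in this study, and the names `linear_truncated_design`, `gcv`, and the candidate knots are hypothetical.

```python
import numpy as np

def linear_truncated_design(x, knots):
    """Design matrix [1, x, (x-k_1)_+, ..., (x-k_m)_+] for a linear truncated spline."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def gcv(x, y, knots):
    """GCV(k) = MSE(k) / (trace(I - A(k)) / n)^2 with A(k) the least-squares hat matrix."""
    X = linear_truncated_design(x, knots)
    n = len(y)
    A = X @ np.linalg.pinv(X.T @ X) @ X.T  # hat matrix A(k)
    residuals = y - A @ y
    mse = np.mean(residuals ** 2)
    return mse / (np.trace(np.eye(n) - A) / n) ** 2

# Synthetic data (an assumption): the slope changes at x = 20.
rng = np.random.default_rng(0)
x = np.linspace(0, 60, 200)
y = 1.0 + 0.1 * x - 0.5 * np.maximum(x - 20, 0) + rng.normal(0, 0.1, x.size)

candidates = [10.0, 20.0, 30.0, 40.0, 50.0]
scores = {k: gcv(x, y, [k]) for k in candidates}
best_knot = min(scores, key=scores.get)
print(best_knot, scores[best_knot])
```

The same loop extends to pairs of knots by passing two values in `knots`, which mirrors the comparison between 1 and 2 knot points.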

Data on the nutritional status of children under five were obtained from the Gowa District Health Office, Indonesia, and cover 17600 children classified as having malnutrition, good nutrition, or excess nutrition. The plot of the nutritional status data for toddlers is shown in Figure 1.
The malnutrition category contained 2102 toddlers (11.94%), the excess nutrition category contained 1766 toddlers (10.03%), and the remaining 13732 toddlers (78.02%) were in the good nutrition category. These figures show a large imbalance between the good nutrition category and the malnutrition and excess nutrition categories. Therefore, in this study, we modeled the nutritional status data as a function of age using ordinal logistic regression with a truncated spline estimator. The knot points lie within the range of the predictor variable, age, so the optimal knot points must be chosen to obtain the optimal model. For 1 knot point, the GCV values obtained using the ordinal spline logistic regression model for the linear, quadratic, and cubic orders are shown in Table 1.
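The class shares can be recomputed directly from the reported counts. The sketch below is plain arithmetic; the variable names are hypothetical, and the majority-class share is included only as a reference point for judging a classifier on imbalanced data, not as a result of this study.

```python
# Reported counts per nutritional status category.
counts = {"malnutrition": 2102, "good nutrition": 13732, "excess nutrition": 1766}

total = sum(counts.values())
# Share of each category in percent, rounded to two decimals.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}
# Accuracy of a rule that always predicts the largest category.
majority_baseline = max(shares.values())

print(total)
print(shares)
print(majority_baseline)
```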
Among the knot points in Table 1, the knot point at 9 in the linear order gives the minimum GCV value of 0.2159. This value is then compared with the GCV values obtained using 2 knot points in the linear, quadratic, and cubic orders, which are shown in Table 2. Table 3 shows that age has a significant effect at a level of 5%, and the model is as