Classification of Toddler Nutritional Status Using a Binary Classification Tree With the QUEST (Quick, Unbiased, Efficient, Statistical Tree) Algorithm

ABSTRACT

1. INTRODUCTION

Indonesia is still facing nutritional problems that have a serious impact on the quality of its human resources (Rahayu et al., 2018). One of these problems is stunting, or malnutrition in toddlers, which remains a major obstacle in the population system (Ayu, 2021). According to the results of the 2022 Indonesian Nutrition Status Survey (SSGI), the stunting rate fell from 24.4% in 2021 to 21.6% in 2022; however, hard work is still needed to reach the 14% target (SSGI, Ministry of Health RI, 2023). For this reason, the determination of nutritional status is very important in monitoring the nutritional health and growth of toddlers at any time. Nutritional status is a measure of success in fulfilling a child's nutrition, as indicated by the child's weight and height, and is also defined as health status (Adzani, 2021).
Classification, a part of data mining, can make decisions on the nutritional status of toddlers faster and more efficiently. The QUEST (Quick, Unbiased, Efficient, Statistical Tree) method is a statistical method that can be used to form a decision tree and classify an object using a splitting algorithm that produces a binary tree.
The advantages of the QUEST algorithm are its speed in computation (quick), its unbiased selection of independent variables (unbiased), and its efficiency for complex data, that is, its ability to use independent variables of both categorical and numeric types (efficient) (Maharani et al., 2017). It is therefore hoped that this research can help classify data easily and computationally, and provide statistically analyzed information. For these reasons, a binary classification tree with the QUEST algorithm is used to classify the nutritional status of toddlers.

2. RESEARCH METHOD

2.1 Data Source
The data used in building the binary classification tree with the QUEST algorithm are the 2022 toddler identity format data for Sukasari Village, Pegajahan sub-district, obtained from the KB (Family Planning) Counseling Center of Pegajahan sub-district.

2.2 Research Variables
The variables in this study consist of predictor variables and a response variable. The response variable (Y) is the nutritional status of the toddler, grouped into two categories: status 0 = A, normal nutritional status, and status 1 = B, nutritional status at risk of stunting. The predictor variables (X) come from the toddler identity format, namely gender (X₁), categorical data with 0 denoting female and 1 denoting male; current body weight (X₂), continuous data; current body height (X₃), continuous data; and health insurance (X₄), categorical data.

The p-values of the numeric and categorical variables are compared. If the selected variable is categorical, it is first converted into a numeric variable by transforming it into an I-dimensional dummy vector. If the selected variable is continuous, the following quadratic discriminant analysis steps are carried out:
1. Let x̄₁ and s₁² be the mean and variance of the predictor x for the observations in the first category of the response variable, and x̄₂ and s₂² the mean and variance for the second category. Let p(j) = Nⱼ/N be the prior probability of each category of the response variable, where Nⱼ is the number of observations of response group j at the initial node.
2. Quadratic discriminant analysis partitions the x-axis into three intervals, (−∞, d₁), (d₁, d₂), and (d₂, ∞), where d₁ and d₂ are the roots of the equation p(1)·f(x | x̄₁, s₁²) = p(2)·f(x | x̄₂, s₂²). Taking the logarithm of both sides yields the quadratic equation ax² + bx + c = 0, with
   a = s₁² − s₂²,
   b = 2(x̄₁s₂² − x̄₂s₁²),
   c = (x̄₂s₁)² − (x̄₁s₂)² + 2s₁²s₂² log[p(1)s₂ / (p(2)s₁)].
3. The node is split at x = d, where d is defined as follows. If a = 0, the equation is linear and d = (x̄₁ + x̄₂)/2 − s² log[p(1)/p(2)] / (x̄₁ − x̄₂), with s² the common variance. If a ≠ 0, QUEST uses only one of the two roots of the quadratic equation, namely the root whose value is closest to the sample mean of the class, provided it produces two non-empty nodes (Handayani et al., 2013).
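Steps 1–3 above can be sketched in Python. This is a minimal illustration following the standard QUEST quadratic-discriminant split; the function and variable names are our own, not from the paper:

```python
import numpy as np

def quest_split_point(x1, x2):
    """Split point d for a continuous predictor, given the predictor
    values x1 and x2 of the two response classes at the current node."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    m1, m2 = x1.mean(), x2.mean()
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)
    n1, n2 = len(x1), len(x2)
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)

    # Coefficients of a*d^2 + b*d + c = 0, obtained by equating the two
    # prior-weighted normal densities and taking logs (step 2).
    a = v1 - v2
    b = 2.0 * (m1 * v2 - m2 * v1)
    c = (m2 ** 2) * v1 - (m1 ** 2) * v2 \
        + 2.0 * v1 * v2 * np.log((p1 * np.sqrt(v2)) / (p2 * np.sqrt(v1)))

    if np.isclose(a, 0.0):
        # Equal variances: the equation is linear (step 3, case a = 0).
        return 0.5 * (m1 + m2) - v1 * np.log(p1 / p2) / (m1 - m2)
    disc = b ** 2 - 4.0 * a * c
    if disc < 0:
        # No real root: fall back to the midpoint of the class means.
        return 0.5 * (m1 + m2)
    roots = np.array([(-b + np.sqrt(disc)) / (2.0 * a),
                      (-b - np.sqrt(disc)) / (2.0 * a)])
    # Keep the root closest to the class sample mean (step 3, case a != 0).
    return roots[np.argmin(np.abs(roots - m1))]
```

With two equal-variance, equal-size classes the split lands midway between the class means, as the linear case predicts.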

2.3 Termination of Node Partitioning

The partitioning process is carried out until the nodes can no longer be partitioned, according to the following rules for terminating tree growth:
a. If a node becomes pure, that is, all objects/cases at that node belong to the same class of the response variable, the node will not be partitioned.
b. If all objects/cases in a node have identical values for every predictor variable, the node will not be partitioned.
c. If the tree depth has reached the specified maximum, the tree-growing process is stopped.
d. If partitioning a node would produce a child node smaller than the specified minimum child node size, the node will not be partitioned (Handayani et al., 2013).
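Rules (a)–(c) can be sketched as a simple stopping check; rule (d) is naturally checked at split time, when the candidate child sizes are known. This is an illustrative sketch with our own names and default thresholds, not code from the paper:

```python
def should_stop(node_classes, node_X, depth, max_depth=5):
    """Return True if the node must not be partitioned further.

    node_classes: response class of each object at the node.
    node_X:       list of predictor-value rows, one per object.
    depth:        current depth of the node in the tree.
    Rule (d), the minimum child node size, is checked separately when a
    candidate split is evaluated.
    """
    if len(set(node_classes)) <= 1:
        return True                      # (a) node is pure
    if all(row == node_X[0] for row in node_X):
        return True                      # (b) identical predictor values
    if depth >= max_depth:
        return True                      # (c) maximum depth reached
    return False
```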

2.4 Classification Tree Accuracy

To classify observations correctly, as in binary logistic regression, a cutpoint c must be determined and compared with the estimated probability π(x). If π(x) is greater than or equal to c, the observation is classified into the response y = 1; otherwise into y = 0. Correct classification is summarized by specificity and sensitivity, which can be calculated from Table 2.1 below.

Table 2.1 Classification table
Observed | Predicted y = 1 | Predicted y = 0
y = 1 | n₁₁ | n₁₀
y = 0 | n₀₁ | n₀₀

Specificity is the classification accuracy in predicting observations that do not have the expected criterion (y = 0), namely (n₀₀ / (n₀₁ + n₀₀)) × 100%. Sensitivity is the classification accuracy in predicting observations that have the expected criterion (y = 1), namely (n₁₁ / (n₁₁ + n₁₀)) × 100%. The overall classification accuracy, i.e. the accuracy of the model in predicting events correctly, is ((n₁₁ + n₀₀) / (n₁₁ + n₁₀ + n₀₁ + n₀₀)) × 100% (Rizki & Setyawan, 2018).
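The three measures defined from Table 2.1 can be computed directly from the cell counts. A minimal sketch (cell names follow the table above):

```python
def classification_accuracy(n11, n10, n01, n00):
    """Sensitivity, specificity and overall accuracy (in percent) from
    the 2x2 classification table: n11/n10 are correct/incorrect
    predictions for observed y = 1, n01/n00 for observed y = 0."""
    sensitivity = n11 / (n11 + n10) * 100          # correct y = 1 rate
    specificity = n00 / (n01 + n00) * 100          # correct y = 0 rate
    accuracy = (n11 + n00) / (n11 + n10 + n01 + n00) * 100
    return sensitivity, specificity, accuracy
```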

Selection of the Partitioning Variable

The partitioning variable is determined by finding the variable with the smallest p-value. For categorical predictors a chi-square test is carried out, and for numeric predictors an ANOVA F test. The results of these tests, obtained with the help of SPSS, are shown in Table 3.1 below.

Table 3.1 Chi-Square and ANOVA F Test Results
Variable | Test statistic | p-value
X₂ (weight) | F = 0.210 | 0.647
X₃ (height) | F = 53.247 | 0.000
X₄ (health insurance) | χ² = 5.166 | 0.026

The smallest p-value belongs to the height variable X₃. With α = 0.05 and M the number of predictor variables, X₃ is selected as the partitioning variable at the initial node. Furthermore, because the selected variable is numeric, quadratic discriminant analysis is used to determine the split point.
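The variable-selection step (ANOVA F for numeric predictors, chi-square for categorical ones, then pick the smallest p-value) can be sketched with scipy. This is an illustrative sketch, not the paper's SPSS procedure; names are our own:

```python
import numpy as np
from scipy import stats

def select_split_variable(y, predictors):
    """Pick the predictor with the smallest p-value.

    y:          response class per observation.
    predictors: dict mapping name -> (values, kind), kind being
                "numeric" (ANOVA F test) or "categorical" (chi-square).
    """
    y = np.asarray(y)
    pvals = {}
    for name, (x, kind) in predictors.items():
        x = np.asarray(x)
        if kind == "numeric":
            groups = [x[y == cls] for cls in set(y.tolist())]
            _, p = stats.f_oneway(*groups)
        else:
            # Chi-square test on the response-by-level contingency table.
            table = np.array([[np.sum((y == cls) & (x == lvl))
                               for lvl in set(x.tolist())]
                              for cls in set(y.tolist())])
            _, p, _, _ = stats.chi2_contingency(table)
        pvals[name] = p
    return min(pvals, key=pvals.get), pvals
```

On toy data where height separates the classes cleanly and gender is balanced, the height variable wins, mirroring Table 3.1.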

Classification Tree

Using the QUEST method, the classification tree is obtained as follows:

[Figure: QUEST classification tree for toddler nutritional status]

In summary, the overall classification accuracy of the tree formed is 95.7%. Thus, the misclassification probability of the tree is 4.3%, which means that this classification tree is optimal.
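Because the tree terminates after a single split, it reduces to one decision rule on height. The sketch below assumes node (1) (height greater than 85.79 cm) is labeled normal and node (2) is labeled at risk of stunting, which is consistent with the node description in the conclusion but stated here as our reading, not verbatim from the paper:

```python
def classify_toddler(height_cm):
    """Single-split QUEST tree: node (1) if height > 85.79 cm (normal),
    node (2) otherwise (at risk of stunting)."""
    return "normal" if height_cm > 85.79 else "at risk of stunting"
```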

CONCLUSION
Classification, a part of data mining, can make decisions on the nutritional status of toddlers faster and more efficiently. The QUEST (Quick, Unbiased, Efficient, Statistical Trees) method is a statistical method that can be used to form decision trees and classify an object using a splitting algorithm that produces a binary tree.
From the classification results, height (X₃) is the partitioning variable. In the first stage of partitioning, the parent node, which consists of 70 toddler records, is split on height into two nodes, node (1) and node (2). Node (1) contains the 25 toddlers with a height of more than 85.79 cm, while node (2) contains the 45 toddlers with a height less than or equal to 85.79 cm. In the next stage, partitioning is terminated. The overall classification accuracy of the tree formed is 95.7%, so the misclassification probability of the tree is 4.3%, which means that this classification tree is optimal.
The accuracy of the normal-category classification is the sensitivity: Sensitivity = 24/25 × 100% = 96.0%. The accuracy of the stunting-risk-category classification is the specificity: Specificity = 43/45 × 100% = 95.5%. The overall classification accuracy is (24 + 43)/70 × 100% = 95.7%, so the APER (misclassification rate) is 100% − 95.7% = 4.3%.

Table 3.1 presents the chi-square and ANOVA F test results. Classification accuracy consists of sensitivity, specificity, and overall accuracy, and can be calculated based on Table 3.2.