PERFORMANCE TEST OF NAIVE BAYES AND SVM METHODS ON CLASSIFICATION OF MALNUTRITION STATUS IN CHILDREN



INTRODUCTION
Madura is the closest island to Java, but its development in various sectors lags far behind Java's [1]. Therefore, the Government needs to pay special attention to the Madura region in developing community welfare, especially in the health sector and, in particular, malnutrition, because nutritional problems affect the quality of human resources. Based on the results of primary health research in 2013, the stunting rate had reached 37.2%, with about 8.9 million Indonesian children experiencing suboptimal growth [2]. This shows that cases of malnutrition are fairly high in Indonesia, especially in Sumenep, Madura. In addition, the many cases of child malnutrition recorded over a long period have caused the data to pile up. As a result, health agencies in Madura have had difficulty classifying the nutritional data of children with malnutrition.
Therefore, in this research, we designed a data mining technique to solve this problem using classification methods, namely the Naïve Bayes and SVM methods. We chose these two methods because the Naïve Bayes method has advantages in determining probability values when predicting outcomes from previous experience data [3], which makes it very suitable for categorizing malnutrition in children. The Naïve Bayes method is also simple and has a fast computing time for finding models [4]. The SVM method has several advantages as well, including a level of accuracy that depends on the kernel function and parameters used [5]. SVM is divided into two types based on its characteristics: Linear Support Vector Machine and Non-Linear Support Vector Machine [6].
Previous research has also compared these two methods with other classification methods. In [7], Naïve Bayes and K-Nearest Neighbor (KNN) were compared for the classification of Indonesian articles, and Naïve Bayes performed better than KNN. In [8], the Naïve Bayes method was used to predict how smoothly Micro, Small, and Medium Enterprises pay terrace rental fees and produced an accuracy of 81.81%. In [9], Naïve Bayes and Random Forest (RF) were compared for regional language classification, and Naïve Bayes was found to be better than RF. Apart from the high accuracy of the Naïve Bayes method, the SVM method produces a small error rate, as in [10], which compared SVM and Decision Tree for tourist attraction recommendation systems and found that SVM performed better than Decision Tree. Likewise, [11] compared the KNN and SVM methods for air quality classification in Jakarta and found SVM superior to KNN with 100 kernels.
Based on these references and the problems above, this research applies the Naïve Bayes and SVM methods to find the best and most effective method for classifying malnutrition in children. In addition, this system can assist health agencies in registering children with malnutrition so that they can immediately receive treatment, such as vitamins or counseling for the child's family.

PRELIMINARIES
Classification is the grouping of similar objects and the separation of objects that are not the same [12]. It is an essential part of organizing information because it makes information easier to access. Following this definition, the problem in this research covers everything related to mining the nutritional data of children with malnutrition. The data in this research was obtained from the Kalianget Community Health Center. The malnutrition data consists of 694 records collected in 2016-2022 with six attributes, including gender, child's age, birth weight, birth height, and current weight, as in Table 1. A data standardization process is then carried out, and the data is divided into training and testing data. Training data is used to build a model, while testing data is used to test the model on other data to determine its accuracy [13]. After that, classification is carried out using the Naïve Bayes and SVM methods so that the two can be compared. The method comparison model in this research can be seen in Figure 1.
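The division into training and testing data described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `train_test_split`, the fixed random seed, and the 80%:20% ratio are hypothetical choices, and the integer placeholders stand in for the 694 real records, which are not reproduced here.

```python
import random

def train_test_split(records, train_fraction, seed=0):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumption) for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder identifiers standing in for the 694 records (not the real data).
records = list(range(694))
train, test = train_test_split(records, 0.8)  # hypothetical 80%:20% ratio
```

Shuffling before cutting avoids a split that simply follows the order in which the records were collected.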

Data Standardization
Data standardization is the initial data processing step before classification, in which the data is normalized by transforming all x values into Z values [14]. In this study, the Z-Score was used. The Z-Score measures the deviation of a data value from the average value (μ) in units of the standard deviation (σ) [15]. The aim is to bring the attributes of the dataset onto a comparable scale before processing. The equation for data standardization is given in equation (1):

$$Z = \frac{x - \mu}{\sigma} \tag{1}$$
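A minimal sketch of Z-Score standardization for a single attribute is shown below. The function name and the sample weights are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which variant was used.

```python
from statistics import mean, pstdev

def z_score_standardize(values):
    """Return the Z-Score of each value: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant attribute carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical current weights in kilograms, not taken from the dataset.
weights_kg = [7.5, 8.0, 9.2, 10.1, 6.8]
z_weights = z_score_standardize(weights_kg)
```

After this transformation the attribute has a mean of 0 and a standard deviation of 1, so attributes measured in different units can be compared on the same scale.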

Classification Process
The classification process is carried out using both Naïve Bayes and SVM, with the aim of obtaining precise and accurate results. This modeling stage produces accuracy values from the processed data. The Naïve Bayes method is a data-mining classification method based on Bayes' theorem that uses probability and statistics under an independence assumption [16]. Bayes' theorem is given in equation (2):

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)} \tag{2}$$

where P(H|X) is the probability of hypothesis H given condition X (posterior probability), P(X|H) is the probability of X given hypothesis H, P(H) is the probability of hypothesis H (prior probability), and P(X) is the probability of X. The main characteristic of classification with Naïve Bayes is a very strong (naïve) assumption that each event is independent [17]. The flow of the Naïve Bayes method in classifying malnutrition can be seen in Figure 2.

The SVM method is a classification method that finds the best hyperplane separating two classes in the input space [18]. This method uses hypotheses in the form of linear functions in a high-dimensional feature space, applying a learning bias from statistical learning theory. In this research, we chose the SVM method because it determines distances using support vectors, so the computing process is fast. Meanwhile, in high-dimensional space, SVM can search for the hyperplane that maximizes the distance (margin) between data classes [19]. The equation for determining the hyperplane can be seen in equation (3). To get the best hyperplane, look for the hyperplane midway between the two class boundary planes, that is, maximize the margin between the two sets of objects from different classes, as seen in equation (4):

$$\text{margin} = \frac{2}{\lVert w \rVert} \tag{4}$$

After calculating the margin, the class of the data to be predicted (the testing data) can be determined from the function value in equation (3). The kernel function used to map the data set from its initial (lower) dimensions to new (relatively higher) dimensions can be seen in equation (5). The Hessian matrix is then calculated using equation (6). This process is repeated until the iteration count reaches the maximum iteration limit or max(|δα_i|) < ε.
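The SVM quantities described above can be illustrated with a small sketch. It uses common textbook forms, which are assumptions and may differ from the paper's own equations (3) and (5): a polynomial kernel K(x, z) = (x · z + c)^d, a decision function f(x) = Σ α_i y_i K(x_i, x) + b, and a margin of 2/‖w‖ for the linear case. The support vectors, multipliers, and bias below are hypothetical values chosen by hand, not values trained on the malnutrition data.

```python
import math

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree (assumed textbook form)."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree

def decision_value(x, support_vectors, labels, alphas, bias, degree=2):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + bias."""
    total = 0.0
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        total += alpha * y * polynomial_kernel(sv, x, degree)
    return total + bias

def predict(x, support_vectors, labels, alphas, bias, degree=2):
    """Assign class 1 or -1 from the sign of the decision value."""
    value = decision_value(x, support_vectors, labels, alphas, bias, degree)
    return 1 if value >= 0 else -1

def margin_width(w):
    """Margin 2 / ||w|| of a linear separating hyperplane with weight vector w."""
    return 2.0 / math.sqrt(sum(c * c for c in w))

# Hypothetical two-point model: with degree=1 it reduces to f(x) = 0.5 * (x1 + x2).
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.25, 0.25]
bias = 0.0

label_a = predict([2.0, 0.5], support_vectors, labels, alphas, bias, degree=1)
label_b = predict([-1.0, -0.2], support_vectors, labels, alphas, bias, degree=1)
width = margin_width([0.5, 0.5])
```

In this toy model the two support vectors lie exactly on the class boundary planes (decision values 1 and -1), and the margin equals the distance between them.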
The concept of the SVM method in classifying malnutrition status can be seen in Figure 3.
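Returning to the Naïve Bayes step described earlier in this section, the sketch below shows one way Bayes' theorem can be applied to continuous attributes. It assumes a Gaussian likelihood for each attribute within each class, computed from a per-class mean and standard deviation. The function names, the two class labels, and the sample rows are hypothetical and are not taken from the dataset.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate the prior and the per-attribute mean and standard deviation of each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mu = sum(column) / len(column)
            variance = sum((v - mu) ** 2 for v in column) / len(column)
            stats.append((mu, math.sqrt(variance) + 1e-9))  # small term avoids division by zero
        model[label] = {"prior": len(members) / len(rows), "stats": stats}
    return model

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior; P(X) is the same for all classes."""
    best_label, best_score = None, -math.inf
    for label, params in model.items():
        score = math.log(params["prior"])
        for x, (mu, sigma) in zip(row, params["stats"]):
            score += math.log(gaussian_pdf(x, mu, sigma) + 1e-300)  # guard against log(0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical rows of (birth weight in kg, birth height in cm) with placeholder classes.
rows = [[2.5, 48.0], [2.7, 49.0], [3.2, 50.5], [3.4, 51.0]]
classes = ["class_a", "class_a", "class_b", "class_b"]
nb_model = fit_gaussian_nb(rows, classes)
nb_prediction = predict_gaussian_nb(nb_model, [2.6, 48.5])
```

Summing log probabilities instead of multiplying raw probabilities keeps the computation stable when many attributes are involved, and the denominator P(X) can be dropped because it does not change which class scores highest.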

Evaluation Testing Stage
At the evaluation testing stage, ratio testing is carried out with training-to-testing data ratios of 90%:10%, 80%:20%, 70%:30%, 60%:40%, 50%:50%, 40%:60%, 30%:70%, 20%:80%, and 10%:90%. This process is intended to determine the quality and accuracy that the two methods produce. The accuracy, precision, and recall values are calculated at this stage using the confusion matrix. A confusion matrix is a table that shows the number of cases classified correctly and the number of cases misclassified [18].
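The metrics derived from the confusion matrix can be sketched as follows. The example treats one class as "positive" and all others as "negative", which is an assumption for illustration (a multi-class evaluation would repeat this per class). The function names and the label lists are hypothetical.

```python
def confusion_counts(actual, predicted, positive):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(actual, predicted, positive):
    """Compute accuracy, precision, recall, and f1-score from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical actual and predicted labels for eight test records.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(actual, predicted, positive=1)
```

Repeating this calculation for each training-to-testing ratio gives one set of metrics per ratio, which is how the two methods can be compared across splits.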

MAIN RESULTS
One of the reasons for the increase in child mortality is unmet nutritional needs [20]. Therefore, there is a need for a system that can classify children's nutritional status early through intensified growth monitoring [21]. Risk factors for malnutrition consist of indirect and direct causes [22]. Indirect causal factors include a lack of quantity and quality of food consumed, which can lower the weight-to-height ratio, slow or stop children's linear growth, and reduce weight gain. Meanwhile, direct factors include poverty, low education, food availability, and job opportunities. Therefore, overcoming malnutrition requires cooperation from various related parties. The impact of malnutrition on children is classified into three classes, namely Marasmus, Kwashiorkor, and Marasmic-Kwashiorkor. Marasmus is a severe form of malnutrition characterized by a very thin appearance and thin, dull hair. Kwashiorkor is a form of severe protein malnutrition that results in impaired growth and changes in mental status. Meanwhile, Marasmic-Kwashiorkor is a combination of these two classes [23].
The statistical model is a trusted and highly reliable support for decision-making [24], and the concept of probability is one form of statistical model. In this research, the Naïve Bayes and SVM methods were used. Before classification, the data is preprocessed by standardization; the results can be seen in Table 2. The next step is classification with Naïve Bayes, which determines the probability value of each variable by adding up the frequencies and combinations of values in the data set. In this method, all attributes contribute to decision-making with equal weights, and the attributes are independent of one another. The first step of Naïve Bayes is to calculate the mean and standard deviation of each continuous variable in each category, as in Table 3. Then the likelihood of each attribute is calculated for each nutritional status category, as in Table 4.
Meanwhile, classification with SVM begins by forming a polynomial kernel to represent the data when it is not linearly separable. The kernel results for the training data can be seen in Table 5. The parameters of the polynomial kernel function are C and the degree (d). In this study, the polynomial kernel was used with d=1 and d=2 and with C=1, C=5, C=10, C=50, and C=100. The classification accuracy for each parameter setting can be seen in Figure 4. The graph shows that the best accuracy on the training data is 87.36%, for d=2 and C=5, while the best accuracy on the testing data is 89.49%, for d=1 and C=5.
The accuracy results from the kernel calculations indicate that the polynomial kernel is well suited to the classification of malnutrition. The y value here is a vector of class labels containing the values 1 and -1. This calculation is repeated until the maximum number of iterations is reached or max(|δα_i|) < ε.
The results of calculating the Hessian matrix for ten training data can be seen in Table 6. The final step in the SVM classification process is calculating the margin; the results for the training data can be seen in Table 7. Classification performance on the original data and on the data produced by the classification model is measured using cross-tabulation (confusion matrix), which shows how well the classifier recognizes tuples from different classes. Evaluation with the confusion matrix produces accuracy, precision, recall, and f1-score values. Accuracy is the percentage of data records that are classified correctly after testing [25]. The confusion matrix for the Naïve Bayes algorithm can be seen in Table 8. Meanwhile, the SVM analysis uses the best SVM parameters, namely C=5 and d=2 for training data and C=5 and d=1 for testing data. The performance evaluation based on the confusion matrix of predicted against actual classes can be seen in Table 9. The two methods are then compared to determine which has the highest accuracy. The accuracy, precision, recall, and f1-score of the two algorithms are given in Table 10, and the comparison between Naïve Bayes and SVM is summarized by the accuracy graph in Figure 5. Figure 5 shows the values obtained from the tests with different dataset splits: the best accuracy, 89.76%, was obtained by the SVM method with a polynomial kernel and parameter C=5 for classifying malnutrition in children.

CONCLUSIONS
This research classified malnutrition status in children by comparing the Naïve Bayes and SVM methods. The classification process divides the dataset and then measures the performance of the two methods, using the confusion matrix to evaluate accuracy across several experiments. The SVM method with a polynomial kernel has the highest accuracy, 89.76%, which is higher than the 86.31% obtained by the Naïve Bayes method. For future work, we suggest combining the classifiers with attribute selection methods to increase accuracy and comparing SVM with several kernels.

Figure 1. Comparative Design Model of Classification Methods for Malnutrition

Figure 2. Naïve Bayes Method Flow for Malnutrition Classification

Figure 3. SVM Method Model for Malnutrition Classification

Figure 4. Graph of Polynomial Kernel Accuracy on Training and Testing Data

Figure 5. Graph of Accuracy Comparison Between the Naïve Bayes and SVM Classifier Methods

Table 2. Data Standardization Results

Table 3. Mean and Standard Deviation of Variables in Every Category

Table 4. Likelihood of Each Attribute for Each Category in Nutritional Status

Table 6. Hessian Matrix Result of Training Data

Table 7. Calculation of Weights and Margin Values for Training Data

Table 10. Classification Model Performance Measurement Results