A STUDY OF MACHINE LEARNING ALGORITHMS TO MEASURE THE FEATURE IMPORTANCE IN CLASS-IMBALANCE DATA OF FOOD INSECURITY CASES IN INDONESIA

The development of various machine learning algorithms for supervised models has made the selection of a suitable algorithm an issue in itself. The black-box nature of machine learning requires a technique, such as SHAP, that can interpret feature importance in order to identify predictors. The class-imbalance problem in real cases is another challenge in improving the performance of minority class predictions. This study uses a food insecurity dataset; food insecurity is one of the important SDG indicators to study in order to achieve zero hunger. The machine learning algorithms studied consisted of Random Forest, XGBoost, SVM, and NN. Meanwhile, the study of the effect of class-imbalance used three treatments: without handling, SMOTE-N, and ADASYN-N. Twelve models were built from the combination of four algorithms and three treatments to study model performance and feature importance. SMOTE-N and ADASYN-N increased the sensitivity value by up to 0.48 units compared with the without handling data. The agreement level on without handling data is lower, indicated by the 0.736 ICC value, while on SMOTE-N and ADASYN-N it is higher, indicated by the 0.925 and 0.919 ICC values, respectively. The dataset in this study is more suitable for SMOTE-N, based on its higher ICC and superior AUC performance. The relatively high ICC value indicates that the choice of machine learning algorithm does not influence the agreement level on the feature importance score. Therefore, the choice of a machine learning algorithm can refer to a measure of its performance. Random Forest produced the best performance (AUC and sensitivity). Therefore, Random Forest with SMOTE-N is the best model in this study. It characterizes food-insecure households as those with poor water, a small house size, low household head education, few or no savers, and cement or tile flooring.


INTRODUCTION
Machine learning can speed up analytical processes with the help of algorithms built into the model. One of its types is supervised machine learning, which generates a function that maps input to the desired output and helps produce predictive models with excellent accuracy [1]. Its ability to capture nonlinear patterns can provide additional insight that the classical linear model approach generally fails to capture [2]. There are various machine learning algorithms, such as Random Forest (RF), Support Vector Machine (SVM), XG-Boost (XGB), and Neural Network (NN). Each works differently and has advantages and disadvantages in the accuracy and interpretation of the model. An essential issue in supervised machine learning is that interpretation is not straightforward because the resulting model is a black box. The feature importance approach is an attempt to interpret the black-box model. Feature importance techniques include permutation feature importance, Local Interpretable Model-Agnostic Explanations (LIME), Shapley Additive Explanations (SHAP), information value, information gain, and various other techniques.

The SHAP method can turn a black box into a white box that can be interpreted and understood [3]. This method has been shown to produce consistent feature importance scores in modeling results across various datasets and to perform best in interpretation [4]. Such interpretation provides additional benefits when the government determines policies, for example on food insecurity, which is also a global concern.
Food insecurity is a condition that occurs when a person does not have secure access to safe and nutritious food in sufficient quantities for growth and development and for an active and healthy life. Food insecurity is one of the leading causes of poor nutritional status [5]. The United Nations member states adopted the 2030 Agenda for Sustainable Development, which consists of 17 Sustainable Development Goals (SDGs). The second goal encourages governments to end hunger and ensure access to safe, nutritious, and sufficient food throughout the year. Monitoring this target uses two indicators: the prevalence of insufficient food consumption (Prevalence of Undernourishment/PoU) and the prevalence of moderate or severe food insecurity in the population based on the Food Insecurity Experience Scale (FIES).
Statistical data show that food insecurity is still a fundamental problem in Indonesia. BPS-Statistics Indonesia reported in 2019 that 8.47 percent of the population had a calorie intake below 1,400 kcal/day. According to the Global Hunger Index released by the International Food Policy Research Institute (IFPRI), in 2020 Indonesia ranked only 65th out of 113 countries (not including high-income countries). Based on the National Socio-Economic Survey (SUSENAS) results in 2020, BPS-Statistics Indonesia estimated that 7.66 percent of households experience insufficient food consumption and 5.32 percent of households experience moderate or severe food insecurity.
Based on the percentage of households experiencing FIES food insecurity (moderate and severe, not including mild), there is a class imbalance between the food insecurity and not food insecurity household classes. Building a supervised machine learning model on class-imbalance data presents a unique challenge. Class-imbalance data refers to a classification problem where the number of observations per class is not evenly distributed [6]. Class-imbalance data can be handled by generating synthetic data; superior techniques of this kind include the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic (ADASYN). SMOTE is a class-imbalance handling technique based on oversampling the minority class: it modifies the distribution of data between the majority and minority classes in the dataset to balance the quantity of data for each class [7]. SMOTE can increase the Area Under the Curve (AUC) of the ROC (Receiver Operating Characteristic), a performance measure in machine learning [8]. ADASYN improves learning in two ways: reducing the bias caused by class imbalance and adaptively shifting the classification decision boundary toward the difficult examples [9]. The SMOTE technique influences the order of feature importance in the selection of models and the handling of the class-imbalance problem [10].
Several studies of food insecurity have been conducted. Among others, [11] identified factors that determine food insecurity in 134 countries in 2014 and concluded that the features affecting food insecurity are a low level of education of the household head and a lack of social prosperity. That study suggested adding receipt of cash transfers as a factor. In Indonesia, studies were carried out by [12], [13], and [14] using SUSENAS data. [12] concluded that higher education of the household head increases food security. [13] found a relationship between the food security of rural farmer households and access to credit, rice assistance for the poor, and unconditional cash transfers. [14] studied the household factors that characterize food insecurity and concluded that the main elements are recipients of social protection programs, education level, and recipients of the poor.
Based on this background, it is necessary to have the correct model among the many machine learning algorithms that can capture the issue of class-imbalance data to produce the best accuracy model and provide a reasonable interpretation of the model using SHAP. The objectives of this research are 1) to examine the effect of choosing machine learning algorithms on the SHAP feature importance; 2) to examine the effect of handling class-imbalance on measuring the score of the SHAP feature importance; and 3) to study the interpretation of the feature importance score on the results of the best algorithm on food insecurity cases in Indonesia.

Food Insecurity Experience Scale (FIES)
In 2013, FAO launched the Voices of the Hungry (VoH) project to develop a methodology for measuring the severity of food insecurity, namely the FIES. The FIES, or Food Insecurity Experience Scale, measures the severity of food insecurity at the household or individual level; its value depends on yes/no answers to eight questions regarding respondents' access to adequate food. FIES captures experiences related to access to food due to lack of money or other resources over 12 months, regardless of the frequency of occurrence [5].

Random Forest (RF)
Random forest is a classification algorithm comprising a combination of independent classification trees. The class prediction is obtained from the classification trees through a majority voting process (the class with the highest number of votes). Random forests extend the ensemble tree method developed by [15] and improve classification accuracy. The randomization process used to create each classification tree is applied both to the sample data and to the selection of predictor features. This process produces a collection of classification trees of different sizes and shapes. A small correlation between the trees reduces the prediction error of Random Forests [15].
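The two randomization steps and the majority vote described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: each "tree" is replaced by a one-feature lookup table (a hypothetical stand-in for a full classification tree), and the function names are invented for this example.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Toy stand-in for a tree: map each value of one feature to its majority label."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[feature], []).append(label)
    table = {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}
    default = Counter(labels).most_common(1)[0][0]  # used for unseen values
    return feature, table, default

def predict_stump(stump, row):
    feature, table, default = stump
    return table.get(row[feature], default)

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Randomize both the sample (bootstrap) and the predictor feature per tree."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        feature = rng.randrange(p)                  # random predictor feature
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]
```

Because every tree sees a different bootstrap sample and a different feature, the individual predictions are only weakly correlated, which is the property the vote relies on.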

eXtreme Gradient Boosting (XGB)
XG-Boost, or eXtreme Gradient Boosting, is a tree-based algorithm [16]. Boosting is an ensemble method whose primary objective is to reduce bias and variance. The goal is to create weak trees sequentially so that each new tree focuses on the weaknesses (misclassified data) of the previous one; the construction of each tree therefore depends on the tree before it. The first tree in XG-Boost is a weak classifier, with probability initialization determined by the researcher. Weights are then updated for each tree built to produce a robust group of classification trees. The final prediction is obtained by taking the weighted sum of the predictions of all the decision trees. The basic objective function of XG-Boost is as follows [16]:
$$ Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k) \qquad (1) $$
where $l(y_i, \hat{y}_i^{(t)})$ is a loss function to measure prediction error and $\Omega(f_k)$ is used to control the complexity of the model.
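To make the two parts of the objective concrete (a loss term measuring prediction error plus a complexity penalty), the sketch below evaluates such an objective for a given set of trees. It assumes the logistic loss and the complexity term from the XGBoost paper, Omega(f) = gamma * T + 0.5 * lambda * sum(w_j^2), where T is the number of leaves and w_j the leaf weights; the source does not state which loss or penalty settings were used, and the function names are hypothetical.

```python
import math

def log_loss(y_true, p_pred):
    """Sum of logistic losses over all observations (labels in {0, 1})."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, p_pred)
    )

def tree_complexity(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def regularized_objective(y_true, p_pred, trees, gamma=1.0, lam=1.0):
    """Loss term plus the complexity penalty of every tree built so far.

    `trees` is a list of leaf-weight lists, one list per tree.
    """
    penalty = sum(tree_complexity(t, gamma, lam) for t in trees)
    return log_loss(y_true, p_pred) + penalty

# Two observations predicted at 0.5, one tree with two leaves.
value = regularized_objective([1, 0], [0.5, 0.5], [[0.5, -0.5]], gamma=1.0, lam=2.0)
print(value)
```

A larger gamma or lambda makes additional leaves and large leaf weights more expensive, which is how the penalty controls model complexity.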

Neural Network (NN)
The Neural Network (NN) is modeled on the complex learning system of the brain, which consists of closely interconnected sets of neurons. The advantages of the NN algorithm include that it does not require many assumptions, models non-linear relationships well, and provides a model that approximates the existing system. An NN consists of three layers: the input layer, the hidden layer, and the output layer. This architecture is also known as the Multi-Layer Perceptron (MLP). Each input is linked to every node in the hidden layer, and each layer has weights and a bias. The activation function, applied to the weighted inputs plus the bias, describes the relationship between input and output values, which can be linear or non-linear [17].
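The three-layer structure can be illustrated with a single forward pass. This is a minimal sketch assuming one hidden layer with a tanh activation and a sigmoid output for a binary class probability; the weights shown are arbitrary illustrative numbers, not values from the study.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input layer -> hidden layer (tanh) -> output (sigmoid)."""
    # Every input is linked to every hidden node through one weight.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    # The output node combines the hidden activations with its own weights and bias.
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))

# Two inputs, three hidden nodes, one output probability.
probability = mlp_forward(
    x=[1.0, 0.0],
    w_hidden=[[0.4, -0.2], [0.1, 0.3], [-0.5, 0.2]],
    b_hidden=[0.0, 0.1, -0.1],
    w_out=[0.7, -0.3, 0.2],
    b_out=0.05,
)
print(probability)
```

Replacing tanh with the identity function would make the hidden layer linear, which is the linear case mentioned in the text.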

Support Vector Machine (SVM)
Vapnik first presented the Support Vector Machine (SVM) in 1992. SVM was developed on the principle of a linear classifier. For non-linear data, SVM was extended by incorporating the kernel concept, so there is a guarantee that SVM classification will produce a very accurate mapping [18].
The SVM concept seeks to find the optimal hyperplane in the input space. The hyperplane acts as the separator of the two classes in the input space. The hyperplane with the maximum margin is the best separator. The margin is the distance between the hyperplane and the closest pattern in each class. The closest pattern is called a support vector. The best hyperplane lies midway between the two classes.
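The notions of margin and support vector can be made concrete for a given linear hyperplane w.x + b = 0. The sketch below only evaluates a hyperplane that is supplied by hand; it does not solve the optimization problem that finds the maximum-margin hyperplane, and the points and coefficients are hypothetical.

```python
import math

def decision_value(w, b, x):
    """Signed value of w.x + b; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))

def margin(w, b, points):
    """Distance between the hyperplane and the closest pattern."""
    return min(distance_to_hyperplane(w, b, x) for x in points)

def support_vectors(w, b, points, tol=1e-9):
    """Patterns lying at the margin distance from the hyperplane."""
    m = margin(w, b, points)
    return [x for x in points if abs(distance_to_hyperplane(w, b, x) - m) <= tol]

w, b = [1.0, 1.0], -3.0
points = [[1, 1], [0, 0], [2, 2], [4, 3]]
print(margin(w, b, points))
print(support_vectors(w, b, points))
```

Here the hyperplane sits midway between the closest point of each side, so both closest points are returned as support vectors.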

Synthetic Minority Oversampling Technique Nominal (SMOTE-N)
SMOTE generates data for the minority class with a nearest-neighbor approach. In SMOTE-N, the nearest neighbors are computed using a modified version of the Value Difference Metric (VDM) proposed by [19]. VDM considers the overlapping feature values of all feature vectors and defines the distance between the corresponding feature values for the created feature vector. The distance between two corresponding feature values is formulated as follows [19]:
$$ \delta(V_1, V_2) = \sum_{i=1}^{n} \left| \frac{C_{1i}}{C_1} - \frac{C_{2i}}{C_2} \right|^{k} \qquad (2) $$
In Equation (2), $V_1$ and $V_2$ are the two corresponding feature values and the sum runs over the $n$ classes. $C_1$ is the total number of occurrences of the feature value $V_1$, and $C_{1i}$ is the number of occurrences of the feature value $V_1$ for class $i$. The same convention applies to $C_2$ and $C_{2i}$. $k$ is a constant with a value of 1. Equation (2) is used to compute the matrix of value differences for each nominal feature in the feature vectors and gives a geometric distance on a fixed, finite set of values.
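The value difference between two nominal feature values can be computed directly from class-wise counts. The following is a minimal sketch of that calculation for a single nominal feature; the feature values ("well", "tap") and labels are hypothetical, and the function names are invented for this example.

```python
from collections import Counter, defaultdict

def value_counts_by_class(values, labels):
    """Return total occurrences per value and occurrences per value within each class."""
    totals = Counter(values)
    per_class = defaultdict(Counter)
    for value, label in zip(values, labels):
        per_class[value][label] += 1
    return totals, per_class

def vdm_distance(v1, v2, totals, per_class, classes, k=1):
    """Sum over classes of |C_1i / C_1 - C_2i / C_2| ** k.

    Both values must occur at least once, otherwise the ratio is undefined.
    """
    return sum(
        abs(per_class[v1][c] / totals[v1] - per_class[v2][c] / totals[v2]) ** k
        for c in classes
    )

# Hypothetical nominal feature (water source) and binary class labels.
values = ["well", "well", "well", "tap", "tap", "tap", "tap"]
labels = [1, 1, 0, 0, 0, 0, 1]
totals, per_class = value_counts_by_class(values, labels)
distance = vdm_distance("well", "tap", totals, per_class, classes=[0, 1])
print(distance)  # |1/3 - 3/4| + |2/3 - 1/4| = 5/6
```

Two values are close when they are distributed across the classes in similar proportions, regardless of how the values are coded.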

Adaptive Synthetic Nominal (ADASYN-N)
ADASYN was first proposed by [20]. ADASYN generates synthetic training data until the proportion of each class is balanced, using distribution weights for the minority class data based on their level of learning difficulty. ADASYN-N is a development of ADASYN proposed by [21] for data with nominal types. Nearest neighbors in ADASYN-N are calculated using a modified version of the Value Difference Metric (VDM), as in SMOTE-N proposed by [8]. VDM looks at the overlapping feature values of all feature vectors, and the resulting matrix defines the distance between the corresponding feature values for the created feature vector.
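The weighting by learning difficulty can be sketched as follows, following the allocation rule of the original ADASYN formulation: each minority observation receives a share of the synthetic data proportional to the fraction of majority-class observations among its K nearest neighbors. The neighbor counts are taken as given here (in ADASYN-N they would come from a VDM-based neighbor search); the numbers and the function name are hypothetical.

```python
def adasyn_allocation(majority_neighbor_counts, k, n_majority, n_minority, beta=1.0):
    """Number of synthetic observations to generate for each minority observation.

    majority_neighbor_counts[i] is the number of majority-class observations
    among the k nearest neighbors of minority observation i.
    beta=1.0 requests a fully balanced data set.
    """
    total_to_generate = (n_majority - n_minority) * beta
    ratios = [count / k for count in majority_neighbor_counts]
    norm = sum(ratios)
    if norm == 0:
        # No minority observation has majority neighbors: nothing is "difficult".
        return [0] * len(ratios)
    # Rounding means the counts may not sum exactly to total_to_generate.
    return [round(r / norm * total_to_generate) for r in ratios]

# Three minority observations with 0, 1, and 4 majority neighbors out of k=5.
allocation = adasyn_allocation([0, 1, 4], k=5, n_majority=13, n_minority=3)
print(allocation)  # [0, 2, 8]
```

The observation surrounded mostly by majority-class neighbors receives most of the synthetic data, which is how the decision boundary is shifted toward the difficult examples.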

Shapley Additive Explanations (SHAP)
SHAP is a method used by [4] to explain individual predictions based on Shapley values from game theory. The purpose of SHAP is to calculate each feature's contribution to a prediction in order to explain each individual prediction. The explanation model is described in Equation (3) below [4]:
$$ g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j \qquad (3) $$
where $z'$ is the coalition vector whose elements are 1 (if the feature is included) or 0 (if the feature is not included), $\phi_j$ is the Shapley value, which is the contribution of the j-th feature to the coalition, and $M$ is the size of the coalition. The value of g(x) is calculated for all observations, so the Feature Importance (FI) in Equation (5) is the sum of the values over all observations [4].
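For a model with only a few features, Shapley values can be computed exactly by enumerating all coalitions, which makes the definition easy to inspect. The sketch below is an illustration under stated assumptions, not the SHAP library's algorithm: a feature that is "not included" is replaced by a fixed baseline value (a simplification), the model is a hypothetical linear function, and the feature importance is taken as the mean absolute Shapley value per feature. Averaging rather than summing changes the scores only by the constant factor 1/n and leaves the ranking unchanged.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for one observation x."""
    m = len(x)

    def value(subset):
        # Features outside the coalition are set to their baseline value.
        z = [x[j] if j in subset else baseline[j] for j in range(m)]
        return model(z)

    phi = []
    for j in range(m):
        others = [i for i in range(m) if i != j]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi

def shap_feature_importance(model, rows, baseline):
    """Mean absolute Shapley value of each feature over all observations."""
    sums = [0.0] * len(baseline)
    for x in rows:
        for j, phi_j in enumerate(shapley_values(model, x, baseline)):
            sums[j] += abs(phi_j)
    return [s / len(rows) for s in sums]

# Hypothetical linear model with three features.
model = lambda z: 2 * z[0] + z[1] - 3 * z[2]
baseline = [0, 0, 0]
phi = shapley_values(model, [1, 1, 1], baseline)
importance = shap_feature_importance(model, [[1, 1, 1], [0, 2, 0]], baseline)
print(phi, importance)
```

The Shapley values of one observation sum to the difference between its prediction and the baseline prediction, which corresponds to the additive form of the explanation model. Exact enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.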

Classification Model Evaluation
A classification model is expected to classify all data correctly, but in practice no model performs with perfect accuracy. The performance of the classification algorithm is measured through a confusion matrix, as shown in Table 1, which is a cross-tabulation of the predicted and actual classes of the response feature [22]. The area under the curve of the ROC (Receiver Operating Characteristic) will be used as a measure of model performance.
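The measures derived from the confusion matrix, together with the area under the ROC curve, can be computed as in the sketch below. It is a minimal pure-Python illustration with hypothetical labels and scores; the AUC is computed as the proportion of (positive, negative) pairs in which the positive observation receives the higher score, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) from actual and predicted classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Share of actual positives (minority class) that are predicted correctly."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of actual negatives that are predicted correctly."""
    return tn / (tn + fp)

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, fp, tn, fn))
```

With imbalanced classes, accuracy can stay high while sensitivity is low, which is why sensitivity and AUC are the more informative measures for the minority class.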

Intraclass Correlation Coefficient (ICC)
Tests of agreement are widely used to assess the relationship between outcomes. The degree of agreement between measurements refers to the concordance between two (or more) measurements. Statistical methods are used to decide whether one technique for measuring features can replace another. The intra-class correlation coefficient (ICC) is one method to assess the agreement between continuous feature measurements. ICC reliably reflects the degree of correlation and agreement between numerical or continuous measurements [23]. ICC is used to assess reliability between two or more measurements. ICC is the ratio between the variance between groups and the total variance. The total variance comes from three sources: 1) subject, 2) measurement, and 3) residual error. If the measurement variation is assumed to be random, then the ICC formula is as follows [22]:
$$ ICC = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_o^2 + \sigma_e^2} $$
where $\sigma^2$ is a variance, and the subscripts denote s = subject, o = measurement, and e = residual error. The ICC score ranges from 0 to 1. The closer to 1, the higher the agreement, and 0 indicates no agreement. Poor agreement is indicated by a score of less than 0.5; a value between 0.5 and 0.75 indicates moderate agreement, a value between 0.75 and 0.9 indicates good agreement, and very good agreement is indicated by a score greater than 0.90.
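The three variance components can be estimated from a two-way analysis of variance, with subjects in rows and measurements in columns. The sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form of the ICC, which matches a ratio with subject, measurement, and residual variance in the denominator; the source does not state which estimator was used. In this study's setting, rows would be features and columns the algorithms' importance scores; the numbers shown are hypothetical.

```python
def icc_two_way_random(scores):
    """ICC from a table of scores: rows = subjects, columns = measurements."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Variance components estimated from the mean squares.
    var_subject = (ms_rows - ms_error) / k
    var_measurement = (ms_cols - ms_error) / n
    var_error = ms_error
    return var_subject / (var_subject + var_measurement + var_error)

# Hypothetical importance scores: 4 features (rows) scored by 3 algorithms (columns).
scores = [
    [0.10, 0.09, 0.11],
    [0.08, 0.08, 0.07],
    [0.03, 0.04, 0.03],
    [0.01, 0.02, 0.01],
]
print(icc_two_way_random(scores))
```

When all algorithms assign identical scores to every feature, the measurement and residual variances are zero and the ICC equals 1.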

Data Sources
This study uses SUSENAS 2020 (March) data for West Java Province. These data cover 24679 sample households. The level of food insecurity is the target feature in this study, consisting of Y=0 (Not Food Insecurity) and Y=1 (Food Insecurity). The predictor features (characteristics) of food insecurity used in this study refer to the results of previous household food insecurity studies. The food insecurity features used are listed in Table 2.
2) Explore the data and present the prevalence of food insecurity.
3) Split the data into 70% training data and 30% testing data, then balance the training data with the SMOTE-N and ADASYN-N techniques. The class imbalance in the response feature (y) is an issue discussed in this study. After dividing the data into two parts, namely the training data (to form the model) and the testing data (for model evaluation), the class imbalance is handled. This study uses three treatments of the class-imbalance problem: without handling, and two synthetic datasets generated using the SMOTE-N and ADASYN-N techniques.

XGB algorithm
Perform hyperparameter tuning of the following parameters: eta (the learning rate, i.e., the step size shrinkage used in updates to prevent overfitting); min_child_weight (minimum sum of instance weight (hessian) needed in a child); subsample (subsample ratio of the training instances); and colsample_bytree (subsample ratio of columns when constructing each tree; subsampling occurs once for every tree constructed).

NN algorithm
Perform hyperparameter tuning of the following parameters: hidden_layer_sizes (number of layers and number of nodes); activation (activation function for the hidden layer); solver (for weight optimization); alpha; and learning rate.

SVM algorithm
Perform hyperparameter tuning of the following parameters: C (regularization parameter; the regularization strength is inversely proportional to C); kernel (the type of kernel used in the algorithm, including linear, poly, rbf, sigmoid, and precomputed); and gamma.

7) Calculate the SHAP Feature Importance as the feature importance score in each classification model.
8) Calculate "Measures of Agreement" to assess the agreement of the feature importance between the machine learning classification algorithms on without handling, SMOTE-N, and ADASYN-N data using the Intra-class Correlation Coefficient (ICC).
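The order of operations in step 3, splitting first and balancing only the training part, can be sketched as follows. The split below is stratified by class, which is an assumption: the source states the 70%/30% proportions but not how the split was drawn. The function name and the example labels are hypothetical.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), keeping class proportions similar in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical imbalanced labels: 20 food insecurity (1), 80 not food insecurity (0).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)

# Any oversampling (for example SMOTE-N or ADASYN-N) would be applied to the
# rows in train_idx only; the rows in test_idx stay untouched for evaluation.
print(len(train_idx), len(test_idx))
```

Keeping the testing data free of synthetic observations means the reported performance reflects the original class distribution.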

Data Exploration
Based on the March 2020 SUSENAS data in West Java Province, which consisted of 25091 households, there were 322 households not included in the research dataset because, among the eight FIES questions, the household answered that it did not know (code 8) or refused to answer (code 9), so only 24679 households were analyzed in this study. The number of households in the food insecurity category in this study was 5351 households, or only 21.60 percent. Based on this value, it is clear that there is a class-imbalance problem. Table 3 shows the number of households in the food insecurity and not food insecurity categories by data type. The numbers of food insecurity households in the testing and training data are 1587 and 3764, respectively. The percentage of food insecurity categories is 21.60 percent. The SMOTE-N and ADASYN-N data are generated with sampling strategy=1, which means that the synthetic data bring the minority class (food insecurity) to the same size as the majority class (not food insecurity).
Bayesian Optimization with a Tree-structured Parzen Estimator (BO-TPE) is used to produce the optimal parameters for each algorithm and class-imbalance handling technique, as shown in Table 4. The parameters obtained in SMOTE-N and ADASYN-N data are more similar to those obtained without handling data. After obtaining the optimal hyperparameters, the model is evaluated on the testing data to produce several measures of model performance, as shown in Figure 1. The sensitivity value of the algorithms on without handling data tends to be low, which means that the model does not produce

Level of Features Importance using SHAP Feature Importance (SHAP FI)
The feature importance level in this study uses the SHAP Feature Importance (SHAP FI) value approach. Figure 2 shows that the SHAP FI on without handling data tends to differ from that on SMOTE-N and ADASYN-N, which tend to be similar.

Figure 2. Heatmap of feature importance scores from machine learning modeling algorithms on data with three treatments for class-imbalance problems
Several features of the SHAP FI score generated from machine learning algorithms on without handling data are outside the top ten rankings. For example, the wall types feature is ranked 2, 2, 6, and 7 by the XGB, RF, NN, and SVM algorithms, respectively, but is ranked 11th in the SHAP FI order. This is supported by the heatmap, in which high SHAP FI values (which tend to be blue) appear at low ranks, including the grantee of health insurance national program, roof types, and main income from the transferee.
Five features that rank outside the top ten in the SHAP FI order nevertheless appear in the top ten of individual algorithms: wall types in all algorithms, followed by the grantee of health insurance national program in the XGB, SVM, and NN algorithms, roof types in the SVM and NN algorithms, main income from the transferee in the XGB, RF, and NN algorithms, and grantee of scholarship social program in the NN algorithm. Meanwhile, at the top of the SHAP FI order, there are low SHAP FI values (which tend to be yellow), such as house size, decent drinking water, and internet access.
The feature importance scores of the machine learning algorithms on SMOTE-N and ADASYN-N tend to have the same level of agreement. Therefore, the user can choose whichever technique is more appropriate for the characteristics of the data. According to [24], SMOTE can handle data whose predictor variables have inconsistent scales/units. However, this technique is not very effective on high-dimensional data. In addition, the resulting synthetic examples do not consider neighbors from other classes, so class overlap increases. Meanwhile, the synthetic examples in ADASYN consider neighbors from other classes using a weighting technique based on the density distribution, so the amount of ADASYN synthetic data depends on the density distribution of the data. According to [25], this technique emphasizes samples that are difficult to learn to compensate for the skewed distribution. According to [26], the class-imbalance problem depends on the complexity of the data (the location of the minority data), the level of class imbalance, the size of the data, and the classifier involved. The dataset used in this study has a response variable consisting of two classes and twenty-four predictor variables on a nominal/ordinal scale.
Statistically, applying the SMOTE-N technique to the dataset resulted in a slightly higher level of agreement on the feature importance score than ADASYN-N. Therefore, the dataset in this study is more suitable for using SMOTE-N.
without handling data appears to be spread out, while the ranking of features on ADASYN-N tends to be more closely related to each feature. This is supported by the 0.714 ICC value, which is not much different from the without handling data and SMOTE-N ICC value, which indicates a low
Based on several feature importance scores generated by SMOTE-N and ADASYN-N, those with high ratings tend to have adjacent ratings. However, they tend to be different in the medium ratings, and those at low levels tend to be more similar. The pattern generated in without handling data does not follow the pattern found in the SMOTE-N and ADASYN-N ratings.
together. This is supported by the 0.921 ICC value, which tends to be high and indicates a better level of agreement in producing the feature importance.

Interpretation of Feature Importance from Random Forest Algorithm on SMOTE-N
Based on model performance evaluation, the RF algorithm on SMOTE-N data is the best model for identifying food insecurity cases. Therefore, feature importance is interpreted based on the SHAP Feature Importance generated from this algorithm and class-imbalance handling technique.
Globally, the level of feature importance is reflected in the SHAP FI value, which is the mean absolute SHAP value shown in Figure 7 (Global Interpretation). It shows that decent drinking water ranks first, followed by house size, education of household head, number of family members having a savings account, and floor types; these are the top five important features. The difference in SHAP FI values between the top ranks tends to be small. For example, the difference between decent drinking water (0.098) and house size (0.081) is 0.017. The difference between ranks 2 and 3 is even smaller, only 0.004. These small differences indicate that the importance ranking of the food insecurity features can change if different algorithms and class-imbalance handling techniques are used.
However, if the top features (e.g., the top 10) are analyzed, the features that appear in the top group tend to be the same across the various algorithms and data, including

Figure 7. Score of feature importance from the Random Forest modeling algorithm on data whose class-imbalance problem is handled with the SMOTE-N technique, according to the type of interpretation (panels: Global Interpretation and Local Interpretation)
Figure 7 (Local Interpretation) shows the local interpretation. In the first rank, a low value of decent drinking water (in this case, households using unsafe drinking water), reflected in the blue dots, pushes the prediction toward food insecurity. On the other hand, a high value (households using proper drinking water), reflected in the red dots, pushes the prediction toward not food insecure. For the next few ranks, it can be concluded that the household conditions that push predictions toward food insecurity are a small floor area, low household head education, few savers, and low-quality floor types.