CLASSIFICATION OF FOOD MENU AND GROUPING OF FOOD POTENTIAL TO SUPPORT THE FOOD SECURITY AND NUTRITION QUALITY

M. FARIZ FADILLAH MARDIANTO, SULIYANTO, FARIED EFFENDY, ANTONIO NIKOLAS MANUEL BONAR SIMAMORA, AYUNING DWIS CAHYASARI, CHAEROBBY FAKHRI FAUZAAN PURWOKO, NETHA ALIFFIA

The Movement for a Diverse, Nutritious, Balanced, and Safe Diet, referred to in this article as B2SA, is an Indonesian government program to improve food resilience and nutritional quality in line with the Sustainable Development Goals, especially during the Coronavirus Disease (COVID-19) pandemic. In this article, classification and grouping methods are applied to support the development of the B2SA program in Indonesia, through the classification of menu arrangements and the grouping of potential foodstuffs, particularly in East Java, one of the provinces with a high COVID-19 spread rate and a large contribution to food security in Indonesia. For classification, this study compares the performance of logistic regression and random forest. For clustering, it compares the performance of Single Linkage and K-Means. The results show that 49.3% of the food menus recommended by the population of East Java do not meet the B2SA standard. The grouping yields four groups for the staple food and side dish categories and two groups for the fruit and vegetable categories. These results are expected to serve as a recommendation for the government in supporting stable food security and strengthening the resilience of the food industry in Indonesia, because East Java is a region with substantial food potential.


INTRODUCTION
The Coronavirus Disease 2019 (COVID-19) pandemic has had a significant impact on various sectors of life and has disrupted the targets of most countries for achieving the Sustainable Development Goals (SDGs). Among the SDGs that all countries, including Indonesia, pursue are reducing hunger, achieving food security, and ensuring good welfare and prosperity. However, according to the Food and Agriculture Organization (FAO) and the International Food Policy Research Institute (IFPRI), the COVID-19 pandemic can create a new food crisis that affects the food security of a country, especially poor and developing countries [1]. One of the developing countries whose food security sector has been affected by the COVID-19 pandemic is Indonesia. To improve food security during the pandemic, Indonesia has a government program called the Movement for a Diverse, Nutritious, Balanced, and Safe Diet, referred to in this article as B2SA.
Indonesia, with its geographical conditions and abundant natural resource potential, has regional characteristics that can be examined at the level of regions such as provinces. One of the provinces in Indonesia that supports national food security is East Java Province. East Java is a leading producer of agribusiness products, including several commodities in the agriculture, plantation, and horticulture sectors, as well as animal husbandry [2].
B2SA is implemented in family food consumption through the selection of food ingredients and the preparation of menus. B2SA uses a food menu structure for one meal or for a whole day, according to mealtimes, in amounts that meet the rules of balanced nutrition. B2SA plays a role in maintaining body weight, increasing the body's defense against disease, and distributing nutrient intake so that nutritional quality improves. B2SA can be applied as a preventive measure against the spread of the COVID-19 virus. This is in line with the movement recently launched by the government, known in Indonesia as the 6M movement, which includes wearing masks, washing hands with soap, keeping a distance, avoiding crowds, delaying travel, and maintaining a healthy diet. If B2SA is applied, the potential for a person to be exposed to the virus is reduced [3].
In statistics, many classification methods have been developed that can be used to identify whether the food patterns of the Indonesian population are classified as B2SA.
The classification methods include logistic regression, a classical classification method. Logistic regression is used because it does not require the assumptions of multivariate normality and homogeneity of the variance-covariance matrix [4]. A modern classification method, random forest, is also used in this study. The advantage of random forest is that it can handle large amounts of training data and can provide good classification results with low error [5]. The combinations of food menus that meet B2SA standards in East Java represent potential food whose production must be optimized.
Optimizing the food potential of each region can be one way to achieve food security in Indonesia. Grouping the regions in East Java based on their food potential is necessary to determine the food potential of each district and city so that menus can be prepared according to B2SA standards in an application. Therefore, a study is conducted to group the districts and cities in East Java by food potential in order to optimize production. In this study, the East Java regions are grouped using cluster analysis with both a hierarchical and a non-hierarchical approach. The non-hierarchical approach uses the K-Means method, which groups data based on the cluster center point (centroid) closest to the data. The K-Means algorithm aims to minimize the cluster performance index, squared error, and criterion error [6]. The hierarchical approach uses the Single Linkage method. The best grouping results are selected using the internal cluster dispersion rate (icd rate) so that policy recommendations formulated from the mapping of food potential in East Java can be accurate and on target [7]. In this study, potential food is grouped in four categories, namely staple foods, side dishes, fruits, and vegetables.
Several previous studies on classification methods serve as references for this study, including research that compared the performance of discriminant analysis and logistic regression in classifying deviations in consumer behavior in the use of prepaid electricity credit [8]. The results showed that logistic regression classifies consumer behavior deviations more accurately than discriminant analysis because the addition of consumer data does not affect the performance of logistic regression. Furthermore, another study compared the random forest, support vector machine, and propagation neural network methods in detecting beverage brands [9]. The results of that study indicated that random forest can handle unbalanced data, multiclass problems, small samples, and data without preprocessing. Thus, it was decided to compare the performance of logistic regression with random forest in categorizing the food menu combinations recommended by people in East Java.
Previous research related to grouping includes a study that grouped dairy products and nuts based on the nutritional content and characteristics of each food [10]. The results of that study indicate that there are five groups of dairy products and nuts that can be used as recommendations for dietary intake to stabilize adequate nutritional values. According to Hartigan [11], hierarchical cluster analysis using the Single Linkage method provides consistent results for large clusters. In addition, Wilkin and Huang [12] stated that K-Means can cluster large data sets even when they contain outliers. Therefore, in this study, it is decided to compare the performance of the Single Linkage and K-Means cluster methods.
The novelty of this study is that it pursues two main objectives, each analyzed by comparing the performance of two methods. The main objectives are classification, which belongs to supervised learning, and grouping, which belongs to unsupervised learning; both are very popular in the recent development of statistics and data science. Classification is done by comparing the performance of logistic regression and random forest to determine whether a food menu meets the B2SA standard based on food production in East Java. Meanwhile, grouping is carried out using hierarchical cluster analysis through Single Linkage and non-hierarchical cluster analysis through K-Means to map the yields of certain foodstuffs across districts and cities in East Java.
The urgency of this research lies in classifying menus and mapping areas in East Java based on food potential in order to prepare menus according to the B2SA standard. This research is expected to help the relevant parties improve national food security through the B2SA movement and optimize local food production so that it can meet the food needs of the community, especially in East Java, during and after the COVID-19 pandemic.

Binary Logistic Regression
Binary logistic regression is used to analyze the relationship between one response variable and several predictor variables, with the response variable in the form of dichotomous qualitative data, with a value of 1 to indicate the presence of a characteristic and a value of 0 to indicate the absence of a characteristic [13]. In summary, the stages of binary logistic regression analysis are as follows:

• Simultaneous Parameter Significance Test
This test examines the effect of the predictor variables on the response variable simultaneously, that is, as a whole. The simultaneous test is also called the chi-square model test. With $p$ predictor variables and coefficients $\beta_1, \ldots, \beta_p$, the hypotheses of this test are as follows:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_1:$ at least one $\beta_j \neq 0$, $j = 1, 2, \ldots, p$
$H_0$ is rejected if the p-value < $\alpha$, in which case it can be concluded that the predictor variables simultaneously have a significant effect on the model [14].

• Partial Parameter Significance Test
Partial testing is carried out to determine whether each parameter has a significant effect on the response variable [14]. The partial parameter significance test is carried out using the Wald test with the following hypotheses:
$H_0: \beta_j = 0$
$H_1: \beta_j \neq 0$, $j = 1, 2, \ldots, p$
If the p-value < $\alpha$, $H_0$ is rejected, and it can be concluded that the predictor variable partially has a significant effect on the response variable.

• Model Fit Test
Osborne [13] stated that a model is feasible, or meets the goodness of fit, if the predictions of the model match the observed data. In binary logistic regression, model fit can be tested using the deviance test. If the p-value > $\alpha$, $H_0$ is not rejected, so it can be concluded that the model is appropriate, that is, there is no significant difference between the observations and the predictions of the model.

• Interpretation of Odds Ratio
The Odds Ratio (OR) value is used to interpret the parameter coefficients. Hosmer and Lemeshow [14] stated that OR is the average magnitude of the tendency of the response variable to have a certain value if x = 1 compared to x = 0.
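To make the interpretation concrete, the following minimal sketch computes an odds ratio from a 2x2 table and shows that it equals the exponential of the logistic regression coefficient for a single binary predictor. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and the helper names `odds` and `odds_ratio` are assumptions of this sketch.

```python
import math

def odds(p):
    """Odds of an event that has probability p."""
    return p / (1.0 - p)

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: y = 1 among x = 1,  b: y = 0 among x = 1
    c: y = 1 among x = 0,  d: y = 0 among x = 0
    """
    return (a / b) / (c / d)

# Hypothetical counts for illustration only (not data from the study).
a, b, c, d = 30, 10, 20, 40

# With one binary predictor, the maximum-likelihood logistic fit is closed-form:
# beta0 is the log-odds when x = 0, beta1 is the change in log-odds when x = 1.
p1 = a / (a + b)
p0 = c / (c + d)
beta0 = math.log(odds(p0))
beta1 = math.log(odds(p1)) - beta0

or_from_table = odds_ratio(a, b, c, d)
or_from_beta = math.exp(beta1)
print(or_from_table, or_from_beta)
```

Both quantities agree, which is why the OR is usually reported as the exponential of the estimated coefficient.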

Random Forest
The random forest method is a classification and regression method based on the aggregation of decision trees. Breiman and Cutler [15] stated that the random forest algorithm can be briefly divided into the following stages:

• Bootstrap Aggregating Stage
This stage involves drawing a random sample from the original data set with replacement.

• Decision Tree Formation Stage
At this stage, a tree is built from each bootstrap sample until it reaches its maximum size, where there are three procedures to obtain multiple decision trees [16].

• Iteration Stage
This stage repeats the bootstrap aggregating stage and the decision tree formation stage until a forest consisting of n decision trees (ntree) is obtained.

• Majority Voting Stage
This stage is the class prediction stage in the classification model based on the class that is most predicted by a set of trees.
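The four stages can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used in the study: each tree is a depth-one stump rather than a fully grown tree, the data are hypothetical, and the names `fit_stump`, `fit_forest`, and `predict` are introduced here only for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, features):
    """Best single split (feature, threshold) among the candidate features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def fit_forest(X, y, ntree=25, mtry=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(ntree):                          # iteration stage
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap, with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        features = rng.sample(range(p), mtry)       # mtry predictors at random
        forest.append(fit_stump(Xb, yb, features))  # tree formation stage
    return forest

def predict(forest, row):
    """Majority voting over the predictions of all trees."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical two-feature data with two well-separated classes.
X = [[1, 2], [2, 1], [3, 3], [2, 2], [8, 9], [9, 8], [10, 10], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y, ntree=25, mtry=1, seed=0)
print(predict(forest, [0, 0]), predict(forest, [11, 11]))
```

An odd number of trees is used so that a two-class vote cannot end in a tie.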

Classification Accuracy Value (1-APER)
One of the procedures for determining the accuracy of classification is the apparent error rate (APER) [4].
The APER value represents the proportion of samples that are incorrectly classified by the classification function. Classification errors can be read from the confusion matrix presented in Table 1, and the APER is computed as follows:
$$\mathrm{APER} = \frac{\text{number of misclassified observations}}{\text{total number of observations}} \quad (1)$$
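As a minimal sketch, the APER and the classification accuracy (1 - APER) can be computed from a confusion matrix as shown below. The counts are hypothetical and serve only to illustrate the arithmetic; the function name `aper` is an assumption of this example.

```python
def aper(confusion):
    """Apparent error rate from a square confusion matrix.

    confusion[i][j] is the number of observations whose actual class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

# Hypothetical counts for illustration only (not results from the study).
confusion = [[50, 10],
             [5, 35]]
error = aper(confusion)
accuracy = 1.0 - error
print(error, accuracy)
```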

B. Clustering Method
Multivariate analysis is a statistical method for analyzing data consisting of many variables. It is widely used for problems ranging from data reduction and group formation to hypothesis testing [6,17]. Cluster analysis is one part of multivariate analysis and consists of hierarchical and non-hierarchical groupings [18,19]. The distance measure used is the Euclidean distance.
The results of cluster formation through a hierarchical approach, including Single Linkage, can be displayed as a dendrogram [18]. The non-hierarchical approach, in contrast, is very appropriate for large multivariate data when the number of groups to be formed has been determined in advance. One of the popular non-hierarchical grouping methods is K-Means.

Single Linkage (Nearest Neighbor)
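Single Linkage forms clusters by successively merging the two clusters separated by the smallest distance, where the distance between two clusters is the distance between their nearest members.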
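The nearest-neighbor merging rule can be sketched as follows, assuming Euclidean distance as stated earlier. The points are hypothetical, the function name `single_linkage` is introduced only for this example, and the quadratic search is chosen for readability rather than speed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points, k):
    """Merge clusters until k remain.

    The distance between two clusters is the distance between
    their closest pair of members (nearest neighbor).
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical points for illustration only.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
groups = single_linkage(points, 2)
print(groups)
```

Recording the merge distance at each step would give the heights needed to draw a dendrogram.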

K-Means Method
The K-Means method is widely used in the study of big data and machine learning. The basic principle of the K-Means method is to partition the data into clusters that occupy separate regions [20]. Cluster formation in this study involves the following components:

• Optimal Cluster Number Selection
The optimal number of clusters is selected using the Pseudo-F statistic [7]. The highest Pseudo-F value indicates the optimal number of clusters. If $R^2$ is the coefficient of determination, $n$ is the number of samples, and $k$ is the number of clusters, then the Pseudo-F statistic is formulated as follows:
$$\text{Pseudo-}F = \frac{R^2 / (k - 1)}{(1 - R^2) / (n - k)}$$

• Criteria for Selection of the Best Cluster Method
The best cluster method is selected using the icd rate criterion. The icd rate criterion is closely related to the coefficient of determination, a measure that shows the amount of contribution given by each variable in the cluster that is formed [7]. For $k$ groups, $p$ variables, and $n_i$ observations in the $i$-th group, the coefficient of determination is defined as follows:
$$R^2 = \frac{SST - SSW}{SST}, \quad SST = \sum_{i=1}^{k} \sum_{j=1}^{p} \sum_{l=1}^{n_i} (x_{ijl} - \bar{x}_{j})^2, \quad SSW = \sum_{i=1}^{k} \sum_{j=1}^{p} \sum_{l=1}^{n_i} (x_{ijl} - \bar{x}_{ij})^2$$
where $\bar{x}_{j}$ is the overall mean of variable $j$ and $\bar{x}_{ij}$ is the mean of variable $j$ in group $i$. The smaller the icd rate, the better the grouping produced by a cluster method [7]. Mathematically, the icd rate can be written as follows:
$$\text{icd rate} = 1 - R^2$$

• Principal Component Analysis
Principal Component Analysis (PCA) is a factor analysis technique in which several factors are formed as variables that cannot be determined before the analysis is carried out. PCA is used to reduce data while maintaining the meaning of the data [21]. In reducing variables, it is necessary to determine the strength of the contribution, or feasibility, of each variable through the closeness of the relationship between variables, that is, the correlation matrix. One way to check the correlation matrix is the Measure of Sampling Adequacy (MSA) test. A variable is said to have a strong contribution and to be feasible for factor analysis if it has an MSA value of at least 0.5 [22]. Variables with an MSA value of less than 0.5 are eliminated, and the remaining variables are analyzed further.
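The following sketch runs a basic K-Means procedure and then evaluates the partition with the coefficient of determination, the Pseudo-F statistic, and the icd rate. It assumes the common definitions $R^2 = (SST - SSW)/SST$, Pseudo-F $= [R^2/(k-1)] / [(1-R^2)/(n-k)]$, and icd rate $= 1 - R^2$. The data are hypothetical, the function names are introduced only for this example, and taking the first k rows as initial centroids is a simplification.

```python
def kmeans(data, k, iters=100):
    """Basic K-Means; the first k rows serve as the initial centroids."""
    centroids = [list(row) for row in data[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centroids[c])))
            for row in data
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def cluster_criteria(data, labels):
    """Return (r_squared, pseudo_f, icd_rate); needs at least 2 clusters and SSW > 0."""
    n, p = len(data), len(data[0])
    groups = sorted(set(labels))
    k = len(groups)
    grand = [sum(row[j] for row in data) / n for j in range(p)]
    sst = sum((row[j] - grand[j]) ** 2 for row in data for j in range(p))
    ssw = 0.0
    for g in groups:
        members = [row for row, lab in zip(data, labels) if lab == g]
        mean = [sum(row[j] for row in members) / len(members) for j in range(p)]
        ssw += sum((row[j] - mean[j]) ** 2 for row in members for j in range(p))
    r2 = (sst - ssw) / sst
    pseudo_f = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))
    return r2, pseudo_f, 1.0 - r2

# Hypothetical one-variable data for illustration only.
data = [[0.0], [2.0], [10.0], [12.0]]
labels, centroids = kmeans(data, 2)
r2, pseudo_f, icd = cluster_criteria(data, labels)
print(labels, centroids, r2, pseudo_f, icd)
```

Running `cluster_criteria` for several values of k and for each clustering method allows the largest Pseudo-F and the smallest icd rate to be compared directly.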

C. Nutritional Standards based on Calories
A person's nutritional status is determined by several factors, including energy consumption, physical activity, genetic conditions, and infectious diseases. Nutritional needs cannot be met with a single meal, because nutrients from a variety of foods are required to supply energy, protein, calcium, iron, zinc, vitamins A, B, C, D, magnesium, phosphate, potassium, and folic acid [23]. During the COVID-19 outbreak, it is necessary to increase the body's immunity, which can be obtained from adequate nutrient intake; the approach that is easiest to promote to the public is the B2SA food concept [24]. In this study, the nutritional standard used is calories. Calories are used as the nutritional standard because they relate to the Prevalence of Undernourishment (PoU), one of the SDGs indicators for combating hunger [25]. In addition, the nutritional standards have been converted from the Guidelines for Balanced Nutrition by the Ministry of Health of the Republic of Indonesia, as shown in Table 2.

A. Data
In this study, the data used for the classification analysis are combinations of food menus recommended by respondents spread across East Java during the COVID-19 pandemic.
The data were obtained from a survey using a questionnaire as the research instrument. The sampling technique used was quota sampling, with a quota of at most 2 respondents per sub-district in East Java. From the survey, 1,285 food menu recommendations were obtained from respondents spread across East Java. Each food menu is then converted into caloric values as the nutritional standard based on the Indonesian Food Composition Table (TKPI) 2017. After the conversion, the calories in each variable are summed to obtain the total calorie value, which is then checked against the B2SA standard based on the Guidelines for Balanced Nutrition by the Ministry of Health of the Republic of Indonesia in 2014. In addition, the data used for the cluster analysis are secondary data on food commodity yields in 29 districts and 9 cities in East Java Province, taken from East Java in Figures 2020 [2].

B. Research Variable
The variables used in this study include variables for the classification method and variables for the cluster method. The research variables used to classify the food menu combinations of the East Java population consist of the dependent variable and the independent variables, which are shown in Table 3. In addition, the variables grouped using the cluster method are the food commodities in East Java.

C. Analysis Procedure
The steps of data analysis in this study are as follows:
1. Perform descriptive analysis to describe the data used.
2. Perform classification analysis.
a. Conducting data preprocessing, which includes cleaning outliers from the data, feature or variable selection, and data labeling.
b. Conducting a classification analysis using the binary logistic regression method with the following steps:
i. Testing the significance of the parameters simultaneously and partially.
ii. Conducting the deviance test to assess the fit of the model.
iii. Interpreting the OR value.
iv. Calculating the value of classification accuracy (1 - APER).
c. Conducting a classification analysis using the random forest method with the following steps:
ii. Determining the m predictor variables (mtry) that will be taken at random for use in the classification tree splitting process.
iii. Taking samples with the bootstrap aggregating technique to obtain a new dataset.
iv. Forming a decision tree from the dataset based on the CART algorithm without pruning, where at each node the best splitter is selected from the m predictor variables taken at random.
3. Perform cluster analysis.
b. Grouping districts and cities in East Java based on food potential as follows:
i. Conducting grouping with a hierarchical approach in the form of Single Linkage as well as with a non-hierarchical approach in the form of K-Means.
ii. Selecting the best number of clusters for each method using the Pseudo-F statistic.
iii. Selecting the best cluster method using the icd rate and the coefficient of determination.
c. Displaying the best cluster results in the form of a map and analyzing the characteristics of each cluster that is formed.
4. Draw conclusions based on the classification and clustering methods.

A. Descriptive Statistics
The survey data are described to determine the characteristics of the menus recommended by the community in East Java relative to the B2SA standard. The frequency of food menu combinations recommended by people in East Java is shown in Table 5. Table 5 shows that 49.3% of the combinations of people's favorite food menus in East Java are classified as non-B2SA. This means that many people in East Java still have a favorite food menu that does not follow the B2SA standard.

Another fact shows that East Java Province is one of the largest producers of agricultural, plantation, livestock, and fishery commodities in Indonesia, especially food crops. The highest production of each type of local food commodity in each district and city in East Java, organized by the B2SA categories of staple foods, side dishes, fruits, and vegetables, is summarized in Table 6, Table 7, Table 8, and Table 9, respectively.

[Table columns: No, Food commodity, Maximum value, Location]

Based on Table 6 and Table 7, the highest production yields for staple food commodities and side dishes tend to be spread across several districts or cities in East Java. However, in Table 8 and Table 9, the highest production yields for fruit and vegetable commodities are mostly found in Pasuruan and Malang Regencies.

• Simultaneous Parameter Significance Test
The simultaneous parameter significance test is carried out to determine whether the predictor variables have a significant effect on the model. The results of the simultaneous parameter significance test for the factors thought to categorize the combinations of food menus recommended by the community in East Java are shown in Table 12. Table 12 shows that the value of $\chi^2$ = 361.88 is less than ...

• Odds Ratio (OR)
The OR expresses the tendency of one category relative to another for a qualitative explanatory variable. The OR values are shown in Table 13.

The ntree and mtry values in the random forest are chosen based on the smallest out-of-bag (OOB) error value. Fig. 1 shows a graph of the OOB error value for up to 300 trees, and Table 15 shows the OOB error value for the three possible mtry values. The blue line in Fig. 1 shows that the OOB error value begins to converge at 145 trees, so an ntree value of 145 is used. Table 15 shows that the smallest OOB error value is obtained when mtry = 2. Thus, the classification of food menu combinations can be modeled with a random forest with ntree = 145 and mtry = 2. Fig. 2 shows one of the 145 trees built by the random forest model with mtry = 2; see also Fig. 3, where the other food variable becomes the root node in the tree.

The other food variable contains the various types of snacks consumed by people in East Java. The snacks consumed are very diverse, ranging from high to low nutritional value. Indonesia is a country with a high number of snack consumers, and only 2% of them choose healthy snacks [26]. In fact, the type of snack consumed has a significant effect on the nutritional status of the body [23].
On the other hand, based on Fig. 3, vegetables and fruit variables have a more important role than side dishes in classifying food menu combinations. Many studies show that adequate intake of vegetables and fruit is the most important part of a healthy life [27].
Vegetables and fruit can also improve the nutritional status of the body and reduce the risk of chronic diseases [28].

• Model Evaluation
The analysis then proceeds to the evaluation of the model. The random forest model is evaluated based on the confusion matrix in Table 16. Based on Table 17, the classification results using random forest have the classification accuracy value closest to 100%. Thus, it can be concluded that among the three classification methods, random forest provides the most accurate results in categorizing food menu combinations in East Java according to the B2SA standard.

Data Reduction with Principal Component Analysis (PCA)
The food commodity variables total 68, consisting of 3 staple food commodities, 16 side dish commodities (5 livestock products and 11 marine and aquaculture commodities), 27 fruit commodities, and 22 vegetable commodities. However, data with high dimensions can decrease classification accuracy and cluster quality [29].
Thus, it is necessary to reduce the data through PCA, which involves the correlation matrix between variables through the magnitude of the Measure of Sampling Adequacy (MSA). In this case, PCA was performed on the fruit and vegetable categories, which have the most variables. The MSA results for the fruit and vegetable categories are shown in Table 18 and Table 19.
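For reference, the MSA of variable $i$ is commonly written as below. This assumes the usual Kaiser-Meyer-Olkin definition, which the text does not print explicitly; $r_{ij}$ denotes the correlation and $a_{ij}$ the partial correlation between variables $i$ and $j$.

```latex
\mathrm{MSA}_i = \frac{\sum_{j \neq i} r_{ij}^2}{\sum_{j \neq i} r_{ij}^2 + \sum_{j \neq i} a_{ij}^2}
```

Under this definition, the MSA approaches 1 when the partial correlations are small relative to the ordinary correlations, which is why values below 0.5 are treated as unsuitable for factor analysis.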

Grouping of Regencies and Cities in East Java based on Local Food Commodities
Before grouping districts and cities in East Java based on their local food commodities, it is necessary to do a calculation to determine the optimal number of clusters for each type of food.
The optimal number of clusters in this study was selected based on the largest Pseudo-F statistic value for each method used, namely Single Linkage and K-Means. Then, of the two cluster methods, the better one was selected based on the smallest icd rate value for further interpretation. The calculation of the optimal number of clusters for staple foods, side dishes, vegetables, and fruits is shown in Table 20. Based on Table 20, the optimal number of clusters for each type of commodity is shown in Table 21. Based on Table 21, a dendrogram is generated for each category whose optimal method is hierarchical, showing the grouping of foodstuffs by districts and cities in East Java. The optimal cluster dendrograms for staple foods, side dishes, and fruits are shown in Fig. 4. The green color shows the grouping of region IV based on the category of food potential.

Characteristics of Clusters of Local Food Commodities in East Java
Based on the cluster analysis in the previous section, the level of food potential in each district and city is very diverse, as indicated by the final cluster mean values. One way to realize food security in Indonesia is to optimize the potential of each region, especially in East Java. By identifying the characteristics of each cluster of food potential levels, food security programs can be carried out effectively and on target. The characteristics of each cluster for each food category are presented in Table 22, Table 23, Table 24, and Table 25.

CONCLUSION
During the COVID-19 pandemic, there are still food menus recommended by East Java residents that do not meet the B2SA standard. To maximize the B2SA movement in Indonesia, especially during the COVID-19 pandemic, a massive B2SA campaign is needed that provides information on food menu categories according to the B2SA standard, as produced by the best classification method in this study, namely random forest, which belongs to supervised learning. The food menus consumed by the community represent food potential whose production must be maximized in each region. In this study, the Single Linkage hierarchical cluster method shows the best performance in grouping side dishes and fruits by region in East Java, while the K-Means non-hierarchical cluster method gives the best performance in grouping staple foods and vegetables by region in East Java. Therefore, the production of each type of potential food should be maximized in each region according to the groups produced by the two cluster methods, which belong to unsupervised learning. In addition, synergy is needed between the community and the Indonesian government to maximize the B2SA movement during the COVID-19 pandemic so that the SDGs targets can be achieved.

CONFLICT OF INTERESTS
The author(s) declare that there is no conflict of interests.