HYPERPARAMETERS AND CENTROID IMPROVEMENTS IN THE K-MEDOIDS METHOD FOR GROUPING PROCESSED BEEF SMEs



INTRODUCTION
Fundamental changes in the processed beef production and trade system have implications for regions and industrial clusters [1]. The livestock sector in the Madura region, mainly its processed food products, is an essential factor in supporting the community's economy. The solution offered to the problem of determining the quality of processed food products, especially yeast jerky, is to group the products into clusters that are then validated by nutrition and food experts [2].
The quality of processed beef depends on the attributes used to map the quality of processed food made from Madurese beef yeast meat, namely body condition (age in years, meat quality, type of cow, and feeding pattern) and meat processing features (drying time and storage duration) [3].
In machine learning (ML), non-hierarchical clustering depends greatly on the centroid update process [4]. ML clustering algorithms are applied to various types of data, both numeric and categorical [5] [6], and partition-based grouping is further divided into center-based algorithms [7] such as k-means [8], k-harmonic means [9], the k-medoids algorithm [10], and the spectral clustering algorithm [11]. K-Means is a non-hierarchical technique that separates data into multiple groups, or clusters, so that data sharing similar attributes are placed in a single cluster, while data with distinct features are placed in different clusters [12]. K-medoids is a partitioning technique for cluster analysis that uses medoid objects as cluster centers [13]; a medoid is an actual data object chosen as the middle point of its cluster [14]. The collected data will be analyzed with the k-medoids clustering algorithm to group them according to the required guidance. The k-medoids method uses actual objects from the data set as cluster centers, whereas the k-means method uses the average of the cluster members as the cluster representation [15]. In addition, the k-means method often faces convergence problems when the data set is large. To assess clustering accuracy, we use the Sum of Squared Errors (SSE). SSE is a statistical measure of the total squared difference between the actual and the obtained values, and it is useful for analyzing the accuracy or error of the predictions of a model or algorithm. By squaring the difference between the actual value and the predicted value, SSE shows how accurately the model or algorithm describes the data: the smaller the SSE value, the better the model fits the actual values [16] [17].
In this research, K-medoids was chosen because it clusters data containing outliers effectively, and the existing data include values that differ significantly from the average of the other data [18]. K-Medoids is a variant of k-means that is less sensitive to outliers. In the k-means method, objects with extreme values can significantly distort the data distribution, whereas K-Medoids reduces this sensitivity by representing cluster centers with actual data objects (medoids) rather than means. In addition, the silhouette coefficient is used to determine the optimal number of clusters k. The silhouette coefficient ranges from -1 to 1 and indicates how similar the data grouped in a cluster are. If the average silhouette coefficient is close to 1, the clusters can be considered to be of good quality; conversely, if it is close to -1, the clusters are of poor quality [19].
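To make the silhouette criterion concrete, the sketch below computes the per-object silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from object i to the other members of its cluster and b(i) is the smallest mean distance to the members of any other cluster. It is a minimal NumPy illustration that assumes Euclidean distance and at least two clusters; the function name `silhouette_scores` is hypothetical and is not the implementation used in this study.

```python
import numpy as np


def silhouette_scores(X, labels):
    """Per-object silhouette s(i) = (b - a) / max(a, b) with Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix between all objects
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() <= 1:
            continue  # convention: singleton clusters get s(i) = 0
        # a(i): mean distance to the other members of its own cluster (D[i, i] = 0)
        a = D[i, same].sum() / (same.sum() - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s


X = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(silhouette_scores(X, [0, 0, 1, 1]).mean())  # well separated: close to 1
print(silhouette_scores(X, [0, 1, 0, 1]).mean())  # crossed labels: negative
```

Averaging the returned values gives the mean silhouette that is compared across candidate values of k.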
The number of clusters in a data set must be determined a priori, and the initial cluster centers m (medoids) are selected randomly. Both choices affect algorithm performance, particularly on large data sets. Determining the optimal number of clusters is complicated because the random selection of initial cluster centers (centroids) sometimes causes convergence to a local minimum [20]. Grid search (GS) over hyperparameters is therefore used to select k and the best number of medoids based on the iteration at which the algorithm stops. To determine the cluster centers (centroids), binary searches with different initial values are used to obtain optimal clustering results [21] [22].
A high silhouette value indicates that an object matches its own cluster well and neighboring clusters poorly. A clustering configuration is appropriate if most objects have high values; if many points have low or negative values, the configuration may have too many or too few clusters [23]. The silhouette coefficient considers both how close the data are to the cluster in which they reside and how well they are separated from other clusters, which makes it a more holistic measure of clustering quality. The K-Medoids algorithm groups data by similarity, so that data with the same characteristics are put into the same cluster. Similarity can be measured by the distance between data points and by the distance between each data point and the cluster center [24]. The K-Medoids algorithm first determines the number of clusters and then selects the initial center points randomly [25]. It then allocates each data point to the nearest cluster and repeats the process until the centers are stable. The output of K-Medoids therefore depends heavily on the number of clusters and on the centers, which are selected randomly and repeatedly. A known problem is that the final center may not be the true cluster center, so the algorithm must be run many times with different initial centers to obtain the final center that is considered best [26][27] [28].
The dynamic clustering algorithm provides better and more accurate potential segmentation results within the K-means algorithm and calculates the number of clusters (k) that produces optimal cluster quality [29]. However, because it still requires a random selection of the centroid points during clustering, this algorithm shares some drawbacks with the K-means algorithm. The centroid identification problem has been addressed by transforming K-means into K-means Binary Search Centroid (KBSC), which uses a binary search approach to determine the centroid points [30] [32]. Those findings demonstrate that KBSC outperforms K-means in terms of intra- and inter-cluster values. Nevertheless, KBSC is restricted in the number of clusters it can form. In summary, KBSC has advantages in identifying the initial cluster centers but drawbacks in determining the number of clusters, whereas the Dynamic K-means algorithm has advantages in determining the number of clusters but drawbacks in determining the cluster centers [32][33] [34].
This research proposes combining the K-Medoids algorithm with the Binary Search Centroid (KMBSC), treating the number of clusters k as a hyperparameter, so that the clustering process can determine the number of clusters to be formed. The clustering process covers determining the number of clusters and the cluster centroid points, and it produces the SSE and index values for the beef processing MSME case study. Although its centroid-based initialization gives better intra- and inter-cluster values than random centroid updates, the KMBSC algorithm is limited in determining the number of clusters to be formed. Therefore, we propose optimizing both the number of medoids m in K-Medoids and the number of clusters, complementing the clustering process in determining the number of clusters and the cluster centroid points. The Davies-Bouldin Index is then used to measure cluster quality in this case study of processed meat quality in SMEs [35][36].

Dataset Description
Data were collected from government agencies and the beef processing SME community and include annual turnover, condition of the cattle, meat quality, price, and other attributes. These data also indicate how well the sustainability of the production process is guaranteed. The asset values of the SMEs are expressed in nominal rupiah, then simplified into numeric values and grouped using k-medoids. The next stage processes the operational data to extract information, converts categorical data to numeric values (label encoding), and rescales the data to a specific range of values (min-max normalization).
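Label encoding of a categorical attribute can be sketched as follows. This is a minimal illustration, assuming codes are assigned in sorted category order; the function name `label_encode` and the category values for the type-of-cow attribute are hypothetical examples, not taken from the study's dataset.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in sorted category order."""
    categories = sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping


# Hypothetical values for the "type of cow" attribute
codes, mapping = label_encode(["local", "crossbreed", "local", "imported"])
print(codes)    # [2, 0, 2, 1]
print(mapping)  # {'crossbreed': 0, 'imported': 1, 'local': 2}
```

Sorting the categories makes the codes reproducible across runs; note that the integer codes impose an arbitrary order on nominal categories, which then feeds into the distance calculations.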

Preprocessing Data
Missing Completely at Random (MCAR) refers to the random occurrence of missing data, where the probability that a value of a feature is missing is independent of both the observed and the missing data. This mechanism can be simulated by removing values at random according to a predetermined proportion while using the entire dataset, which helps researchers estimate the computational performance of the proposed model. Another mechanism is Missing at Random (MAR), in which the missingness of a feature does not depend on the missing values themselves but does depend on the observed data. Finally, under Not Missing at Random (NMAR), the distribution of missing data on a feature depends on the missing values themselves [28]. The most common and simplest method for replacing missing values is mean imputation [29].
Mean imputation replaces each missing value of a feature with

$$x_{imp} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

where x_1, x_2, ..., x_N are the observations of that feature without missing values and N is the total number of observations that do not include missing values. This technique is straightforward and effective when data are Missing Completely at Random (MCAR) [30].
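The MCAR simulation and mean imputation described above can be sketched in plain Python. This is a minimal illustration, assuming missing entries are represented as `None` and a fixed random seed; the function names `inject_mcar` and `mean_impute` are hypothetical.

```python
import random


def inject_mcar(values, proportion, seed=0):
    """Blank out a fixed proportion of entries chosen uniformly at random (MCAR)."""
    rng = random.Random(seed)
    n_missing = round(proportion * len(values))
    missing = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing else v for i, v in enumerate(values)]


def mean_impute(values):
    """Replace missing entries (None) with the mean of the N observed values."""
    observed = [v for v in values if v is not None]
    x_imp = sum(observed) / len(observed)
    return [x_imp if v is None else v for v in values]


masked = inject_mcar([float(v) for v in range(1, 11)], proportion=0.3, seed=42)
print(masked)
print(mean_impute(masked))
```

Because the masked positions are chosen independently of the values, the observed mean is an unbiased estimate of the feature mean under MCAR, which is why mean imputation is considered adequate in that setting.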
Normalization transforms values so that all features in the data share a uniform range. The normalized data then become the input to the clustering process [5].
The data are normalized with the min-max method: each value of an attribute is reduced by the minimum value of that attribute and then divided by the attribute's range,

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
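Min-max normalization of a single attribute can be sketched as below. This is a minimal illustration; the function name `min_max_normalize` is hypothetical, and mapping a constant attribute (zero range) to 0.0 is an assumption added to avoid division by zero, not a rule stated in the text.

```python
def min_max_normalize(column):
    """Rescale one attribute to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant attribute: range is zero
    return [(x - lo) / (hi - lo) for x in column]


print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```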

K-Medoids Clustering
In general, there are two clustering approaches: the partition approach and the hierarchical approach. Clustering with a partition approach takes data from one large group and divides it into several smaller groups [8]. An example of a partition-based method is K-Means clustering. Clustering with a hierarchical approach, often called Hierarchical Clustering, groups data by successively combining each record or individual in the data into clusters.
An example of a hierarchical method is Agglomerative Hierarchical Clustering [24]. The k-medoids method divides data consisting of n objects into k clusters, where k ≤ n. Medoids are used as cluster representatives and function as cluster centers.
Clusters in the k-medoids algorithm are formed by calculating the similarity distance between the medoids and the non-medoid objects. The aim is to minimize the dissimilarity of each object in its cluster, measured by the absolute error value (E) [17]:

$$E = \sum_{c=1}^{k}\sum_{i=1}^{n_c} \left| p_{ic} - o_c \right|$$

where n_c is the number of objects in the c-th cluster, p_{ic} is non-medoid object i in the c-th cluster, and o_c is the medoid of the c-th cluster. Next, a non-medoid object is selected at random as a candidate medoid, and the total deviation (TD) of the objects in each cluster is calculated for that candidate. The swap cost is S = TD_new - TD_old; if the new TD is smaller than the old TD (S < 0), the candidate replaces the current medoid and becomes the new medoid. The process is repeated until the medoids no longer change [27] [28].

Input: the number of cluster variations c_n and the MSME dataset with n features.
Output: the best number of clusters (c), the best stopping iteration, and the minimized dissimilarity of each object.
Algorithm:
1. Initialize the similarity calculation for all data by setting c medoids from n random data points in space D.
2. Assign each data point to its closest medoid (m).

Termination:
If the result matches the best model, the value of k is concluded; if a better model is still needed, return to step 1. The final medoids are those of the model with the lowest error.
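The medoid assignment and swap procedure can be sketched as a small PAM-style k-medoids in NumPy. This is an illustrative sketch under stated assumptions: Euclidean distance, a fixed random seed for the initial medoids, and a greedy first-improvement swap in which a candidate is accepted whenever the new TD is lower than the old TD (S < 0). The function names `total_deviation` and `k_medoids` are hypothetical, and this is not presented as the authors' code.

```python
import numpy as np


def total_deviation(D, medoids):
    """TD (absolute error E): sum of distances from every object to its nearest medoid."""
    return D[:, medoids].min(axis=1).sum()


def k_medoids(X, k, max_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all objects
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rng = np.random.default_rng(seed)
    medoids = [int(m) for m in rng.choice(n, size=k, replace=False)]  # random initial medoids
    best_td = total_deviation(D, medoids)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids.copy()
                trial[pos] = candidate  # swap a medoid with a non-medoid object
                td = total_deviation(D, trial)
                if td < best_td:  # swap cost S = TD_new - TD_old < 0
                    medoids, best_td, improved = trial, td, True
        if not improved:  # medoids no longer change: stop
            break
    labels = D[:, medoids].argmin(axis=1)  # assign each object to its closest medoid
    return np.array(medoids), labels, best_td


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
medoids, labels, td = k_medoids(X, k=2)
print(medoids, labels, td)
```

Running `k_medoids` for several values of k and keeping the model with the lowest error mirrors the termination rule; the full distance matrix keeps the sketch short but uses O(n^2) memory.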

Grid Search
Grid Search (GS) is a simple search method over a hyperparameter configuration space, but in a high-dimensional space the number of evaluations grows exponentially with the number of hyperparameters. Assuming there are k hyperparameters and each has n distinct candidate values, the computational complexity grows at a rate of O(n^k) [20] [21]. Nevertheless, GS can serve as a thorough, brute-force HPO approach that tests every combination of hyperparameters in a given grid configuration. GS operates by evaluating the Cartesian product of the finite sets of values specified by the user [22]. GS alone does not further exploit regions that perform well, so any subsequent search for the global optimum must be performed manually. The final steps of the GS workflow are as follows:
5. Update. Update the data D at step n+1.
6. Repeat steps 2 to 5 until the required number of points n has been sampled, continuing until the lowest cost is reached.
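A plain grid search over the Cartesian product of user-specified values can be sketched as follows. This is a generic illustration: the function name `grid_search` and the toy cost function, which stands in for a clustering error such as DBI or SSE, are hypothetical. The evaluation counter shows why the cost grows with the product of the grid sizes.

```python
from itertools import product


def grid_search(objective, grid):
    """Evaluate objective(**params) on every point of the Cartesian product of the grid."""
    names = list(grid)
    best_params, best_cost, evaluations = None, float("inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        cost = objective(**params)
        evaluations += 1
        if cost < best_cost:  # keep the lowest-cost configuration
            best_params, best_cost = params, cost
    return best_params, best_cost, evaluations


# Toy objective standing in for a clustering error such as DBI or SSE
def toy_cost(k, step):
    return (k - 3) ** 2 + abs(step - 0.1)


best, cost, n_evals = grid_search(toy_cost, {"k": range(2, 11), "step": [0.01, 0.1, 1.0]})
print(best, cost, n_evals)  # {'k': 3, 'step': 0.1} 0.0 27
```

With 9 values of k and 3 step values, 27 configurations are evaluated; adding one more hyperparameter with n values multiplies this count by n.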

Binary Search Algorithm
Binary search is a method for searching for data in an ordered array. It is more efficient than linear search, in which every element of the array is tested one by one until the desired element is found. Besides binary search, interpolation search and jump search also work on sorted data. Searching sorted data is fast: interpolation search has an average time complexity of O(log log n), while k-level jump search is O(k·n^{1/(k+1)}). The time complexity of binary search is O(log n), as described by Knuth [29]. Binary search repeatedly halves the search interval, and this division continues until the data are found [30]. The principle of binary search can be explained as follows. Suppose the left index is i and the right index is j.
5. Repeat from the first step until x is found or i > j, i.e., the remaining array is empty.

Proposed Algorithm
K-medoid clustering is an unsupervised learning method, like k-means clustering. The clustering process searches for cluster center points that minimize the distance between the members of each cluster and its center point. Consider a population set X containing the data {x1, x2, x3, …, xn}. The data are grouped into c clusters, where c ≤ n. In K-medoid clustering, set members are grouped by their proximity to each other so that the average distance between the members of a cluster is minimal. K-medoid clustering relies on the concept of a medoid: a cluster member that serves as the central point of the cluster. The number of medoids is equal to k, so the set of medoids can be written as M = {m1, m2, m3, …, mk}. The algorithm aims to minimize the total dissimilarity between each object and its corresponding reference point. We combine the K-medoids algorithm with the binary search algorithm to improve centroid updates, and with hyperparameter grid search for the parameter search, with the aim of reducing the computational burden. The proposed hybrid algorithm is as follows:
1. Initialize parameters: c (clusters) and the maximum value.

Evaluation Measures
Experimental scenarios are used in the model evaluation to identify the most accurate and appropriate algorithm. After the clusters are formed, the algorithms are compared to determine which performs best, which has the lowest error, and what the ideal number of clusters is according to the test data criteria. The Sum of Squares Within a Cluster (SSW) measures the cohesion within cluster i and is calculated as follows [13]:

$$SSW_i = \frac{1}{m_i}\sum_{j=1}^{m_i} d(x_j, c_i)$$

where m_i is the number of data points in cluster i, c_i is the center of cluster i, and d(x_j, c_i) is the distance from data point x_j to c_i.
The Sum of Squares Between Clusters (SSB) measures the separation between clusters i and j:

$$SSB_{ij} = d(c_i, c_j)$$

After the cohesion and separation values are obtained, the ratio R_ij is computed to compare the i-th cluster with the j-th cluster. A good cluster has the smallest possible cohesion value and the largest possible separation value. The ratio is calculated as:

$$R_{ij} = \frac{SSW_i + SSW_j}{SSB_{ij}}$$

Using this ratio, the Davies-Bouldin Index (DBI) [9] is calculated as:

$$DBI = \frac{1}{k}\sum_{i=1}^{k} \max_{j \neq i} R_{ij}$$

where k is the number of clusters used in the analysis. The smaller the DBI value (DBI ≥ 0), the better the quality of the clusters produced by K-Medoids [12]. Errors within each cluster are measured with the Sum of Squared Errors (SSE), which quantifies how far the data points in a cluster are from the cluster center. SSE is the sum of the squared distances between each data point and its cluster center; this deviation metric measures how much each data point in a cluster differs from the cluster center [13]:

$$SSE = \sum_{i=1}^{k}\sum_{x \in C_i}\sum_{j}\left(x_j - c_{ij}\right)^2$$

where x_j is the value of the j-th data feature and c_{ij} is the j-th feature (attribute) of the i-th cluster center point.
The smaller the SSE value, the better the clustering quality because it shows that the data points in the cluster are closer to the cluster center.Therefore, SSE is used as one of the criteria for evaluating the grouping quality in clustering.
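The evaluation measures can be checked numerically with the NumPy sketch below. It assumes Euclidean distance for d, SSW_i as the mean distance of cluster i's members to its center, SSB_ij as the distance between centers i and j, R_ij = (SSW_i + SSW_j) / SSB_ij, and DBI as the average over clusters of the largest R_ij; the function names `sse` and `davies_bouldin` are hypothetical, and the centers may be either medoids or centroids.

```python
import numpy as np


def sse(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    X, centers, labels = np.asarray(X, float), np.asarray(centers, float), np.asarray(labels)
    return float(((X - centers[labels]) ** 2).sum())


def davies_bouldin(X, labels, centers):
    X, centers, labels = np.asarray(X, float), np.asarray(centers, float), np.asarray(labels)
    k = len(centers)
    # SSW_i: mean distance of cluster i's members to its center (cohesion)
    ssw = np.array([np.linalg.norm(X[labels == i] - centers[i], axis=1).mean() for i in range(k)])
    # SSB_ij: distance between the centers of clusters i and j (separation)
    ssb = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    # For each cluster keep the worst (largest) ratio R_ij = (SSW_i + SSW_j) / SSB_ij
    worst = [max((ssw[i] + ssw[j]) / ssb[i, j] for j in range(k) if j != i) for i in range(k)]
    return float(np.mean(worst))


X = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels, centers = [0, 0, 1, 1], [[0, 1], [10, 1]]
print(sse(X, labels, centers), davies_bouldin(X, labels, centers))  # 4.0 0.2
```

In the usage lines each point lies at distance 1 from its center and the centers are 10 apart, so SSE = 4 and every R_ij = (1 + 1) / 10 = 0.2.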

MAIN RESULTS
We implemented the algorithm in Python using Jupyter Notebook and ran it on an Intel Core i7-10700K machine (3.8 GHz, up to 5.1 GHz, 16 MB cache). The comparison of convergence time, size, and fitness values for GS-K-Means [21], GS-K-Medoids [22], GS-KMBS, and the proposed GS-KMBSC method is shown in Tables 2 and 3. The results show that the k-medoids hyperparameter algorithm with Binary Search Centroid has the advantage of a constant number of iterations in every test, with a low DBI value. By contrast, the traditional k-means algorithm yields higher DBI values because it depends on the random initialization of the cluster center points, so different runs obtain different validation values. Table 3 shows the optimal clustering result, with DBI = 0.1021 at 3 clusters, together with the best epoch of each run. The Grid Search experiments look for near-optimal parameters among the combinations within a given range. This process becomes time-consuming when the data set has relatively high dimensionality or the number of parameter combinations is very large. Therefore, although GS provides excellent results on almost any data set, it is only practical for low-dimensional data sets with few parameters. Each algorithm was implemented and executed in 5 different runs, each with a specific maximum iteration limit.

DISCUSSION
The experiments compare the performance of k-means and k-medoids using SSE. The hyperparameter-tuned K-Medoids algorithm yields better average cluster quality than traditional K-Means in terms of SSE. The SSE value changes as the number of clusters used in the experiment increases. The number of clusters explored ranges from 2 to 10, and the SSE graph shows that the lowest SSE, 32.0970, is obtained with 10 clusters. For GS-KMBSC with 3 clusters, the importance of the attributes is analyzed using the standard deviation (SD) and mean in Table 4. SD measures how widely data are spread around their mean: the higher the SD, the greater the variation or heterogeneity of the data, whereas a low SD indicates that the data are closer to, or more homogeneous around, their mean value.
Feature importance is measured with the chi-square test, which is used for relevant feature selection to measure how important each feature is for building the model. For 3 clusters, the chi-square test examines the relationship between the features and the cluster assignments of the N data points, each of which belongs to one of the k clusters; it shows that the most dominant attribute is storage time. The grouping pattern of each cluster was obtained using the maximum standard deviation of 0.5276 and the mean of 0.3293 for the cattle-type attribute in the dominant cluster. A higher standard deviation indicates more variation, while the mean indicates that the data are close to stationary, share similarities, and have relatively stable values. Furthermore, the chi-square percentage shows that the most dominant attribute is storage time, at 5.4930%, indicating that this attribute has the greatest influence on the development of beef processing SMEs.
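The per-cluster mean and standard deviation and a chi-square importance score can be sketched as follows. This is an assumption-laden illustration, not the study's procedure: it computes the Pearson chi-square statistic of each attribute against the cluster labels from a contingency table of (discrete) attribute values, then expresses each statistic as a percentage of the total over attributes; how the paper's percentages were derived is not stated, and the function names `cluster_profile`, `chi_square_stat`, and `chi_square_importance` are hypothetical. The standard deviation is the population SD (NumPy's default `ddof=0`).

```python
import numpy as np


def cluster_profile(X, labels):
    """Mean and (population) standard deviation of every attribute within each cluster."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    return {int(c): (X[labels == c].mean(axis=0), X[labels == c].std(axis=0)) for c in np.unique(labels)}


def chi_square_stat(feature, labels):
    """Pearson chi-square statistic of the (feature value x cluster) contingency table."""
    _, f_idx = np.unique(np.asarray(feature), return_inverse=True)
    _, c_idx = np.unique(np.asarray(labels), return_inverse=True)
    observed = np.zeros((f_idx.max() + 1, c_idx.max() + 1))
    np.add.at(observed, (f_idx, c_idx), 1)  # count co-occurrences
    expected = observed.sum(axis=1, keepdims=True) * observed.sum(axis=0, keepdims=True) / observed.sum()
    return float(((observed - expected) ** 2 / expected).sum())


def chi_square_importance(X, labels):
    """Chi-square statistic of each attribute, as a percentage of the total over attributes."""
    X = np.asarray(X)
    stats = np.array([chi_square_stat(X[:, j], labels) for j in range(X.shape[1])])
    return 100 * stats / stats.sum()


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 1, 1]
print(chi_square_importance(X, labels))
print(cluster_profile(X, labels))
```

In the usage lines the first attribute coincides with the cluster labels and receives all of the importance, while the second is independent of the clusters and receives none.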

CONCLUSIONS
The K-Medoids hyperparameter grouping method with binary search centroid refinement can be used to group data that have no labels or prior class information. On the unlabeled beef processing data, the DBI measurements yielded 3 clusters. The clustering process involves several stages: data preprocessing with label encoding to convert categorical data into numerical data, mean imputation to fill in empty values, and min-max normalization to give the data a uniform scale. The K-Medoids grouping was evaluated with both DBI and SSE. In the GS-K-Medoids cluster analysis, the lowest DBI, 0.1021, was obtained with 3 clusters over an experimental range of 2 to 10 clusters. The lowest SSE occurred with 10 clusters, with an SSE of 32.0970 for the K-Medoids method, lower than the 50.9282 obtained by traditional K-Means. The analysis of the clusters of both methods therefore shows that K-Medoids is more suitable, because it fits categorical data. The attribute importance analysis at c = 3 ranks storage time first and meat price second. The grouping pattern of each cluster was obtained based on the determination of beef processing quality, and the dominant clusters were found in local cattle because of their dense meat fiber. A weakness of the clustering method is its strong dependence on centroid selection and on missing data, both of which affect clustering performance. Computational methods for selecting clustering hyperparameters and for correcting missing values are therefore highly recommended.

(K-Medoids algorithm, continued)
3. Calculate the medoid of each data variant for the next iteration.
4. Updating iteration:
a. Randomly select another non-medoid object to compare in the next iteration.
b. Swap the medoid (m) with the non-medoid data point (o) and calculate the total cost (tc) over all data. Select the medoid with the lowest cost, i.e., the lowest error measurement.

Figure 2. GS Process Illustration [22].
1. Initialization. Determine and evaluate several points of the configuration space for each parameter.
2. Adjusting data. Using the value of each point, evaluate the performance of the most optimal method based on a probabilistic distribution.
3. Earn points on each possible next point. Obtain the next promising point x_{i+1} by optimizing the acquisition function on the algorithm's performance.
4. Evaluate. Evaluate the selected point by calculating the objective function to obtain y_{i+1}.

For an array L sorted in ascending order:
1. Initially, set i = 1 and j = n. Divide the array at its center element, the element with index k = (i + j) div 2. The center element L[k] divides the array into two parts: the left part L[i..k-1] and the right part L[k+1..j].
2. Check whether L[k] = x. If L[k] = x, the search is complete because x has been found. If L[k] ≠ x, determine whether the search continues in the left part or the right part.
3. If L[k] < x, continue the search in the right part (set i = k + 1).
4. Conversely, if L[k] > x, continue the search in the left part (set j = k - 1).
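The binary search steps can be sketched in Python as follows. The sketch uses 0-based indices (i = 0, j = n - 1) instead of the 1-based i = 1, j = n, and assumes the list is sorted in ascending order; the function name `binary_search` and the return value -1 for an absent element are illustrative choices.

```python
def binary_search(L, x):
    """Return the index of x in the ascending list L, or -1 if x is absent."""
    i, j = 0, len(L) - 1          # 0-based version of i = 1, j = n
    while i <= j:                 # stop when i > j (empty interval)
        k = (i + j) // 2          # index of the center element
        if L[k] == x:
            return k
        if L[k] < x:
            i = k + 1             # x can only be in the right part L[k+1..j]
        else:
            j = k - 1             # x can only be in the left part L[i..k-1]
    return -1


print(binary_search([1, 3, 5, 7, 9], 7))  # 3
print(binary_search([1, 3, 5, 7, 9], 4))  # -1
```

Each iteration halves the interval [i, j], which gives the O(log n) bound. Python's standard `bisect` module offers the same halving strategy for finding insertion points.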

Figure 3. Graphical representation of the proposed methodology's framework.

2. Constructing a grid search: set the candidate numbers of clusters c = [10, 20, 30, 40] and the centroid range = [0.01, 0.1, 1.0].
3. Initialize the lowest feature value and the distance M: in the binary search stage, the first calculation is the distance between centroid points,
$$M_j = \frac{\max(x_j) - \min(x_j)}{k}$$
4. Evaluate the centroid: sort the values of each feature from largest to smallest and evaluate which data objects are closest to each centroid,
$$c_{ij} = \min(x_j) + (i - 1)\, M_j$$
5. Specify the Euclidean distance: compute the distance from each data record to the centroids and keep the centroid with the smallest distance,
$$d(x, c_i) = \sqrt{(x_1 - c_{i1})^2 + \cdots + (x_n - c_{in})^2}$$
6. Updating solutions: update the centroids every n iterations, in order of centroid from the smallest to the next.
7. Acceptability of the solution: accept the solution with the smallest centroid distance; otherwise, maintain the current solution.
8. Termination: stop the algorithm when the specified performance criteria have been met, returning the best solution; otherwise, go back to step 2.
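A deterministic, range-based centroid initialization in the spirit of steps 3 to 5 can be sketched as below. It assumes the per-feature step distance M_j = (max(x_j) - min(x_j)) / k and centroids c_ij = min(x_j) + (i - 1) M_j, and, as an additional assumption not stated in the text, snaps each centroid to its nearest actual data object so that the centers are medoids. The binary-search ordering details of the proposed method are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def range_based_centroids(X, k):
    """Deterministic initial centroids per feature j: c_ij = min(x_j) + (i - 1) * M_j."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    M = (hi - lo) / k  # assumed step distance M_j = (max(x_j) - min(x_j)) / k
    return np.array([lo + (i - 1) * M for i in range(1, k + 1)])


def snap_to_medoids(X, centroids):
    """Index of the actual data object nearest to each centroid (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return D.argmin(axis=0)  # note: two centroids may snap to the same object


def assign(X, centers):
    """Assign every object to the center with the smallest Euclidean distance."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - np.asarray(centers, float)[None, :, :], axis=-1)
    return D.argmin(axis=1)


X = np.array([[0, 0], [1, 0], [8, 0], [10, 0]], dtype=float)
medoid_idx = snap_to_medoids(X, range_based_centroids(X, k=2))
print(medoid_idx, assign(X, X[medoid_idx]))  # [0 2] [0 0 1 1]
```

Because no random draw is involved, repeated runs start from the same centers, which is the property the proposed method relies on to keep the number of iterations constant.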

Figure 4. Computing time (s) measured for the tested methods.

As shown in Figure 4, K-Means takes an average of 56 seconds for 3 clusters, while K-Medoids takes 1 minute 38 seconds. The more iterations and clusters are specified, the longer the data processing takes. Adding the centroid refinement process with binary search reduces the number of iterations, because the lowest centroid, stored during the local search, is selected without repeatedly drawing random values.

Table 1. Criteria for beef processing in SMEs

Table 2. Results obtained for the DBI measure

Table 3. Grid Search experiment results for several clustering methods

Table 4. Performance measurement of feature importance