MAPPING INDONESIAN POTENTIAL FISHING ZONE USING HIERARCHICAL AND NON-HIERARCHICAL CLUSTERING

Abstract: Indonesia, a maritime nation whose ocean area exceeds its land area, has an abundance of ocean-based natural resources, such as fish, seaweed, coral reefs, and other marine organisms. The fisheries sector is one of the most promising marine resources for the Indonesian economy. The annual increase or decrease in fish production in Indonesia can be attributed to several factors, including natural influences such as climate and ocean waves, inadequate management of marine resources, unequal distribution of facilities to support increased fish production, and regional characteristics that have a significant impact on fish production. Consequently, the objective of this research is to group provinces in Indonesia using cluster analysis so that government policy programs can be more focused and directed according to the characteristics of the clusters formed. The cluster analysis was based on fish production data for each province in Indonesia from 2017 to 2019, obtained from the website of the Central Statistics Agency (BPS). Clustering was performed with hierarchical and non-hierarchical methods. The dendrogram produced by the hierarchical method with DTW distance and average linkage indicates that two clusters are optimal. Non-hierarchical clustering with two clusters produces the


INTRODUCTION
Indonesia is a maritime country whose ocean area exceeds its land area, and it therefore has considerable natural potential from the oceans, such as fish, seaweed, coral reefs, and other marine biota. One of the most promising marine resources for supporting the Indonesian economy is the fisheries sector. Indonesia is the largest archipelagic country in the world, with 17,508 islands and 81,000 km of coastline, and around 70% of its territory is sea. With marine waters covering a total area of 5.8 million km², Indonesia has abundant biological and non-biological resources [1].

Figure 1. Territory of the Republic of Indonesia Source: Indonesia.go.id
Fisheries are all activities related to the management and utilization of fish resources and their environment, from pre-production, production, and processing to marketing, which are carried out in a fishery business system [2]. Fish production in Indonesia can increase or decrease every year. This can be influenced by several factors, namely natural influences such as climate and sea waves, inadequate management of marine resources, and the unequal distribution of facilities to support increased fish production in Indonesia [3]. Figure 2 shows capture fish production in Indonesia from year to year.

Figure 2. Graph of Captured Fish Production in Indonesia Source: Statistics of the Ministry of Maritime Affairs and Fisheries of the Republic of Indonesia
Given the importance of developing marine aquaculture in Indonesia, sound data management is also necessary. Against this background, the author conducts a cluster analysis that groups provinces in Indonesia with similar characteristics based on the amount of fish production, in order to determine the provinces with the best marine aquaculture opportunities. The analysis is based on annual production data (tons) for each province in Indonesia in 2017-2019. Data mining is a method used in large-scale information processing and therefore plays a significant role in several areas of life, including industry, finance, weather, science, and technology [4].
Cluster analysis is a multivariate analysis technique used to identify and classify individuals or objects that have similarities into groups or clusters. Objects within the same cluster have a high level of similarity, while objects in different clusters have a low level of similarity [5]. The K-Means algorithm examines each component in the data population and assigns it to one of the predetermined cluster centers, namely the center at the minimum distance from that component [6]. The location of each cluster center is then recalculated and the assignment repeated until all data components are classified into a cluster. The final clusters can then be used for a 2-dimensional or 3-dimensional mapping [7] [8].
Mapping provinces in Indonesia based on the amount of fish production is important to assist related parties, namely the government, in making appropriate plans and policies. In addition, the results of this mapping help business people see the potential of the marine fisheries sector. This mapping effort is expected to group provinces that have similar indicators.
Thus, mapping the fisheries sector will help related parties make more focused policies, because handling can be adjusted to the characteristics of each province. It will also help in formulating the right strategy for developing processed marine fishery products, creating a good ecosystem for the growth of fish production in every province in Indonesia, and meeting the need for facilities that support increased fish production in every province in Indonesia.
The objective of this study is to cluster annual fish production (tons) in all provinces in Indonesia for the 2017-2019 period using the K-Means algorithm, so that groups reflecting the best marine aquaculture opportunities are formed in terms of the amount of fish production by year. Another objective of this clustering is to find out which clusters contain the provinces with the highest fish production. The benefits of this research include providing information for fisheries management in every province in Indonesia, serving as reference material and guidance for planning fishing operations in the fisheries sector, and helping to meet the need for facilities in the fisheries sector so that fish production can increase in the following year. These benefits are especially important for the Indonesian government. The study also serves as input for public and private universities throughout Indonesia that have Fisheries study programs, regarding clustering based on the amount of fish production in each province in Indonesia using the K-Means algorithm.

DATA COLLECTION
The data used in this study are secondary data. Secondary data are data reported by an agency that the agency does not collect itself but obtains from other parties [9]. The secondary data in this study were obtained from the Central Statistics Agency (BPS) website, namely sea capture fisheries production data by main commodity (tons), with the 34 provinces in Indonesia as observation units. The fish production data cover the period 2017-2019 [10].

Data on Sea Capture Fisheries Production according to main commodities (tons) in Indonesia in 2017-2019 are analyzed using the Non-Hierarchical and Hierarchical Clustering methods with the help of the R software. The packages used are "gridExtra", "factoextra", and "dendextend" [11].
In the Non-Hierarchical Clustering method, the number of clusters to be formed is determined first. The clustering process is then carried out without regard to hierarchy; the method used here is also known as the K-Means Cluster. The Hierarchical method, in contrast, starts by grouping the two or more objects that have the closest similarity [12]. There is a clear level (hierarchy) between objects, from the most similar to the least similar. The result of this clustering indicates only how many clusters are formed. The hierarchical graph that is formed is called a dendrogram.
The number of clusters is read from the dendrogram and therefore depends on the judgment of the researcher [13]. The steps of data analysis carried out in this study are as follows:

Perform data exploration.
Data exploration, commonly known as exploratory data analysis, is a process of initial analysis and understanding of the data to identify patterns and relationships that may occur in the data.
The main goal of data exploration is to gain a better understanding of the existing data before carrying out further analysis and building predictive models [14].

Perform time series cluster analysis using the Dynamic Time Warping (DTW) distance.
Dynamic Time Warping (DTW) is an algorithm that compares two data series and computes the optimal alignment path between them [15] [16]. The DTW distance is a generalization of the classical algorithm that compares a sequence of discrete values with a sequence of continuous values. Given two time series $Q = q_1, q_2, \ldots, q_i, \ldots, q_n$ and $R = r_1, r_2, \ldots, r_j, \ldots, r_m$, the DTW distance aligns the two series so that the distance between them is minimal [17]. An illustration of the application of the DTW distance is shown in Figure 3. The DTW distance is computed through an $n \times m$ matrix in which element $(i, j)$ holds the distance $d(q_i, r_j)$. The Euclidean distance is used to compute the warping path $W = w_1, w_2, \ldots, w_K$, a set of matrix elements that satisfies three constraints, namely boundary conditions, continuity, and monotonicity. The boundary condition requires the warping path to start and finish in diagonally opposite corner cells of the matrix, with $w_1 = (1, 1)$ and $w_K = (n, m)$.
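The warping path with the minimum distance between the two time series is given, in its standard form, by

$$DTW(Q, R) = \min_{W} \sum_{k=1}^{K} d(w_k)$$

where $w_k$ is the $k$-th element of the warping path $W$ and $d(w_k)$ is the distance stored in that matrix cell. Hierarchical clustering on the DTW distances is carried out using three linkage methods, namely "single linkage", "complete linkage", and "average linkage" [18]. The three linkage methods are illustrated in Figure 4.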
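The dynamic programming behind the DTW distance can be sketched as follows. This is a minimal illustration in Python, not the R code used in the study: the function name `dtw_distance` is hypothetical, the local cost is the absolute difference (which equals the Euclidean distance for one-dimensional values), and no window constraint or path-length normalization is applied.

```python
from math import inf


def dtw_distance(q, r):
    """Cumulative DTW cost between two numeric sequences q and r.

    Assumption: the local cost of matching q[i] with r[j] is |q[i] - r[j]|.
    """
    n, m = len(q), len(r)
    # cost[i][j] = minimal cumulative cost of aligning q[:i] with r[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # boundary condition: the path starts at the first cell
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(q[i - 1] - r[j - 1])
            # continuity and monotonicity: only step from the left,
            # from below, or diagonally
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]  # boundary condition: the path ends at cell (n, m)
```

Calling `dtw_distance` on every pair of provincial production series would fill a symmetric distance matrix of the kind that hierarchical clustering takes as input.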

Perform cophenetic correlation calculations.
The cophenetic correlation coefficient is the correlation coefficient between the original elements of the dissimilarity matrix (Euclidean distance matrix) and the elements produced by the dendrogram (cophenetic matrix, based on the distance measure and the linkage method used) [19] [20]. The formula for calculating the cophenetic correlation coefficient is as follows:

$$r_{Coph} = \frac{\sum_{i<k} \left(d_{ik} - \bar{d}\right)\left(d_{Coph_{ik}} - \bar{d}_{Coph}\right)}{\sqrt{\left[\sum_{i<k} \left(d_{ik} - \bar{d}\right)^2\right]\left[\sum_{i<k} \left(d_{Coph_{ik}} - \bar{d}_{Coph}\right)^2\right]}}$$

with,
$r_{Coph}$ : Cophenetic correlation coefficient
$d_{ik}$ : Euclidean distance between the i-th and k-th objects
$\bar{d}$ : Average of $d_{ik}$
$d_{Coph_{ik}}$ : Cophenetic distance between the i-th and k-th objects
$\bar{d}_{Coph}$ : Average of $d_{Coph_{ik}}$
The value of $r_{Coph}$ ranges between -1 and 1. The closer the value is to 1, the better the solution resulting from the clustering process. The combination of the DTW distance measure and the linkage method that produces the largest cophenetic value is the best solution for the hierarchical clustering method.
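How a cophenetic matrix arises from average linkage, and how it is correlated with the original distances, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the study's R workflow: the function names are hypothetical, the input is any symmetric distance matrix (for example DTW distances), and ties between equally close clusters are broken by the lowest cluster index.

```python
from itertools import combinations
from math import sqrt


def average_linkage_cophenetic(dist):
    """Cophenetic distance matrix from average-linkage agglomerative clustering."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # average linkage: mean of all pairwise distances between members
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        height, a, b = best
        # objects joined for the first time at this merge receive the
        # merge height as their cophenetic distance
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        clusters[a] = clusters[a] + clusters.pop(b)
    return coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances (pairs i < k)."""
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][k] for i, k in pairs]
    y = [coph[i][k] for i, k in pairs]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return num / den  # undefined (division by zero) if either set of distances is constant
```

Repeating the calculation with single and complete linkage (replacing the mean by the minimum or maximum pairwise distance) and comparing the resulting coefficients mirrors the selection rule described above.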

Perform non-hierarchical clustering using the k-means algorithm.
Non-hierarchical clustering using the k-means algorithm is a method commonly used in data analysis to group objects based on the similarity of their features [21]. K-means is a non-hierarchical clustering method that divides data into one or more groups [22]. The method partitions the data so that objects with similar characteristics are placed in the same group, while objects with different characteristics are placed in other groups.
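The assignment and update steps of k-means can be sketched as below. This is a minimal Python illustration of Lloyd's algorithm, not the R implementation used in the study: the function name `kmeans` is hypothetical, the initial centers are passed in by the caller (an assumption; common implementations choose them randomly), and squared Euclidean distance is used.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm on equal-length numeric tuples.

    points  : list of tuples, e.g. one tuple of yearly values per province
    centers : list of initial cluster centers (supplied by the caller)
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest center
        labels = [
            min(
                range(len(centers)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])),
            )
            for pt in points
        ]
        # update step: each center moves to the mean of its members
        new_centers = []
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers
```

For example, with four made-up three-year series, `kmeans([(100, 110, 120), (105, 108, 118), (900, 950, 1000), (880, 940, 990)], [(100, 110, 120), (900, 950, 1000)])` separates the two low series from the two high ones.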

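Perform silhouette coefficient calculations to assess the quality of the k-means clustering.
The silhouette coefficient of the i-th object is calculated with the following formula [23]:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

with $a(i)$ the average distance from the i-th object to the other objects in its own cluster and $b(i)$ the smallest average distance from the i-th object to the objects of any other cluster. The value of $s(i)$ ranges between -1 and 1, and the average over all objects is used as the silhouette coefficient of the clustering.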
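The silhouette calculation can be sketched as follows. This is an illustrative Python sketch, not the study's R code: the function name `silhouette_scores` is hypothetical, the input is a precomputed symmetric distance matrix plus one cluster label per object, at least two clusters are assumed, and an object that is alone in its cluster is given a score of 0 by convention.

```python
def silhouette_scores(dist, labels):
    """Silhouette value s(i) for every object, from a distance matrix and labels."""
    n = len(labels)
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for a single-member cluster
            continue
        # a(i): mean distance to the other members of the object's own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return scores
```

Averaging the returned list gives a single coefficient for the clustering, which is the quantity reported for the two-cluster solution in this study.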

DATA DESCRIPTION
Before any calculation is carried out, the data must first be collected. The data used in this study are secondary data obtained from the Central Statistics Agency (BPS).

Time Series Clustering
The time series cluster analysis in this study aims to group provinces in Indonesia based on the amount of fish caught at sea (tons). The first step in this analysis is to create a Dynamic Time Warping (DTW) distance matrix, as shown in Table 2.
The shortest DTW distance is between Kep. Bangka Belitung Province and West Sumatra Province, with a distance value of 22191.
Therefore, the first merge in the DTW-based clustering joins Kep. Bangka Belitung Province and West Sumatra Province.

Hierarchical Clustering
The next step is to draw the dendrogram of the hierarchical time series cluster analysis with Dynamic Time Warping (DTW) distances under each linkage method ("single linkage", "complete linkage", and "average linkage"). The resulting dendrograms and clusters are as follows. Based on the dendrograms obtained with the DTW distance, the "single linkage" and "average linkage" methods each form 2 clusters (Figure 6 and Figure 8). Meanwhile, the "complete linkage" method produces 3 clusters (Figure 7).
The next step in this study is to calculate the cophenetic correlation values, as shown in Table 3. Based on Table 3, the highest cophenetic correlation value (0.8836320) is obtained using the average linkage method. Thus, based on the hierarchical time series clustering, the optimal number of clusters is 2. The distribution of provinces in the 2 optimal clusters of the hierarchical method is:

Non-Hierarchical Clustering
In non-hierarchical clustering, one way to assess the quality of the clusters formed by the k-means algorithm is the average silhouette value.

DISCUSSION
The background of this study is that Indonesia has extraordinary potential in marine resources such as fish production. However, fish production in Indonesia increases in some years and decreases in others. This can be influenced by several factors, namely natural influences such as climate and sea waves, inadequate management of marine resources, and the unequal distribution of facilities to support increased fish production in Indonesia. Therefore, the main objective of this study is to determine how the provinces in Indonesia cluster based on capture fish production in 2017-2019, and to see how the clusters are formed based on the amount of fish production in each province. This research differs from previous research, which used data on the value of rice production to cluster provinces in Indonesia and obtained 3 clusters [25]. In this study, hierarchical and non-hierarchical cluster analysis produced 2 clusters: cluster 1 contains 3 provinces with a high amount of fish production, while cluster 2 contains 31 provinces with a lower amount of fish production than cluster 1. This indicates that Cluster 1 can serve as an example for the 31 provinces in Cluster 2, for instance by studying how the provinces in Cluster 1 achieve high fish production or by exchanging information about the techniques or tools used to increase fishing at sea. With the formation of these 2 clusters, it is hoped that the government or parties related to fishing activities will pay more attention to the 31 provinces in Cluster 2, whose fish production is lower than that of the 3 provinces in Cluster 1.
Actions that can be taken are to provide and distribute facilities that support fishing activities, to provide insight or direction to the 31 provinces in Cluster 2 based on what is implemented in the 3 provinces in Cluster 1, and to make more targeted policies to increase fish production in the 31 provinces in Cluster 2. Non-hierarchical clustering with two clusters produces the same distribution of province members as the hierarchical clustering. Cluster 1 (3 provinces) is the area with a high fish production category, while cluster 2 (31 provinces) is the area with a low fish production category.

Analysis
The two clusters formed yield an average silhouette coefficient of 0.64, which means that the clustering is categorized as Good Classification.