Implementation of k-Medoids Clustering Algorithm to Cluster Crime Patterns in Yogyakarta

The day-to-day increase in crime needs to be a concern for the police, as the party responsible for security in the community. Crime prevention efforts must be carried out seriously, using all available knowledge. To improve police performance in crime prevention, crime data need to be analyzed so that relevant information can be obtained. This study analyzes crime data using clustering, a data mining method that extracts valuable information by grouping data with similar characteristics. The data used in this study were crime patterns, which were grouped using the K-medoids clustering algorithm. The result was three crime groups: high crime level with 4 members, medium crime level with 6 members and low crime level with 8 members. It is expected that this information can be used as material for consideration in crime prevention efforts.


Introduction
Crime is any act that is prohibited by public law to protect the public and that is punished by the state. These acts are punished because they violate the legal, social and religious norms that apply in society [1]. The punishments applied by law enforcement do not deter criminals from their intentions; in fact, crime in Yogyakarta is increasingly widespread.
The increase in criminal cases can cause both material and immaterial losses. For this reason, law enforcement needs to make efforts to reduce crime in society. One such effort is finding relevant information related to crime, which can be obtained by processing and analyzing the crime data owned by the police.
The crime data owned by the Yogyakarta Police are still stored manually, in register books and Excel files. The data are only stored and are not used to produce any information, although they could be processed and analyzed to produce valuable information for crime prevention. Data mining is a proper technique for extracting important information from a data set.
Crime data owned by the police can be processed using data mining to become crime patterns that represent relationships between crimes. Such research was successfully carried out by Atmaja [2]; the result was crime patterns presented in graph form. The weakness of that study is that the generated crime patterns are not clearly grouped by crime level.
This study refines that research by grouping crime patterns into three categories: high, medium and low crime level.
Clustering is a data mining technique that aims to group data based on information found in the data [3]. The grouping is based on the similarity between data, so the data in the same cluster are homogeneous. Clustering is therefore an appropriate method for grouping crime patterns into high, medium and low crime levels.
Research on implementing clustering methods has been done, for example by Singh et al. [4]. They implemented the K-means clustering algorithm using three different distance measures, namely Euclidean, Manhattan and Chebyshev. Their result was that K-means with Euclidean distance produced better groups than the other distance measures. It can therefore be concluded that Euclidean distance is the best pairing for the K-means algorithm.
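For two points described by two variables, the three distance measures compared in [4] can be sketched as follows. The function names and the sample points are illustrative assumptions and are not taken from [4].

```python
import math

def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))

# Hypothetical points, for illustration only.
p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q), manhattan(p, q), chebyshev(p, q))  # 5.0 7.0 4.0
```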
Research on the use of Euclidean distance in the K-means algorithm was successfully carried out by Atmaja [5]. The aim of that study was to cluster crime data into three categories, namely high, medium and low crime level. Although the objective was achieved, K-means is considered a less effective algorithm because it is sensitive to noise and outliers, as its cluster centers are computed as averages [6].
This study improves on the previous study by replacing the K-means algorithm with the K-medoids algorithm. K-medoids is a clustering algorithm that is robust to outliers and other extreme values [6]. It works by choosing the center point of each cluster from the existing data, without performing an average calculation as in K-means. The steps of the K-medoids algorithm are given in [6].
The result of this study is crime patterns that have been divided into three groups, namely high, medium and low crime level. It is expected that the police can use this information to improve crime prevention efforts in society.
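As a rough illustration of the K-medoids idea, the sketch below follows the steps walked through later in this paper: choose random initial medoids, compute the total cost, swap in a random non-medoid, and stop once a swap no longer lowers the cost. It is a minimal sketch and not the listing from [6]. The function names, the choice of which medoid to swap, and stopping on an equal cost are all assumptions.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of each point's Euclidean distance to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def k_medoids(points, k, seed=0):
    """Random-swap K-medoids; returns (medoids, total cost)."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)            # initial medoids chosen at random
    cost = total_cost(points, medoids)
    while True:
        candidates = [p for p in points if p not in medoids]
        if not candidates:
            break
        trial = list(medoids)
        # Assumption: the medoid to replace is picked at random.
        trial[rng.randrange(k)] = rng.choice(candidates)
        trial_cost = total_cost(points, trial)
        if trial_cost >= cost:                 # no improvement: stop iterating
            break
        medoids, cost = trial, trial_cost      # keep the better medoid set
    return medoids, cost
```

Because the loop stops at the first swap that fails to improve the cost, the result depends on the random choices, so different seeds can give different groupings.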

Research Methodology
The research methodology consists of the steps taken to implement the K-medoids algorithm to cluster crime patterns from Yogyakarta Police data, as presented in Figure 2. The research began with a literature study of relevant theories related to the problem. The next step was collecting the data; in this case the processed data were crime data from the Yogyakarta Police. The collected crime data were then processed using association techniques in data mining to produce association rules that describe crime patterns. The generated rules were used as input to the K-medoids algorithm to produce crime patterns grouped into low, medium and high crime levels. The next step was analyzing the results to find out whether the objective was achieved. Finally, conclusions were drawn from the research, and suggestions were given to address the remaining weaknesses in future research.

[Figure 2 flowchart steps: Literature Study, Data Collecting, Association Rules Generation, Clustering with K-medoids, Result Analysis, Concluding result and giving advice]

Results and Discussions

Crime Patterns
There are 18 sample crime patterns, produced by the association technique and each accompanied by support and confidence values. The data will be grouped using the K-medoids algorithm based on the variables support and confidence. These data are presented in Table 1.
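For reference, the support of a rule A -> B is conventionally the fraction of records that contain both A and B, and its confidence is that count divided by the number of records that contain A. The sketch below uses made-up records: the item names and the function name are hypothetical and do not come from the Yogyakarta Police data.

```python
def support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_a = sum(1 for r in records if antecedent <= set(r))
    n_ab = sum(1 for r in records if (antecedent | consequent) <= set(r))
    support = n_ab / len(records)
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence

# Hypothetical records, for illustration only.
records = [
    {"theft", "night"},
    {"theft", "night"},
    {"theft", "day"},
    {"fraud", "day"},
]
support, confidence = support_confidence(records, {"theft"}, {"night"})
print(support, confidence)
```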

Determining Initial Medoids
In the first stage, three medoids were randomly selected from the data sample in Table 1, as shown in Table 2.

Calculating Euclidean Distance Iteration 1
The next step is to calculate the Euclidean distance from each data point to the three selected medoids. Euclidean distance is calculated with the following formula [6]:
$$d(x, c) = \sqrt{(x_1 - c_1)^2 + (x_2 - c_2)^2}$$
Here, $d(x, c)$ represents the distance between data point $x$ and medoid $c$, $x_1$ denotes the support value of the data point, $c_1$ is the medoid's support value, $x_2$ denotes the confidence value of the data point and $c_2$ is the medoid's confidence value. Table 3 presents the results of the Euclidean distance calculation for each data point, along with the medoid that has the shortest distance to it.
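The distance calculation and the choice of the closest medoid can be sketched as below. The sample values are hypothetical and are not taken from Tables 1 to 3; the function names are illustrative.

```python
import math

def euclidean(x, c):
    """Distance between a rule x = (support, confidence) and a medoid c."""
    return math.sqrt((x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2)

def nearest_medoid(x, medoids):
    """Return (index, distance) of the medoid closest to x."""
    distances = [euclidean(x, c) for c in medoids]
    best = min(range(len(medoids)), key=distances.__getitem__)
    return best, distances[best]

# Hypothetical medoids C1, C2, C3 and one hypothetical rule.
medoids = [(0.9, 0.9), (0.5, 0.5), (0.1, 0.1)]
index, distance = nearest_medoid((0.2, 0.1), medoids)
print(index, distance)  # index 2 corresponds to C3
```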

Calculating Total Cost Iteration 1
Calculating the total cost is the final step of iteration 1. Summing the shortest distances in Table 3 gives a total cost of 2.734.
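The total cost is the sum, over all data points, of the distance to the nearest medoid. A minimal sketch with hypothetical points (not the values from Table 3):

```python
import math

def total_cost(points, medoids):
    """Sum of each point's shortest Euclidean distance to any medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

# Hypothetical points and medoids, chosen so the arithmetic is easy to check.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
medoids = [(0.0, 0.0), (6.0, 8.0)]
print(total_cost(points, medoids))  # 0 + 5 + 0 = 5.0
```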

Determining Random Medoids Iteration 2
The process continues to iteration 2 by selecting a new random medoid from the data to temporarily replace medoid C3. The new medoid must not be the same as any medoid that has already been selected. Table 4 shows the three medoids for iteration 2.
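The replacement step can be sketched as follows: the candidate is drawn only from points that are not currently medoids, which enforces the rule that a new medoid may not repeat an existing one. The data values and the function name are hypothetical.

```python
import random

def propose_swap(points, medoids, replace_index, rng):
    """Return a copy of medoids with one entry replaced by a random non-medoid point."""
    candidates = [p for p in points if p not in medoids]  # excludes every current medoid
    trial = list(medoids)
    trial[replace_index] = rng.choice(candidates)
    return trial

# Hypothetical (support, confidence) values, for illustration only.
points = [(0.9, 0.8), (0.5, 0.6), (0.1, 0.2), (0.3, 0.4), (0.7, 0.7)]
medoids = [points[0], points[1], points[2]]               # C1, C2, C3
trial = propose_swap(points, medoids, replace_index=2, rng=random.Random(42))
print(trial)
```

The original list is left unchanged, so the swap stays temporary until its total cost has been compared with the previous one.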

Calculating Euclidean Distance Iteration 2
After a new medoid has been determined, the next step is to recalculate the Euclidean distance for each data point based on the three medoids in Table 4. The results are shown in Table 5.

Calculating Total Cost Iteration 3
Calculating the total cost is the final step of iteration 3. Summing the shortest distances in Table 7 gives a total cost of 2.510. To decide whether to continue, the total cost of iteration 3 is compared with that of iteration 2: 2.510 > 2.416. Because the total cost of iteration 3 is greater than that of iteration 2, the iteration stops.
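The stopping rule can be written as a one-line comparison, shown here with the total costs reported for iterations 1 to 3. Treating an equal cost as a reason to stop is an assumption; the text only covers the case where the cost is greater.

```python
def should_stop(previous_cost, current_cost):
    """Stop when the latest swap did not lower the total cost."""
    return current_cost >= previous_cost

print(should_stop(2.734, 2.416))  # False: iteration 2 improved on iteration 1
print(should_stop(2.416, 2.510))  # True: iteration 3 is worse than iteration 2
```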

Results
Each medoid represents one crime-level group based on support and confidence. C1 represents high crime level, C2 represents medium crime level and C3 represents low crime level. The results of crime patterns grouping are shown in Tables 8, 9 and 10.
Of the 18 crime patterns, 4 rules were classified as high level crime, 6 rules as medium level crime and 8 rules as low level crime.
Suggestions that can be given based on the results of this study are:
a) Several distance methods need to be compared for the K-medoids algorithm, so that the most appropriate distance calculation method can be identified.
b) A weighting mechanism needs to be applied to each variable, because not all variables have the same importance and priority.

Figure 2. Research Methodology

Table 1. Crime patterns

Table 3. Rules with Euclidean distance

Table 5. Rules with Euclidean distance iteration 2

Table 7. Rules with Euclidean distance iteration 3