Comparing clustering models in bank customers: Based on Fuzzy relational clustering approach

Article history: Received December 5, 2015; received in revised format February 16, 2016; accepted August 15, 2016; available online August 16, 2016

Clustering is a useful technique for exploring data structures and has been employed in many fields. It organizes a set of objects into groups called clusters, such that the objects within one cluster are highly similar to each other and dissimilar to the objects in other clusters. The K-mean, C-mean, Fuzzy C-mean and Kernel K-mean algorithms are the most popular clustering algorithms because they are easy to implement and fast, but in some cases these algorithms cannot be used. In this paper, a hybrid model for customer clustering is therefore presented and applied to five banks of Fars Province, Shiraz, Iran. The fuzzy relation among customers is defined using their features, which are described by linguistic and quantitative variables. The customers of the banks are then grouped according to the K-mean, C-mean, Fuzzy C-mean and Kernel K-mean algorithms and the proposed Fuzzy Relation Clustering (FRC) algorithm. The aim of this paper is to show how to choose the best clustering algorithm based on density-based clustering and to present a new clustering algorithm for both crisp and fuzzy variables. Finally, we apply the proposed approach to five customer segmentation datasets from banks. The results show the accuracy and high performance of FRC compared with the other clustering methods.

© 2017 Growing Science Ltd. All rights reserved.


Introduction
Clustering has been a widely studied problem in the machine learning literature (Filippone et al., 2008; Jain, 2010). Clustering algorithms have been addressed in many contexts and disciplines such as data mining, document retrieval, image segmentation and pattern recognition. The prevalent clustering algorithms have been categorized in different ways depending on different criteria. As with many algorithms, there is a trade-off between speed and the quality of the results. The existing clustering algorithms can be broadly classified into two categories, hierarchical clustering and partitioned clustering (Jain, 2010; Jiang et al., 2010; Feng et al., 2010). Clustering can also be performed in two different modes, hard and fuzzy. In hard clustering, the clusters are disjoint and non-overlapping, so any pattern belongs to one and only one class. In fuzzy clustering, a pattern may belong to all the classes with a certain fuzzy membership grade (Jain, 2010; Pedrycz & Rai, 2008; Peters et al., 2013). Hierarchical clustering algorithms iteratively build clusters by joining (agglomerative) or dividing (divisive) the clusters from the previous iteration (Kannappan et al., 2011; Chehreghani et al., 2009). Given n objects, the agglomerative approach starts from the finest clustering, with n single-element clusters, and finishes at the coarsest clustering, with one cluster consisting of all n objects. The divisive approach works the other way, from the coarsest partition to the finest partition.
The resulting tree has nodes created at each cutoff point that can be used to generate different clusterings.
There is a wide variety of agglomerative algorithms in the literature, such as single-link, complete-link and average-link (Höppner, 1999; Akman, 2015). The single-link, or nearest neighbor, algorithm has a strong tendency to form chains in a geometrical sense rather than compact balls, an effect that is undesirable in some applications; groups that are not well separated cannot be detected. The complete-link algorithm tends to build small clusters. The average-link algorithm is a compromise between the two extreme cases of single-linkage and complete-linkage (Eberle et al., 2012; Lee et al., 2005; Clir & Yuan, 1995). In contrast to the agglomerative algorithms, divisive algorithms start with the coarsest clustering, i.e., the clustering with exactly one cluster. This cluster is split into two clusters so as to optimize a given criterion (Ravi & Zimmermann, 2000; Garrido, 2011). The popular clustering algorithms have been widely used to solve problems in many areas, but they have weaknesses. For instance, K-mean is very sensitive to initialization: the better the initial centers, the better the results (Khan & Ahmad, 2004; Núñez et al., 2014). Moreover, these algorithms cannot be used everywhere, and they cannot handle crisp, fuzzy and linguistic variables together. In this paper, we therefore propose a new algorithm based on fuzzy variables and fuzzy relations, called the Fuzzy Relation Clustering (FRC) algorithm.
The remainder of the paper is organized as follows. Section 2 reviews clustering algorithms. Section 3 presents fuzzy variables and the Fuzzy Relation Clustering (FRC) algorithm. Section 4 briefly introduces the three internal validity indices and the external validity indices. Section 5 describes the dataset. Section 6 presents the output of the four clustering algorithms. Finally, concluding remarks are given in Section 7.

k-mean
The K-mean algorithm is an effective and simple algorithm for finding clusters in data sets (Lee et al., 2005). The K-mean algorithm proceeds as follows:


First stage: ask the user how many clusters k the data set should be partitioned into.

Second stage: randomly assign k records as the initial cluster centers.

Third stage: for each record, find the nearest cluster center. Each cluster center thus "owns" a subset of the records, which gives a partition of the data set into k clusters C1, C2, …, Ck.

Fourth stage: for each of the k clusters, find the cluster centroid and update the location of each cluster center to the new value of the centroid.

Fifth stage: repeat stages 3 to 5 until convergence or termination.
The nearest criterion in stage 3 is usually the Euclidean distance, although other criteria may be more appropriate in some applications. Suppose that we have n data points (a1,b1,c1), (a2,b2,c2), …, (an,bn,cn). The centroid of these points is their center of gravity, located at $(\sum a_i / n, \sum b_i / n, \sum c_i / n)$. For example, the points (1,1,1), (1,2,1), (1,3,1) and (2,1,1) have centroid (1.25, 1.75, 1.00).

The algorithm terminates when the centers change very little. In other words, the algorithm terminates when, for all clusters C1, C2, …, Ck, all the records owned by each cluster center remain in that cluster. Alternatively, the algorithm may terminate when some convergence criterion is met, such as no significant reduction in the sum of squared errors:

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$$

where $p \in C_i$ denotes each data point in cluster i and $m_i$ is the center of cluster i. The k-means algorithm does not guarantee that the global minimum SSE will be found; instead, it often settles in a local minimum. To increase the chance of reaching the global minimum, the analyst should rerun the algorithm with different initial cluster centers. One suggestion is to place the first cluster center at random and then to place each subsequent center as far as possible from the previous centers. A potential problem in employing the k-mean algorithm is deciding how many clusters should be found, unless the analyst has prior knowledge about the number of underlying clusters. In this case, an outer loop can be added to the algorithm. The loop runs over the candidate values of k, compares the clustering solution for each value of k, and then selects the value of k that has the minimum SSE.
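The stages above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in the study: the function names (`centroid`, `kmeans`, `best_of_restarts`), the fixed seeds and the `max_iter` cap are assumptions made for the example.

```python
import random

def centroid(points):
    """Center of gravity: the coordinate-wise mean of the points."""
    return [sum(col) / len(points) for col in zip(*points)]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Stage 2: pick k records at random as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Stage 3: assign each record to its nearest center.
        new_labels = [min(range(k), key=lambda i: sq_dist(p, centers[i]))
                      for p in points]
        # Stage 5: stop when no record changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Stage 4: move each center to the centroid of its records.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = centroid(members)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse

def best_of_restarts(points, k, restarts=10):
    """Rerun with different initial centers and keep the lowest-SSE run."""
    return min((kmeans(points, k, seed=s) for s in range(restarts)),
               key=lambda run: run[2])
```

Calling `best_of_restarts(points, k)` for several candidate values of k gives the outer loop described in the text; the SSE of each run is the third element of the returned tuple.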

C-mean
The C-mean algorithm is a hard clustering approach, meaning that each data point is allocated to exactly one cluster (Filippone et al., 2008). A family of sets $A_i$, $i = 1, 2, \ldots, c$, is defined on the universal set X.

In the final stage, successive partition matrices are compared: if the greatest difference (6) between matching elements of the two matrices exceeds the prescribed tolerance, the algorithm repeats from stage 2.

Fuzzy C-mean
This algorithm, offered by Schölkopf et al. (1998), is an effective algorithm for fuzzy clustering of data and is in fact an extension of c-mean clustering. To develop this algorithm, we define a family of fuzzy sets as a fuzzy partition on the universal set. We now present the fuzzy c-mean algorithm for clustering n data points into c clusters. For this purpose, we define an objective function $J_m$ as follows:

$$J_m = \sum_{k=1}^{n} \sum_{i=1}^{c} (\mu_{ik})^{m'} (d_{ik})^2$$

where $d_{ik}$ is the Euclidean distance between the center of cluster i and data point k, and $\mu_{ik}$ is the membership degree of data point k in cluster i. The smallest value of $J_m$ corresponds to the best clustering. Here a new parameter, $m'$, called the weighting parameter, is introduced; its range is $m' \in [1, \infty)$. This parameter controls the degree of fuzziness in the clustering process. As before, $v_i$ denotes the center coordinates of cluster i, so $v_i = (v_{i1}, v_{i2}, \ldots, v_{im})$, where m is the number of coordinates of $v_i$, that is, the number of features. The center coordinates of each cluster are obtained from the relation

$$v_{ij} = \frac{\sum_{k=1}^{n} (\mu_{ik})^{m'} x_{kj}}{\sum_{k=1}^{n} (\mu_{ik})^{m'}}$$

where $x_{kj}$ is the j-th coordinate of data point k. Thus, in this algorithm, the optimum fuzzy partition is the one that minimizes $J_m$.

The Fuzzy C-mean algorithm is as follows:
- Fix the number of clusters c and select a value for $m'$. Initialize the partition matrix $U^{(0)}$. Each step of the algorithm is labeled r, with r = 0, 1, 2, ...
- Calculate the c cluster centers for step r.
- Update the partition matrix for step r.
- If the change in the partition matrix is within the prescribed tolerance, finish the calculation; otherwise set r = r + 1 and return to stage 2.
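A minimal sketch of the fuzzy c-mean iteration follows. It assumes the commonly used membership update, in which the membership of point k in cluster i is the reciprocal of the sum over clusters j of $(d_{ik}/d_{jk})^{2/(m'-1)}$; the function name `fuzzy_c_means`, the random initial partition matrix, the default `m_prime=2.0` and the tolerance `eps` are illustrative choices, not values taken from the study.

```python
import random

def fuzzy_c_means(points, c, m_prime=2.0, eps=1e-5, max_iter=100, seed=0):
    if m_prime <= 1.0:
        raise ValueError("this update rule needs m_prime > 1")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Initial partition matrix U[i][k]; each column is normalised to sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= col
    centers = []
    for _ in range(max_iter):
        # Cluster centers: means of the points weighted by membership ** m_prime.
        centers = []
        for i in range(c):
            w = [U[i][k] ** m_prime for k in range(n)]
            total = sum(w)
            centers.append([sum(wk * p[j] for wk, p in zip(w, points)) / total
                            for j in range(dim)])
        # Membership update from the distances to the new centers.
        new_U = [[0.0] * n for _ in range(c)]
        for k, p in enumerate(points):
            d = [sum((x - v) ** 2 for x, v in zip(p, centers[i])) ** 0.5
                 for i in range(c)]
            zeros = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if zeros:  # the point coincides with one or more centers
                    new_U[i][k] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    new_U[i][k] = 1.0 / sum(
                        (d[i] / d[j]) ** (2.0 / (m_prime - 1.0)) for j in range(c))
        # Stop when the largest change between matching elements is small.
        change = max(abs(new_U[i][k] - U[i][k])
                     for i in range(c) for k in range(n))
        U = new_U
        if change <= eps:
            break
    return centers, U
```

Each column of the returned matrix `U` sums to one, so a hard assignment can be read off by taking the cluster with the largest membership for each point.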

Kernel k -mean
Given the data set X, we map the data into some feature space F by means of a nonlinear map $\Phi$, and we consider k centers in feature space, $v_i^{\Phi} \in F$ with $i = 1, \ldots, k$. We call the set $V^{\Phi} = (v_1^{\Phi}, \ldots, v_k^{\Phi})$ the Feature Space Codebook, since in our representation the centers in the feature space play the same role as the code vectors in the input space. In analogy with the code vectors in the input space, we define for each center $v_i^{\Phi}$ its Voronoi region and Voronoi set in feature space. The Voronoi region of $v_i^{\Phi}$ in feature space is the set of all vectors in F for which $v_i^{\Phi}$ is the closest vector (Filippone et al., 2008):

$$R_i^{\Phi} = \{ x^{\Phi} \in F \mid i = \arg\min_j \| x^{\Phi} - v_j^{\Phi} \| \}$$

The Voronoi set of $v_i^{\Phi}$ in feature space is the set of all data points whose images are closest to $v_i^{\Phi}$:

$$\pi_i^{\Phi} = \{ x \in X \mid i = \arg\min_j \| \Phi(x) - v_j^{\Phi} \| \}$$

The set of the Voronoi regions in feature space defines a Voronoi tessellation of the feature space.
The Kernel K-means algorithm has the following steps:
1. Project the data set X into a feature space F by means of a nonlinear mapping $\Phi$.
2. Initialize the codebook $V^{\Phi} = (v_1^{\Phi}, \ldots, v_k^{\Phi})$ with $v_i^{\Phi} \in F$.
3. Compute the Voronoi set $\pi_i^{\Phi}$ for each center $v_i^{\Phi}$.
4. Update the code vectors $v_i^{\Phi}$ in F.
5. Go to step 3 until no $v_i^{\Phi}$ changes.
6. Return the feature space codebook.
This algorithm minimizes the quantization error in feature space. Since $\Phi$ is not known explicitly, it is not possible to compute Eq. (10) directly. Nevertheless, it is always possible to compute distances between patterns and code vectors by using the kernel trick, allowing the Voronoi sets in feature space $\pi_i^{\Phi}$ to be obtained. Indeed, each centroid in feature space can be written as a combination of data vectors in feature space with coefficients $\gamma_{jk}$, where $\gamma_{jk}$ is one if $x_k \in \pi_j^{\Phi}$ and zero otherwise. The squared distance $\| \Phi(x_k) - v_j^{\Phi} \|^2$ can then be expanded into scalar products and evaluated through the kernel alone. This makes it possible to find the closest feature space code vector for each pattern and to update the coefficients $\gamma_{jk}$. These two operations are repeated until no $\gamma_{jk}$ changes, which yields a Voronoi tessellation of the feature space.
An on-line version of the kernel K-means algorithm can be found in Clir and Yuan (1995). A further version of K-means in feature space has been proposed by Garrido (2011). In his formulation, the number of clusters is denoted by c, and a fuzzy membership matrix U is introduced. Each element $u_{ih}$ denotes the fuzzy membership of the point $x_h$ to the Voronoi set $\pi_i^{\Phi}$. This algorithm tries to minimize a functional with respect to U. The minimization technique used by Garrido (2011) is deterministic annealing, which is a stochastic algorithm for optimization. A parameter controls the fuzziness of the memberships during the optimization and can be thought of as proportional to the temperature of a physical system. This parameter is gradually lowered during the annealing, and at the end of the procedure the memberships have become crisp; therefore, a tessellation of the feature space is found. This linear partitioning in F, mapped back to the input space, forms a nonlinear partitioning of the input space.
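The hard kernel K-means loop can be sketched without ever forming the feature-space vectors, assuming each code vector is the mean of the mapped points in its Voronoi set. Under that assumption the squared distance from a point to a code vector expands into three sums of kernel values, as noted in the code comments. The names `rbf_kernel` and `kernel_kmeans`, the `init_labels` argument and the default `gamma` are illustrative.

```python
import math
import random

def rbf_kernel(x, y, gamma=0.01):
    """Gaussian kernel; gamma is an illustrative default, not a tuned value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_kmeans(points, k, kernel=rbf_kernel, init_labels=None,
                  max_iter=100, seed=0):
    n = len(points)
    K = [[kernel(p, q) for q in points] for p in points]  # Gram matrix
    rng = random.Random(seed)
    labels = (list(init_labels) if init_labels is not None
              else [rng.randrange(k) for _ in range(n)])
    for _ in range(max_iter):
        clusters = [[h for h in range(n) if labels[h] == j] for j in range(k)]
        # Term that depends only on the cluster: mean of K over member pairs.
        within = [sum(K[r][s] for r in members for s in members) / len(members) ** 2
                  if members else 0.0
                  for members in clusters]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for j, members in enumerate(clusters):
                if not members:
                    continue  # an empty cluster cannot attract points
                # ||phi(x_i) - v_j||^2 = K_ii - 2 * mean_h K_ih + mean_rs K_rs
                d = (K[i][i]
                     - 2.0 * sum(K[i][h] for h in members) / len(members)
                     + within[j])
                if d < best_d:
                    best, best_d = j, d
            new_labels.append(best)
        if new_labels == labels:  # no Voronoi set changed
            break
        labels = new_labels
    return labels
```

With a linear kernel (the plain dot product) the procedure reduces to ordinary k-means on the input space, which is a convenient sanity check before switching to a nonlinear kernel.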

Fuzzy Relation Clustering (FRC)
This section describes the details of the computational model used for the FRC algorithm. We first give an overview of fuzzy variables and then describe the FRC algorithm. The algorithm itself is independent of the bank customer clustering application.

Fuzzy variable
Many expressions in natural language describe numerical magnitudes, such as good, hot, short and young, and should be placed on a numerical scale to be understood properly (Liang et al., 2005). A traditional approach fixes a crisp set of values A: if $x \in A$, then x is high, and if $x \notin A$, then x is not high. The problem with this approach is that it is very sensitive to inaccuracy or variation in the numerical data. To take non-numerical information into account, a linguistic representation is necessary. Linguistic terms are variables of a higher order than fuzzy variables, because they take fuzzy variables as their values. Their values are words or sentences in a natural or artificial language. For example, the temperature of a liquid reservoir is a fuzzy variable if it takes values such as cool, cold, hot and warm. Age can be a fuzzy variable if its values are old, young, etc. Fuzzy variables thus provide a suitable tool for the approximate description of complicated phenomena.
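As an illustration of how a linguistic value can be given a numerical meaning, the sketch below evaluates trapezoidal membership functions of the kind shown in Fig. 1. The term names and breakpoints in `AGE_TERMS` are hypothetical values chosen for the example; they are not taken from the bank datasets.

```python
def trapezoid(x, a, b, c, d):
    """Membership of x in the trapezoidal fuzzy number (a, b, c, d), a <= b <= c <= d."""
    if b <= x <= c:
        return 1.0                   # flat top of the trapezoid
    if a < x < b:
        return (x - a) / (b - a)     # rising edge
    if c < x < d:
        return (d - x) / (d - c)     # falling edge
    return 0.0

# Hypothetical linguistic values of the fuzzy variable "age".
AGE_TERMS = {
    "young": (0, 0, 25, 40),
    "middle-aged": (25, 40, 50, 65),
    "old": (50, 65, 120, 120),
}

def fuzzify(x, terms):
    """Degree to which x belongs to each linguistic value."""
    return {name: trapezoid(x, *abcd) for name, abcd in terms.items()}
```

Under these assumed breakpoints, `fuzzify(30, AGE_TERMS)` assigns an age of 30 partly to "young" and partly to "middle-aged", which is exactly the graded behaviour that a crisp threshold cannot express.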

Fuzzy relation
The proposed model for market segmentation is based on fuzzy relation.The key concepts in fuzzy relation are reviewed as follows:

Fuzzy equivalence relation
A fuzzy relation R on $X \times X$ is called a fuzzy equivalence relation if the following three conditions are met:
(1) Reflexive, i.e., $\mu_R(x, x) = 1$ for all $x \in X$;
(2) Symmetric, i.e., $\mu_R(x, y) = \mu_R(y, x)$ for all $x, y \in X$;
(3) Transitive, i.e., $\mu_R(x, z) \geq \max_{y \in X} \min\{\mu_R(x, y), \mu_R(y, z)\}$ for all $x, z \in X$.
The transitive closure, $R^T$, of a fuzzy relation R is defined as the relation that is transitive, contains R and has the smallest possible membership grades.
Theorem 1 (Zimmermann, 1996). Let R be a fuzzy reflexive and symmetric relation on a finite universal set X with $|X| = n$; then the max-min transitive closure of R is the relation $R^{(n-1)}$.
According to Theorem 1, we can get the following algorithm to find the transitive closure.

Algorithm
Step 1: Initialize And stop.Otherwise, go to step 3. Step3: and stop Otherwise, go to step 2.
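One common way to compute a max-min transitive closure is to take the union of the relation with its own max-min composition until nothing changes. The sketch below uses that scheme as an assumption; it is not necessarily step-for-step identical to the algorithm above, and the function names are illustrative.

```python
def max_min_compose(R, S):
    """Max-min composition of two fuzzy relations given as square matrices."""
    n = len(R)
    return [[max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(R):
    """Max-min transitive closure: repeat R <- R union (R o R) until stable."""
    current = [row[:] for row in R]
    while True:
        composed = max_min_compose(current, current)
        # Union of fuzzy relations is the element-wise maximum.
        merged = [[max(a, b) for a, b in zip(row_c, row_m)]
                  for row_c, row_m in zip(current, composed)]
        if merged == current:
            return current
        current = merged
```

For the reflexive and symmetric relation `[[1.0, 0.8, 0.0], [0.8, 1.0, 0.4], [0.0, 0.4, 1.0]]`, the closure raises the grade between the first and third elements from 0.0 to 0.4, because they are linked through the second element.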

Fuzzy relation segmentation principle
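The $\alpha$-cut set of a fuzzy relation R, denoted $R_\alpha$, is defined as:

$$R_\alpha = \{ (x, y) \mid \mu_R(x, y) \geq \alpha \}, \quad 0 \leq \alpha \leq 1$$

An equivalence relation on a finite number of elements can also be represented by a tree. In this tree, each level represents an $\alpha$-cut of the equivalence relation (Zimmermann, 1996).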
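The segmentation principle can be illustrated with a short sketch: given a fuzzy equivalence relation as a matrix, each value of $\alpha$ yields a crisp partition, and lowering $\alpha$ merges clusters, which is what the levels of the tree represent. The function name `alpha_cut_clusters` and the example matrix are assumptions made for the illustration.

```python
def alpha_cut_clusters(R, alpha):
    """Equivalence classes of the crisp relation R_alpha.

    Assumes R is a fuzzy equivalence relation (for example, a max-min
    transitive closure), so row i lists exactly the members of i's class.
    """
    n = len(R)
    clusters, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        members = [j for j in range(n) if R[i][j] >= alpha]
        clusters.append(members)
        assigned.update(members)
    return clusters

# Example fuzzy equivalence relation on three objects.
R_EQ = [[1.0, 0.8, 0.4],
        [0.8, 1.0, 0.4],
        [0.4, 0.4, 1.0]]
```

With this matrix, `alpha_cut_clusters(R_EQ, 0.8)` groups the first two objects together and leaves the third alone, while an $\alpha$ of 0.4 places all three objects in a single cluster.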

Customer segmentation
In this section, we explain the different types of market features and formulate the fuzzy equivalence relation among markets. We then place the markets in groups according to the similarity of their features.

Customer Features
These features are expected to shape the market's opinion of, and response to, the received product or service. They are categorized into three variable sets: binary, quantitative and linguistic variables.

The binary variables take only the values 0 or 1 and are collected in the vector P, where m is the number of markets and $n_1$ is the number of binary variables. The relation among markets according to a binary feature is defined as a classical relation taking the value 0 or 1. If there is more than one such feature, a fuzzy relation with values in [0, 1] is defined.
The quantitative variables are collected in the vector Q, where $n_2$ is the number of quantitative variables. The relation among markets according to a quantitative feature depends on the distance between their values. Decreasing this distance makes the customers' relation stronger, and vice versa. The linguistic variables take words or sentences in a natural or artificial language as values, which are represented by fuzzy numbers. They are collected in the vector V, whose entries are the values of the j-th linguistic variable for each market. The relation among markets according to a linguistic feature depends on the distance between their fuzzy number values. We utilize Chen and Hsieh's (Rose, 1998) modified geometrical distance algorithm, which is based on the geometrical operation of trapezoidal fuzzy numbers. This algorithm gives the distance between two trapezoidal fuzzy numbers, each represented by four parameters.

Customer Relations
We can obtain three fuzzy relation matrices, $R_p$, $R_q$ and $R_v$, from the vectors P, Q and V, respectively.
Each entry of these matrices gives the degree of relation between markets i and j. With these three matrices we can construct the final fuzzy relation matrix R as a weighted combination, where $W_p$ is the weight of $R_p$, $W_q$ is the weight of $R_q$ and $W_v$ is the weight of $R_v$.
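The construction of the relation matrices can be illustrated under explicit assumptions. In the sketch below, the binary relation is taken to be the share of binary features on which two markets agree, the quantitative relation is taken to be one minus the Euclidean distance scaled by the largest distance, and the final relation is taken to be a weighted sum with weights adding up to one. These forms and the function names are assumptions chosen to match the verbal description; the exact formulas of the study may differ.

```python
def binary_relation(P):
    """Assumed form: share of binary features on which markets i and j agree."""
    m, n1 = len(P), len(P[0])
    return [[sum(a == b for a, b in zip(P[i], P[j])) / n1 for j in range(m)]
            for i in range(m)]

def quantitative_relation(Q):
    """Assumed form: 1 - d_ij / d_max, so a smaller distance means a stronger relation."""
    m = len(Q)
    d = [[sum((a - b) ** 2 for a, b in zip(Q[i], Q[j])) ** 0.5 for j in range(m)]
         for i in range(m)]
    d_max = max(max(row) for row in d) or 1.0  # avoid dividing by zero
    return [[1.0 - d[i][j] / d_max for j in range(m)] for i in range(m)]

def combine(relations, weights):
    """Assumed form: weighted sum of relation matrices, weights summing to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    m = len(relations[0])
    return [[sum(w * R[i][j] for w, R in zip(weights, relations))
             for j in range(m)] for i in range(m)]
```

A relation matrix for the linguistic variables could be built in the same way as `quantitative_relation`, with the Euclidean distance replaced by a distance between trapezoidal fuzzy numbers, and passed to `combine` together with its weight.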

Market segmentation
The fuzzy relation matrices $R_p$, $R_q$ and $R_v$ are reflexive and symmetric, because the relation of a market with itself takes the maximum value 1 and the relation between markets i and j equals the relation between markets j and i. If these relations are not transitive, we can obtain their transitive closure according to Section 3.2.
We can then form the relation R and apply the fuzzy relation clustering principle to segment the markets according to their similarity (see Section 3.2).

Measures for evaluation of the clustering quality
Validating clustering algorithms by assessing the quality of the resulting clusters is a way to choose among them. Generally, there are three approaches to validating clustering algorithms. The first approach is based on internal criteria, the second on external criteria and the third on relative criteria.
The following briefly describes each of these three approaches.
- Internal criteria: These criteria evaluate how well the clusters fit the real structure of the data. The quality of the clustering is derived from the data and the clustering itself.
- External criteria: Validation with these criteria is based on comparing the obtained clustering with a clustering that is known to be correct. Such an evaluation is important for identifying the performance of clustering algorithms on a database.
- Relative criteria: These criteria evaluate the structure produced by a base algorithm by comparing it with the results obtained from different inputs to the clustering algorithms.
In this paper, we use the internal criteria and external criteria to choose the best algorithm among K-mean, C-mean, Fuzzy C-mean and Kernel K-mean. For more details regarding internal and external criteria, the reader may refer to Aliguliyev (2009). Various cluster validity indices are available in the literature (Zhao & Karypis, 2004; Wu et al., 2009).
For the internal and external criteria we used five indices, together with a resultant rank. Below, we briefly introduce them.
- Purity: The purity gives the ratio of the dominant class size in the cluster to the cluster size itself. A large purity value implies that the cluster is a "pure" subset of the dominant class.
- Mirkin: This metric is 0 for identical clusterings and positive otherwise.
- F-measure: The higher the F-measure, the better the clustering solution. This measure has a significant advantage over the purity and the entropy, because it measures both the homogeneity and the completeness of a clustering solution.
- V-measure: The V-measure is an entropy-based measure that explicitly measures how successfully the criteria of homogeneity and completeness have been satisfied.
- Entropy: Since the entropy considers the distribution of semantic classes in a cluster, it is a more comprehensive measure than the purity. Unlike the purity measure, an entropy value of 0 means that the cluster is comprised entirely of one class, while an entropy value near 1 implies that the cluster contains a uniform mixture of all classes. The global clustering entropy of the entire collection is defined to be the sum of the individual cluster entropies weighted according to the cluster size.
- Resultant rank: The resultant rank is a statistical method that ranks the clustering algorithms based on the above indices.
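Two of these indices, purity and entropy, can be computed directly from the cluster labels and the known class labels. The sketch below assumes that the overall purity is the size-weighted average of the per-cluster purities and that each cluster entropy is normalised by the logarithm of the number of classes, so that it lies between 0 and 1 as described above. The function names are illustrative.

```python
import math
from collections import Counter

def class_counts(clusters, classes, cluster_id):
    """Count how many members of each class fall in the given cluster."""
    return Counter(cls for cl, cls in zip(clusters, classes) if cl == cluster_id)

def purity(clusters, classes):
    """Fraction of objects that belong to the dominant class of their cluster."""
    dominant = sum(max(class_counts(clusters, classes, c).values())
                   for c in set(clusters))
    return dominant / len(classes)

def entropy(clusters, classes):
    """Cluster entropies (normalised to [0, 1]) weighted by cluster size."""
    n, q = len(classes), len(set(classes))
    if q < 2:
        return 0.0  # a single class cannot be mixed
    total = 0.0
    for c in set(clusters):
        counts = class_counts(clusters, classes, c)
        size = sum(counts.values())
        e = -sum((v / size) * math.log(v / size) for v in counts.values())
        total += (size / n) * (e / math.log(q))
    return total
```

A clustering that reproduces the classes exactly has purity 1 and entropy 0, whereas clusters that each contain an even mixture of all classes give an entropy of 1.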
In the next section we compare the output of the popular clustering algorithms (K-mean, C-mean, Fuzzy C-mean and Kernel K-mean) and the fuzzy relation clustering algorithm on five customer segmentation datasets from banks of Fars Province, Shiraz, Iran.

Dataset
To compare and evaluate the output of the clustering algorithms, we used customer segmentation datasets from five banks of Fars Province, Shiraz, Iran. These bank datasets serve as a common standard for comparing the clustering algorithms in this study. Table 1 describes the characteristics of the dataset for each bank.
Table 10 shows the accuracy and high performance of FRC compared with the other clustering methods; from this table it can be seen that its rating is very high compared with the other clustering methods.

Conclusions
In this paper, we surveyed five clustering algorithms. The comparison was conducted on standard bank datasets from Fars Province, Shiraz, Iran, with widely varying numbers of clusters. The quality of a clustering result was evaluated using three cluster validation approaches: internal criteria, external criteria and relative criteria.
Regarding the cluster validation approaches, we found that the popular clustering algorithms cannot handle both crisp and fuzzy variables. To address this weak point, we defined a new clustering algorithm called FRC. In FRC, we defined three relation matrices for binary, quantitative and fuzzy attributes. The FRC algorithm clusters objects according to their features using the fuzzy relation clustering principle. It can use different features with crisp or fuzzy values. These features are categorized into three variable sets, consisting of binary, quantitative and linguistic variables.
In the final analysis, the best clustering algorithm was determined by computing the validity measures. Computing these measures for each algorithm on the relevant features, we found that each of the algorithms can produce a suitable clustering, while FRC makes it possible to handle crisp and fuzzy values simultaneously for bank customers.

Fig. 1. Membership function of a trapezoidal fuzzy number