A hybrid grey based K-means and feature selection for bank evaluation

Article history: Received October 15, 2013; Received in revised format March 2, 2014; Accepted May 1, 2014; Available online May 7, 2014
Performance measurement plays an essential role in improving the performance and efficiency of business units. During the past few years, banking systems have developed tremendously, and the primary focus of many managers is to improve the quality of services for market retention. Performance measurement in the banking industry normally involves various qualitative as well as quantitative criteria, which leads to the use of multiple criteria decision making techniques. This paper presents a hybrid of grey relational analysis and K-means to cluster banks and measure the performance of the banking system. The proposed study uses different criteria, clusters banks into various segments and ranks 43 different banks in the city of Semnan, Iran.
© 2014 Growing Science Ltd. All rights reserved.


Introduction
Performance measurement plays an essential role in improving the performance and efficiency of business units. During the past few years, banking systems have developed tremendously, and the primary focus is to improve the quality of services as an objective for market retention. Performance measurement in the banking industry normally involves various qualitative as well as quantitative criteria, which leads to the use of multiple criteria decision making techniques. Data mining is the result of applying sophisticated modeling techniques from the diverse fields of statistics, artificial intelligence, and database management (Yuantao & Siqin, 2008; Han & Kambert, 2001). Data mining has been widely used for determining marketing trends (Kaefer et al., 2005), customer detection (Kim & Nick Street, 2004), fraud detection (Farvaresh & Sepehri, 2010), etc.
Today, the ability to detect profitable customers, build long-term loyalty among them and expand the existing relationships is the key competitive factor for a customer-oriented organization. The prerequisite for such a competitive factor is a very powerful customer relationship management (CRM) system. The precise evaluation of customers' profitability is one of the most important contributors to a successful CRM program. RFM is a technique that scrutinizes three properties of each customer, namely recency, frequency and monetary value, and scores customers based on these properties. Zalaghi and Abbasnejad Varzi (2014) presented a method that obtains the behavioral traits of customers using an extended RFM approach and the information associated with the customers of a firm. It then classifies the customers with the K-means algorithm and finally scores the customers in each cluster in terms of their loyalty. In their method, the customers' records are first clustered, and the RFM model items are then specified by selecting the properties that affect the customers' loyalty rate, based on the multipurpose genetic algorithm. Next, the customers are scored in each cluster based on the effect that they have on the loyalty rate.

K-means clustering
K-means clustering is a popular data mining clustering method, which aims to partition N observations into K clusters such that each observation belongs to the cluster with the nearest mean. A proper K is normally assessed by simultaneously minimizing the within-cluster variation and maximizing the between-cluster variation. K-means clustering is sensitive to outliers, so outliers must be removed before clustering (Ying & Feng, 2008; Cheng & Chen, 2008; Farvaresh & Sepehri, 2010). According to Edwards (2003) and Kantardzic (2011), the K-means method used in this paper has the following steps,
1. Choose an initial partition of K clusters from randomly selected samples and calculate the mean of each cluster,
2. Create a new partition by assigning each sample to the nearest cluster center,
3. Calculate the means of the new clusters and use them as the new centers,
4. Repeat step 2 and step 3 until the algorithm reaches its termination criterion.
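The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Clementine implementation used in the study: the function name `k_means`, the choice of K randomly selected samples as initial centers, the squared Euclidean distance and the stopping rule (assignments no longer change, or `max_iter` is reached) are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(samples, k, max_iter=100, seed=0):
    """Cluster `samples` (a list of tuples) into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    # Step 1 (assumed variant): use K randomly selected samples as initial centers.
    centers = rng.sample(samples, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(s, centers[c]))
            for s in samples
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean_point(members)
    return labels, centers
```

Calling `k_means(samples, 2)` on two well-separated groups of points returns one label per sample and the two final centers; which group receives label 0 depends on the random initialization.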

Grey Relational Analysis
Grey relational analysis as used in this paper has the following steps (Deng, 1989; Hsia et al., 2004; Huang et al., 2008; Razi et al., 2013). Consider $X_0$ as the reference and $N$ alternatives with $k$ criteria as follows,

$$X_0 = (x_0(1), x_0(2), \ldots, x_0(k)), \qquad X_i = (x_i(1), x_i(2), \ldots, x_i(k)), \quad i = 1, \ldots, N.$$

The grey relational coefficients are calculated as follows,

$$\gamma(x_0(j), x_i(j)) = \frac{\Delta_{\min} + \zeta \Delta_{\max}}{\Delta_{0i}(j) + \zeta \Delta_{\max}},$$

where $\Delta_{0i}(j) = |x_0(j) - x_i(j)|$ is the absolute difference between $X_0$ and $X_i$ in the $j$th criterion, $\Delta_{\min} = \min_i \min_j \Delta_{0i}(j)$, $\Delta_{\max} = \max_i \max_j \Delta_{0i}(j)$ and $\zeta \in (0, 1]$ is the distinguishing coefficient. Next, the grey relational degree is calculated as follows,

$$\Gamma(X_0, X_i) = \sum_{j=1}^{k} w_j \, \gamma(x_0(j), x_i(j)),$$

where $w_j$ is the weight of criterion $j$ and $\sum_{j=1}^{k} w_j = 1$. Finally, all relationships must be normalized as follows,

$$x_i^*(j) = \frac{x_i(j) - \min_i x_i(j)}{\max_i x_i(j) - \min_i x_i(j)}.$$

Grey relational analysis has been widely used in various industries. Gupta and Kumar (2013), for instance, optimized the performance characteristics of unidirectional glass fiber reinforced plastic composites using the Taguchi method and grey relational analysis. Performance characteristics such as surface roughness and material removal rate were optimized in their study during rough cutting operation. Salardini (2013) applied AHP and grey relational analysis to offer a method for portfolio management. The study used a statistical sample consisting of 16 firms whose shares were traded on Tehran Stock Exchange during the fiscal year of 2010 and used AHP and grey relational analysis to assign a weight to each firm.
The proposed study of this paper uses a hybrid of grey relational analysis and K-means for clustering 43 banks in the city of Semnan, Iran based on 24 criteria.
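As a rough illustration of the grey relational steps described in this section, the following Python sketch computes a grey relational degree for each alternative. Several choices here are assumptions rather than details taken from the study: the data are min-max normalized before the coefficients are computed, all criteria are treated as benefit criteria so that the default reference is the ideal value 1 on every normalized criterion, the weights default to equal values, and the distinguishing coefficient `zeta` defaults to 0.5. The function names are hypothetical.

```python
def normalise(matrix):
    """Min-max normalise each criterion (column) of a list of rows to [0, 1]."""
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant criterion carries no information; map it to 0 (assumption).
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]


def grey_relational_degrees(matrix, reference=None, weights=None, zeta=0.5):
    """Return one grey relational degree per alternative (row of `matrix`)."""
    norm = normalise(matrix)
    k = len(norm[0])
    if reference is None:
        # Assumed reference: the best normalised value on every criterion.
        reference = [1.0] * k
    if weights is None:
        # Assumed default: equal weights that sum to one.
        weights = [1.0 / k] * k
    # Absolute differences between the reference and each alternative.
    deltas = [[abs(r - x) for r, x in zip(reference, row)] for row in norm]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every alternative coincides with the reference.
        return [1.0] * len(norm)
    degrees = []
    for row in deltas:
        coefficients = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        degrees.append(sum(w * c for w, c in zip(weights, coefficients)))
    return degrees
```

A larger degree means the alternative lies closer to the reference, so sorting the alternatives by this value in descending order gives a ranking.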

The results
In this section, we present details of our findings on clustering 43 banks based on the 24 different criteria defined in Table 1. All computations have been carried out in Clementine® 12 and the results of clustering are summarized in Fig. 1. In order to obtain an efficient clustering, we also calculated the average silhouette coefficients for the various clusters, and Table 3 presents the results. As we can observe from the results of ranking, 34 of the 43 banks are located in the first cluster, while the second cluster includes only two banks.
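For readers unfamiliar with the silhouette coefficient, the sketch below shows one common way to compute its average per cluster in plain Python. It is not the Clementine routine used for Table 3. The Euclidean distance, the score of 0 for single-member clusters and the function name `silhouette_by_cluster` are assumptions made for illustration, and at least two clusters are required.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_by_cluster(samples, labels):
    """Average silhouette coefficient of each cluster (needs >= 2 clusters)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    scores = {label: [] for label in clusters}
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a single-member cluster scores 0.
            scores[label].append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(samples[i], samples[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(samples[i], samples[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        scores[label].append((b - a) / denominator if denominator > 0 else 0.0)
    return {label: sum(values) / len(values) for label, values in scores.items()}
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or poorly assigned observations.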

Conclusion
In this paper, we have investigated the relative efficiencies of banks in Semnan, one of the cities of Iran. The proposed study has applied K-means clustering for ranking various banks based on 15 criteria. The results of ranking can be compared with those of other performance measurement methods, and we leave this comparison to interested researchers as future work.

Table 1
The criteria used for clustering banks
As we can observe from Table 1, the number of criteria is relatively large, so we use feature selection to reduce the number of criteria from 24 to 15. Table 2 shows the input data for the reduced number of criteria.

Table 3
The summary of the average silhouette coefficient
As we can observe from Table 3, the highest value belongs to the third cluster; based on this cluster, we rank the different banks, and the results are summarized in Tables 4-6.

Table 5
The results of ranking different banks