A data mining method for service marketing: A case study of banking industry

Article history: Received 15 February 2011; received in revised form 8 April 2011; accepted 9 April 2011; available online 10 April 2011.

One of the most important objectives of any modern organization is to gain competitive advantage from customer data. Finding hidden patterns or models in data requires modern, robust methodologies. The banking industry is no exception to this trend, and banks often wish to increase profit by providing appropriate services to potential customers. Analyzing databases to manage customer behavior is difficult because the databases are multidimensional, comprising monthly account records and daily transaction records. We therefore propose a methodology for analyzing such databases that considers human factors and builds an integrated data utilization system. In addition, a self-organizing neural network map is used to identify groups of customers based on repayment behavior and on recency, frequency, and monetary behavioral scoring predictors. We then use the Apriori association rule algorithm for further analysis to develop marketing strategies for the services offered by banks. © 2011 Growing Science Ltd. All rights reserved.


Introduction
The new target of database marketing in banking and financial services is to provide the right product to the right customer at the right time (Cohen, 2004). However, a realistic and effective execution of this goal is not easy to achieve, because companies have multiple products and operate under a complex set of business constraints. The difficulty, therefore, is to choose the appropriate product to offer to a group of customers so as to maximize the marketing return on investment while satisfying the business constraints (Cohen, 2004). The aim of data mining is to obtain useful, non-explicit information from data stored in large repositories (Frawley & Piatetsky, 2001).
Data mining can not only improve decision making by searching for relationships and patterns in the extensive data collected by organizations but also reduce information overload (Premkumar et al., 2001); it is often applied to extract and uncover the hidden truths behind very large quantities of data (Ngai et al., 2010).
This paper outlines a framework for solving this problem. Banks regularly mount campaigns to improve customer value by offering new products to existing customers. In recent years, this approach has gained significant momentum because of the increasing availability of customer data and the improved analysis capabilities of data mining. We present a solution that answers the question of which banking products, if any, to offer to each customer to maximize the marketing return on investment.
To increase the rate at which the right customers accept the right bank products, our framework segments customers with a self-organizing map (SOM), which groups customers into homogeneous clusters so that we can explore the interrelationships among the services used in each cluster.

Data mining in banking industry
Data mining has become a widely accepted process for organizations to improve their performance and gain a competitive advantage. As a relatively new concept, it has been defined in various ways by different authors in recent years.
Data mining serves as a tool for the automatic collection and transmission of data, integrated querying, and analysis, by integrating and appraising information from customers. It is also known as knowledge discovery in databases (KDD) (Turban et al., 2007). Data mining can also be defined as a process of identifying interesting patterns in databases that can be used in decision making (Bose & Mahapatra, 2001). In this process, useful information and knowledge that are implicit and previously unknown are extracted from large amounts of incomplete, noisy, vague, and random data arising in practical applications. Data mining can further be defined as a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to obtain and identify valuable information and subsequently gain knowledge from a large database (Turban et al., 2007). Although these definitions differ, they share a common theme: extracting important information from existing data to enable better decision making throughout an organization.
In recent years, a number of data mining models and frameworks for the banking industry have been developed by different organizations. Table 1 summarizes some of the most popular ones. As Table 1 shows, a framework that organizes customer data based on customer profiles and transactions is one of the most popular, and this framework is adopted for the purpose of this study. Furthermore, we derive the best strategy for service marketing from association rules among the services used by customers in a specific cluster. To categorize customers, we first need to describe clustering and its indicators, which are defined in the next subsection.

Marketing segmentation
Instead of targeting all customers equally or offering the same incentives to everyone, enterprises can select only those customers who meet certain profitability criteria based on their individual needs or purchasing behaviors (Dyche & Dych, 2001). In practice, companies segment markets according to specific characteristics of their customers, and clustering is one of the most common ways to do so. Clusters are formed by grouping similar vectors: vectors within the same cluster share similar features, whereas vectors in different clusters have distinct features. Clusters are therefore the outcome of market segmentation. A computer-based clustering approach requires the input data to be transformed into numerical vectors (Chihli & Chih-Fong, 2008).
Unlike classification, clustering is a data-driven task that usually relies on unsupervised learning (Jain et al., 1999). RFM is one of the most important bases for clustering (Aggelis & Christodoulakis, 2005). RFM is a three-dimensional way of ranking customers to identify the top 20%, or best, customers. It is based on the 80/20 principle, under which 20% of customers generate 80% of revenue. RFM analysis is a marketing technique that uses three features of customers, namely recency, frequency, and monetary value, to predict whether they are likely to buy again. Essentially, RFM analysis suggests that customers with high RFM scores tend to conduct more transactions and generate more profit for the bank (Aggelis & Christodoulakis, 2005; Aggelis, 2004). The following features are calculated over a specific period:
• Recency (R) is the date of the customer's most recent transaction.
• Frequency (F) is the number of financial transactions that the customer conducted within the specified period.
• Monetary (M) is the total value of the financial transactions that the customer made within the same period.

Visualized market segmentation approaches fall into two main groups. The first is traditional statistical methods such as hierarchical cluster analysis (Dillon, 1984), which builds a dendrogram by comparing the vectors and presents the segmentation visually by cutting the dendrogram. The second is neural network approaches such as the self-organizing map (SOM) (Kohonen, 1995; Kohonen, 2001), which projects and clusters high-dimensional input vectors onto a low-dimensional map, usually two-dimensional for visualization. Inspired by biological neural systems, in which neurons with similar functions are located together, the SOM maps similar input vectors to the same or neighboring output units of a two-dimensional map. As a result, after training the output units self-organize into an ordered map in which units with similar weights lie near each other.
To date, most data mining studies of bank databases have focused on discovering general rules (Setiono et al., 1998), predicting personal bankruptcy (Desai et al., 1996), and credit scoring (Kim et al., 2004). Few works have mined bank databases from the viewpoint of customer behavioral scoring. In this study, we examine both the account data of customers and their account transactions, with the aim of discovering patterns that suggest which incentives a bank could offer its customers as part of a better marketing strategy. To cluster customers and recognize patterns, we use the self-organizing map, described in the next section.
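The RFM features defined in this section can be computed directly from raw transaction records. The following Python sketch is illustrative only: the (customer_id, date, amount) record layout and the function name `rfm` are assumptions, and recency is encoded as the number of days between the customer's last transaction and the end of the period, a common numeric encoding of the "date of the last transaction" used in the text.

```python
from datetime import date


def rfm(transactions, period_start, period_end):
    """Compute recency, frequency and monetary value per customer.

    transactions: iterable of (customer_id, transaction_date, amount) tuples
    (hypothetical layout). Recency R is the number of days from the customer's
    last transaction to period_end (assumed encoding; smaller means more recent).
    """
    last_date, frequency, monetary = {}, {}, {}
    for customer, day, amount in transactions:
        # Only transactions inside the analysis period are counted.
        if not (period_start <= day <= period_end):
            continue
        if customer not in last_date or day > last_date[customer]:
            last_date[customer] = day
        frequency[customer] = frequency.get(customer, 0) + 1
        monetary[customer] = monetary.get(customer, 0.0) + amount
    return {
        c: {"R": (period_end - last_date[c]).days, "F": frequency[c], "M": monetary[c]}
        for c in last_date
    }


# Toy records with hypothetical customer IDs.
tx = [
    ("c1", date(2010, 1, 5), 120.0),
    ("c1", date(2010, 3, 20), 80.0),
    ("c2", date(2010, 2, 11), 500.0),
    ("c2", date(2009, 12, 30), 999.0),  # outside the period, ignored
]
scores = rfm(tx, date(2010, 1, 1), date(2010, 3, 31))
# scores["c1"] == {"R": 11, "F": 2, "M": 200.0}
# scores["c2"] == {"R": 48, "F": 1, "M": 500.0}
```

Because a small day count means a recent customer, some implementations invert R after normalization so that a higher value is always "better"; whether to do so is a modeling choice not specified here.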

Self-organizing map (SOM)
The self-organizing map (SOM), proposed by Kohonen (2001), has been widely used in industrial applications such as pattern recognition, biological modeling, data compression, signal processing, and data mining. It is an unsupervised, nonparametric neural network approach. The main strength of the SOM algorithm is its simplicity, which makes it easy to understand, simulate, and apply. The basic SOM consists of a set of neurons, usually arranged in a two-dimensional structure, with neighborhood relationships among the neurons. After training, each neuron is associated with a feature vector of the same dimension as the input space. By assigning each input vector to the neuron with the nearest feature vector, the SOM divides the input space into regions that share a common nearest feature vector. Clustering algorithms attempt to organize unlabeled input vectors into clusters, or "natural groups", such that vectors within a cluster are more similar to each other than to vectors in different clusters (Pal et al., 1993).
The SOM consists of M neurons located on a regular low-dimensional grid, usually one- or two-dimensional. Higher-dimensional grids are possible but are rarely used because they are difficult to visualize. The lattice of the grid is either hexagonal or rectangular.
The basic SOM algorithm is iterative. Each neuron $i$ has a $d$-dimensional feature vector $w_i = [w_{i1}, \dots, w_{id}]$. At each training step $t$, a sample data vector $x(t)$ is chosen at random from the training set, and the distances between $x(t)$ and all feature vectors are computed. The winning neuron, denoted by $c$, is the neuron whose feature vector is closest to $x(t)$ (Sitao & Tommy, 2003):

$$ c = \arg\min_{i} \lVert x(t) - w_i(t) \rVert \qquad (1) $$

The set of neighboring nodes of the winning node is denoted $N_c$. We define $h_{ic}(t)$ as the neighborhood kernel function around the winning neuron $c$ at time $t$. The kernel is a non-increasing function of time and of the distance of neuron $i$ from the winning neuron $c$. It can be taken as a Gaussian function (Sitao & Tommy, 2003):

$$ h_{ic}(t) = \exp\!\left( -\frac{\lVert pos_i - pos_c \rVert^2}{2\sigma^2(t)} \right) \qquad (2) $$

where $pos_i$ is the coordinate vector of neuron $i$ on the output grid and $\sigma(t)$ is the kernel width. The weight update rule in the sequential SOM algorithm can be written as follows:

$$ w_i(t+1) = w_i(t) + \alpha(t)\, h_{ic}(t)\, \big[ x(t) - w_i(t) \big] \qquad (3) $$

Both the learning rate $\alpha(t)$ and the neighborhood width $\sigma(t)$ decrease monotonically with time. During training, the SOM behaves like a flexible net that folds onto the "cloud" formed by the training data. Because of the neighborhood relations, neighboring neurons are pulled in the same direction, so the feature vectors of neighboring neurons come to resemble each other. The resulting clusters of customers are then used as input for identifying the favorite services of each cluster. To find and understand the relationships among these services, we use association rules, which are described in detail, together with their measures, in the next section.
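The sequential SOM training loop described above (winner selection, Gaussian neighborhood kernel, and weight update with decaying learning rate and kernel width) can be sketched in NumPy as follows. This is a minimal sketch, not the SPSS Clementine implementation used in the study: the 5 × 3 rectangular grid mirrors the output size reported later, while the exponential decay schedules, initial values, iteration count, and function names are assumptions chosen for illustration.

```python
import numpy as np


def train_som(data, rows=5, cols=3, n_iter=3000, alpha0=0.5, alpha_end=0.01,
              sigma_end=0.5, seed=0):
    """Sequential SOM training on a rectangular grid (illustrative schedules)."""
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    sigma0 = max(rows, cols) / 2.0
    # pos_i: coordinates of each neuron on the output grid.
    pos = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize feature vectors w_i with randomly drawn training samples.
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        # Learning rate alpha(t) and kernel width sigma(t) decay monotonically.
        alpha = alpha0 * (alpha_end / alpha0) ** frac
        sigma = sigma0 * (sigma_end / sigma0) ** frac
        x = data[rng.integers(0, n)]                        # random sample x(t)
        c = np.argmin(np.linalg.norm(weights - x, axis=1))  # winning neuron c
        d2 = np.sum((pos - pos[c]) ** 2, axis=1)            # ||pos_i - pos_c||^2
        h = np.exp(-d2 / (2.0 * sigma ** 2))                # Gaussian kernel h_ic(t)
        weights += alpha * h[:, None] * (x - weights)       # sequential weight update
    return weights, pos


def best_matching_units(data, weights):
    """Index of the nearest feature vector (output unit) for every input row."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)


# Two synthetic groups of already-normalized (R, F, M) vectors.
rng = np.random.default_rng(1)
group_a = np.clip(rng.normal(0.2, 0.03, size=(50, 3)), 0, 1)
group_b = np.clip(rng.normal(0.8, 0.03, size=(50, 3)), 0, 1)
data = np.vstack([group_a, group_b])
weights, pos = train_som(data)
units = best_matching_units(data, weights)
```

Each update moves $w_i$ toward $x(t)$ by a fraction $\alpha(t) h_{ic}(t) \le 1$, so the feature vectors stay inside the range of the normalized data; customers mapped to the same unit form one candidate cluster.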

Association rules mining
Association rules (Agrawal & Swami, 1993) are used to discover relationships between variables in transaction databases. Association rule mining has been applied to a wide variety of datasets and is particularly useful for analyzing large datasets.
Given a non-empty set $I$, an association rule is a statement of the form $A \Rightarrow B$, where $A \subset I$, $B \subset I$, and $A \cap B = \emptyset$. The set $A$ is called the antecedent of the rule and the set $B$ the consequent. The set $I$ is called the itemset; note that this usage of 'itemset' differs from some other definitions that consider all subsets of $I$ as itemsets (Wu & Chow, 2003). Association rules are mined over a set of transactions, denoted $\mathcal{T} = \{\tau_1, \tau_2, \dots, \tau_n\}$. The interestingness of an association rule is commonly characterized by functions called support, confidence, and lift (McNicholas & Murphy, 2008).
The notation $P(A)$ denotes the proportion of transactions in $\mathcal{T}$ that contain the set $A$. Similarly, $P(A, B)$ denotes the proportion of transactions that contain both $A$ and $B$, and $P(B \mid A) = P(A, B)/P(A)$ denotes the proportion of transactions containing $A$ that also contain $B$. For two disjoint itemsets $A$ and $B$, the rule $A \Rightarrow B$ is characterized by the following measures:
• Support is the proportion of transactions in which the rule $A \Rightarrow B$ appears, i.e., $\mathrm{Support}(A \Rightarrow B) = P(A, B)$. For example, Support = 50% means that A and B are bought together in 50% of transactions. The minimum support is a threshold defined by the user in advance.
• Confidence is the proportion of transactions containing the antecedent that also contain the consequent, i.e., $\mathrm{Confidence}(A \Rightarrow B) = P(B \mid A)$; it measures the validity of the rule. For example, Confidence = 85% means that 85% of the customers who buy A also buy B (McNicholas & Murphy, 2008). Like the minimum support, the minimum confidence is a threshold on confidence defined by the user in advance.
• Lift indicates how important a rule is, by comparing how often A and B occur together with how often they would co-occur if they were independent. It is computed as in Eq. (4):

$$ \mathrm{Lift}(A \Rightarrow B) = \frac{\mathrm{Confidence}(A \Rightarrow B)}{\mathrm{Support}(B)} = \frac{P(A, B)}{P(A)\,P(B)} \qquad (4) $$

If lift is greater than one, A and B occur together more often than they would if they were independent. For example, a lift of 5 means that B is five times as likely to appear in a transaction that contains A as in a transaction chosen at random.
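The three measures defined above can be computed directly from a list of transactions. The sketch below is illustrative: the service names are hypothetical, and each transaction is assumed to be the set of services used by one customer.

```python
def support(itemset, transactions):
    """P(itemset): fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(a, b, transactions):
    """P(B | A) = P(A, B) / P(A); assumes A occurs in at least one transaction."""
    return support(set(a) | set(b), transactions) / support(a, transactions)


def lift(a, b, transactions):
    """Confidence(A => B) / Support(B) = P(A, B) / (P(A) P(B))."""
    return confidence(a, b, transactions) / support(b, transactions)


# Hypothetical services used by five customers.
transactions = [
    {"sms_alert", "card"},
    {"sms_alert", "card", "internet_banking"},
    {"card"},
    {"sms_alert", "card"},
    {"internet_banking"},
]
# support({"sms_alert", "card"}) = 3/5 = 0.6
# confidence(sms_alert => card)  = 0.6 / 0.6 = 1.0
# lift(sms_alert => card)        = 1.0 / 0.8 = 1.25  (positive association)
# lift(internet_banking => card) = 0.5 / 0.8 = 0.625 (below 1)
```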

Research methodology
To discover association rules among bank services and to offer services to the right customers, we use data from Parsian Bank, one of the leading private banks in Iran. It is one of Iran's pioneering private banks and began operating in 2001.

Research questions
We aim to find the relationships among indicators and to generate new knowledge by discovering information and scrutinizing the behavioral patterns of past customers. This study uses a systematic approach to answer the research questions, and its framework consists of two parts: first, we segment the customers; then we discover the association rules among services in each cluster. The research questions of this study are as follows:
i. How can we provide suitable services to potential customers using data mining techniques?
ii. How can customers be segmented using intelligent data mining models?

Analyzing data
Behavioral variables are used to analyze customer behavior and serve as inputs to the SOM. As described in Section 2.2, the self-organizing map (SOM) is used to cluster customers according to their characteristics. This segmentation can be used not only to find customer preferences but also to identify the common characteristics of customers. Because the SOM is a neural network, the indicators must first be normalized; we use min-max normalization, given in Eq. (5):

$$ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (5) $$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the indicator. This method rescales each value by its distance from the minimum relative to the range between the minimum and maximum, so that every normalized indicator lies in [0, 1].
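Min-max normalization as in Eq. (5) can be applied column by column before the indicators are fed to the SOM. This NumPy sketch assumes a 2-D array with one row per customer and one column per indicator; the guard for constant columns is an added safeguard, not part of the original description.

```python
import numpy as np


def min_max_normalize(x):
    """Rescale each column of a 2-D array to [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    # Guard against constant columns (max == min), which would divide by zero.
    col_range[col_range == 0] = 1.0
    return (x - col_min) / col_range


# Rows: customers; columns: raw R (days), F (count), M (amount).
raw = [[11, 2, 200.0],
       [48, 1, 500.0],
       [30, 5, 350.0]]
norm = min_max_normalize(raw)
# norm[:, 1] -> [0.25, 0.0, 1.0]; norm[:, 2] -> [0.0, 1.0, 0.5]
```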
Normalization and customer segmentation were implemented in SPSS Clementine using its SOM module. The output of the network is a 5 × 3 matrix of output units, and customers are clustered based on the average values in each cell of the matrix and on their common characteristics. Clustering customers in this way helps us evaluate the characteristics of similar customers and find the preferences of each group (Table 3).

Table 3
Clustering of customers found from SOM

Next, we select the best cluster, that is, the one with a large number of customers, the best average values of the financial variables, and the highest frequency, and we investigate its behavioral pattern and discover its association rules. The cluster with coordinates X = 0 and Y = 2, which consists of 981 customers, is considered superior to the others and is chosen as our case for investigation.
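A per-unit summary like Table 3 (cluster size and mean normalized R, F, M for each grid coordinate) can be built from the SOM assignments. In this Python sketch the input layout is an assumption, and `pick_cluster` is a hypothetical ranking rule (minimum size, then highest mean F, then highest R + M) meant only to illustrate combining the selection criteria mentioned above; it is not the exact procedure used in the study.

```python
from collections import defaultdict


def summarize_clusters(assignments, rfm_rows):
    """Size and mean normalized R, F, M for every SOM output unit.

    assignments: list of (x, y) grid coordinates, one per customer.
    rfm_rows: list of (R, F, M) tuples, already min-max normalized.
    """
    groups = defaultdict(list)
    for coord, row in zip(assignments, rfm_rows):
        groups[coord].append(row)
    summary = {}
    for coord, rows in groups.items():
        n = len(rows)
        summary[coord] = {
            "size": n,
            "R": sum(r[0] for r in rows) / n,
            "F": sum(r[1] for r in rows) / n,
            "M": sum(r[2] for r in rows) / n,
        }
    return summary


def pick_cluster(summary, min_size):
    """Hypothetical rule: among units with at least min_size customers, prefer the
    highest mean F, breaking ties by R + M. Raises ValueError if none qualify."""
    candidates = {k: v for k, v in summary.items() if v["size"] >= min_size}
    return max(candidates, key=lambda k: (candidates[k]["F"],
                                          candidates[k]["R"] + candidates[k]["M"]))


# Toy assignments for four customers.
assignments = [(0, 2), (0, 2), (1, 0), (4, 1)]
rows = [(0.4, 0.5, 0.4), (0.5, 0.5, 0.5), (0.9, 0.2, 0.8), (0.1, 0.9, 0.1)]
summary = summarize_clusters(assignments, rows)
```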

Identification of association rules
In this section, the clusters of customers are used as input for identifying the favorite services of each cluster. To discover the relationship between groups of customers and their favorite services, we use the Apriori algorithm (Aggelis, 2004), one of the earliest algorithms for finding association rules and an influential method for mining frequent itemsets for Boolean association rules. Before analyzing the customers of the selected cluster, we divide them into two groups: a supervisory group and a test group. The test group is used to evaluate the correctness of the association rules and to verify the results. From the selected cluster, 80% of customers form the supervisory group and 20% form the test group. Using the supervisory data, the Apriori algorithm with a minimum support of 20% and a minimum confidence of 70% identifies the relationships among the services used in the selected cluster. Some of these rules are summarized in Table 4; they show the relationships among different services for the selected cluster.

To assess the recommendations, we use the mean absolute error (MAE) (Min & Han, 2005), given in Eq. (6):

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert P_i - T_i \rvert \qquad (6) $$

where $P_i$ is the value suggested (predicted) by the system, $T_i$ is the actual value, and $n$ is the number of compared items. The test group consists of customers whose data were not used to discover the favorite services after clustering. To obtain the MAE, we compute the absolute differences between the values predicted by the association rules for the cluster and the actual values, and then average these errors over all compared items. Concretely, the services discovered for this cluster in the supervisory database are first turned into suggestions for the group, and the MAE is then computed for both the test and supervisory groups using Eq. (6).
Based on these results, we conclude that the framework designed to offer preferred services to potential customers is acceptable.
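The rule-mining step described in this section (Apriori with a minimum support of 20% and a minimum confidence of 70%) can be sketched in plain Python. This is a compact, textbook-style level-wise Apriori, not the Clementine implementation used in the study; the function names and the toy service names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support_of(candidates):
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    frequent = support_of({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = support_of(candidates)
        result.update(frequent)
        k += 1
    return result


def rules(frequent, min_confidence):
    """Rules A => B with confidence P(A, B) / P(A) >= min_confidence."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for a in combinations(itemset, r):
                a = frozenset(a)
                conf = supp / frequent[a]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((set(a), set(itemset - a), supp, conf))
    return out


# Hypothetical services used by five customers of one cluster.
transactions = [
    {"sms_alert", "card"},
    {"sms_alert", "card", "internet_banking"},
    {"card"},
    {"sms_alert", "card"},
    {"internet_banking"},
]
frequent = apriori(transactions, min_support=0.2)
found = rules(frequent, min_confidence=0.7)
```

Downward closure (every subset of a frequent itemset is frequent) is what makes the prune step safe and guarantees that `frequent[a]` exists when rules are generated.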

Conclusion
In this paper, we have presented a data mining approach to study the behavior of banking customers. The study used two groups of variables, and 30 variables were identified for investigating the behavioral pattern of customers. Using a Kohonen neural network with a 5 × 3 output matrix, customers were placed into 13 clusters based on their behavioral patterns. Among these, the cluster with coordinates X = 0 and Y = 2, which consisted of 981 customers and had average values R = 0.453, F = 0.5, and M = 0.446, was selected for further investigation of the association rules among all services used by this group. In the second phase of the research, we sought to identify suitable services to offer potential customers. Our results indicate that 12 of the 21 services were used more often, a finding also supported by the association rules.
        Test group    Supervisory group
MAE     0.492308      0.493885

Table 1
Different models

Research constructs
In this research, three groups of indicators are used:
i. Profile data: sex, age, education, marital status, job, etc.
ii. Services offered by the bank, such as those available through e-banking.
iii. Financial transactions: receipts and withdrawals of money by customers in a specific period.
All indicators are listed in Table 2.

Table 4
Rules made from services used by selected cluster

Based on the rules found, customers prefer 12 services over the other services, which are listed below:

To evaluate the correctness of the recommendations, we compute the MAE for the test and supervisory groups. The mean absolute error (MAE) is the average of the absolute values of the residuals (errors). The MAE is similar to the root mean square error (RMSE) but is less sensitive to large errors. The results in Table 6 indicate that the difference between the test and supervisory groups is trivial (0.493885 - 0.492308 = 0.001577).
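The contrast between MAE and RMSE drawn above can be checked with a small example. How the predicted values $P_i$ and actual values $T_i$ are encoded (for instance, 1 if a service is recommended or used and 0 otherwise) is not specified in the text, so the numeric vectors below are purely illustrative.

```python
import math


def mae(predicted, actual):
    """Mean absolute error: mean of |P_i - T_i| over the n compared items."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(abs(p - t) for p, t in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error: squaring weights large residuals more heavily."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, actual)) / len(actual))


spread = ([1, 1, 1, 1], [0, 0, 0, 0])  # four small errors of size 1
single = ([0, 0, 0, 4], [0, 0, 0, 0])  # one large error of size 4
# Both cases have MAE = 1.0, but RMSE is 1.0 for `spread` and 2.0 for `single`.
```

Because MAE grows linearly with each residual, a single large error does not dominate it the way it dominates RMSE, which is why MAE is described as less sensitive to large errors.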