Measuring customer loyalty using an extended RFM and clustering technique

Article history: Received December 28, 2013; Accepted March 24, 2014; Available online March 28, 2014

Today, the ability to identify profitable customers, build their long-term loyalty and expand existing relationships is a key competitive factor for a customer-oriented organization. The prerequisite for this is a powerful customer relationship management (CRM) system, and accurately evaluating customers' profitability is one of the fundamental ingredients of successful CRM. RFM is a method that examines three properties of each customer, namely recency, frequency and monetary value, and scores customers based on them. This paper introduces a method that uses the information an organization holds about its customers to obtain their behavioral traits with an extended RFM approach, groups the customers using the K-means algorithm, and finally scores the customers in each cluster in terms of their loyalty. In the suggested approach, the customers' records are first clustered. The items of the RFM model are then specified by selecting the properties that affect the customers' loyalty rate, using the multi-objective genetic algorithm. Next, the customers in each cluster are scored according to the effect these properties have on the loyalty rate. The influence of each property on loyalty is calculated using Spearman's correlation coefficient. © 2014 Growing Science Ltd. All rights reserved.


Introduction
Target customer selection has been one of the most important issues in customer-based marketing. The basis of creating value through customers is therefore to find profitable or potentially profitable customers (Hosseini et al., 2009; Cheng & Chen, 2009; Mak et al., 2011). Today, the ability to identify profitable, loyal and long-term customers is the primary key to success for customer-oriented organizations. To achieve winning strategies, business owners must look for a suitable approach to detect potential customers and attract as many of them as possible. Many studies indicate that most organizations are aware of the importance of identifying the customers who are valuable for the success of the organization. The results also indicate that strategies based on locating and retaining appropriate customers create significant value. Customer segmentation is therefore considered one of the primary approaches in marketing, and it plays a significant role in customer relationship management (CRM). CRM is defined as the set of management skills at the organizational level obtained through deeply understanding, participating in and managing customers' requirements; it builds on the knowledge obtained from customers in order to increase organizational efficacy and productivity and, consequently, profitability. Today, CRM strategies play an essential role as one of the main motives behind many efforts made by firms to create better value for their customers and to secure long-term revenue from them. Data mining tools help organizations extract the necessary knowledge from customers' data under a CRM framework (Golmah & Mirhashemi, 2012). The concept of customer lifetime value (CLV), or loyalty, is used to measure customers' loyalty in CRM (Chuang & Shen, 2008). CLV is the present value of all profits obtained from a customer, and it helps decision makers target the appropriate markets more effectively. Many researchers have recommended numerous models for calculating and using CLV. One of the models used for discovering the rules governing customer relationships is the recency, frequency and monetary (RFM) model. A preliminary review of previous studies on CRM showed that no approach has so far been suggested that uses feature selection with the multi-objective genetic algorithm (MOGA) to evaluate the customer loyalty rate, and that existing approaches use no specific criterion to weigh the attributes.
In this paper, an approach is suggested that uses MOGA to select the attributes that are effective in determining the loyalty rate of an organization's customers; the selected attributes form the RFM model. The influence of each attribute on the customers' loyalty is calculated using Spearman's correlation coefficient. This method can distinguish loyal from disloyal customers with good precision.

The K-means clustering algorithm
Clustering is a data-mining technique that automatically yields meaningful and informative clusters of objects with similar properties (Garcia-Murillo & Annabi, 2002). The primary objective of clustering is to form groups whose members share similar characteristics, and this study uses K-means clustering. K-means divides the samples into k clusters. First, k initial centers are selected randomly from the input data. Then the distance between each data point and each of the k centers is calculated, and each point is allocated to the cluster whose center is nearest. After all points have been allocated, the mean of each cluster is calculated as its new center. The distance calculation and the allocation to the new centers are repeated until no point changes cluster (Baradwaj & Pal, 2011).
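The K-means loop described above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the study; the function names, the use of squared Euclidean distance, the fixed random seed and the toy data are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means: random initial centers, then assign/update until stable."""
    rng = random.Random(seed)
    # k initial centers are drawn at random from the input data.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Allocate every point to the cluster with the nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Stop when no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


# Toy data (hypothetical): two well-separated groups of customers.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = kmeans(points, k=2)
```

Because the initial centers are random, K-means can in general converge to different partitions on different runs; fixing the seed only makes this sketch repeatable.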

Spearman's Correlation Coefficient
The correlation coefficient is a statistical measure of the type and degree of the relationship between two quantitative variables. It shows the strength and the direction of the relationship, which can be direct or inverse. Spearman's coefficient is computed on the ranks of the values rather than on the values themselves. The coefficient lies between -1 and 1, and it is 0 when there is no relationship between the two variables.
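As an illustration, Spearman's coefficient can be computed as the Pearson correlation of the ranks of the two variables, with tied values sharing their average rank. The sketch below is a generic textbook formulation in plain Python and is not taken from the paper; the function names are hypothetical. Handling ties matters here because an attribute such as a binary loyalty label contains many repeated values.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


direct = spearman([1, 2, 3, 4], [10, 20, 30, 40])   # perfectly direct relationship
inverse = spearman([1, 2, 3, 4], [4, 3, 2, 1])      # perfectly inverse relationship
```

Returning 0.0 for a constant variable is a convenience chosen for this sketch; strictly speaking the coefficient is undefined in that case.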

The Genetic Algorithm
Genetic algorithms apply Darwin's principles of natural selection to find an optimal formula for forecasting or pattern matching. In other words, a genetic algorithm is a programming technique that uses genetic evolution as its model. The problem to be solved is the input of the algorithm, and its solutions are encoded according to a pattern. A criterion called the fitness function evaluates each candidate solution. A set of solutions is first selected randomly to form the initial population. In each generation, some of these solutions are chosen based on the fitness function and produce the next generation. If the algorithm is designed properly, it converges towards the optimal solution. A solution to the given problem is represented by a list of parameters called a chromosome or genome. Chromosomes are usually shown as a single string of data, although other data structures can also be used. At first, different individuals are generated to create the first generation. In each generation, every individual is evaluated by the fitness function. To create the next generation, two genetic operators are applied: crossover and mutation. In the crossover operator, two solutions are chosen from the population based on the fitness function and combined to generate new solutions, the offspring. In the mutation operator, genes of the offspring are changed randomly, which keeps the genetic algorithm from getting stuck in a local optimum (Baradwaj & Pal, 2011). The multi-objective genetic algorithm is similar to the genetic algorithm, except that it optimizes several objectives simultaneously. One way to solve such a multi-objective optimization problem is to convert it into a single-objective problem. To do this, the objective of the problem is set to a combination of the primary objectives, each of which is assigned a weight (Dias & De Vasconcelos, 2002).
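The following sketch illustrates the ideas above on a toy problem: binary chromosomes, selection driven by a fitness function, crossover, mutation, and two objectives folded into one weighted sum. It is only an assumed, minimal design. Tournament selection, single-point crossover, the weights `w_gain` and `w_size`, the mutation rate and the `gains` values are all hypothetical choices made for the example, not details from the source.

```python
import random


def weighted_fitness(chromosome, gains, w_gain=1.0, w_size=0.1):
    """Weighted sum of two objectives: maximize total gain, minimize genes used."""
    gain = sum(g for g, bit in zip(gains, chromosome) if bit)
    size = sum(chromosome)
    return w_gain * gain - w_size * size


def tournament(population, fitness, rng, size=2):
    """Pick the fitter of `size` randomly drawn individuals."""
    return max(rng.sample(population, size), key=fitness)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def run_ga(gains, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)

    def fitness(chromosome):
        return weighted_fitness(chromosome, gains)

    # Random initial population of binary chromosomes.
    population = [[rng.randint(0, 1) for _ in gains] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Each child comes from two selected parents, then is mutated.
        population = [
            mutate(crossover(tournament(population, fitness, rng),
                             tournament(population, fitness, rng), rng), rng)
            for _ in range(pop_size)
        ]
        # Remember the best individual seen so far.
        best = max(population + [best], key=fitness)
    return best


# Hypothetical per-gene gains; genes with gain below w_size are not worth keeping.
gains = [0.6, 0.05, 0.4, 0.02, 0.3]
best = run_ga(gains)
```

The weighted sum is the simplest way to scalarize several objectives; the result depends on the chosen weights, so they would need to be tuned for a real problem.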

The extended RFM model
One of the models used for detecting the rules governing customer relationships is the RFM model. The RFM model evaluates three properties of each customer and scores customers based on them:
1. Recency: the amount of time elapsed since the last transaction. Lower values indicate a higher likelihood that the customer will purchase again.
2. Frequency: the number of transactions conducted in a specified time span. Higher values indicate a higher likelihood of customer loyalty.
3. Monetary: the monetary value of the transactions. The higher the value, the more attention the organization pays to the customer.
According to Bult and Wansbeek (1995), the customer score in this model is calculated as follows:

$$C_j = C_j^R + C_j^F + C_j^M$$

where $C_j^R$ is the recency property value, $C_j^F$ is the frequency property value and $C_j^M$ is the value of the transaction price property for customer $j$. In subsequent studies, an extended version of RFM was proposed, called the weighted RFM (WRFM) model. In this approach, each RFM property is given a coefficient proportionate to its importance (Stone & Jacobs, 1988), and the following equation is used for calculating each customer's score:

$$C_j = w_R C_j^R + w_F C_j^F + w_M C_j^M$$

In this equation, $w_M$, $w_F$ and $w_R$ represent the weights of the transaction price, frequency and recency properties, respectively. Other extensions of RFM and WRFM that consider additional parameters have also been proposed.
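The two scoring rules can be illustrated with a short sketch. It assumes that the unweighted score is the plain sum of the three property values and that the recency value has already been converted to a score in which higher is better; the function names and the example weights are hypothetical.

```python
def rfm_score(recency, frequency, monetary):
    """Unweighted RFM score: assumed here to be the sum of the three values."""
    return recency + frequency + monetary


def wrfm_score(recency, frequency, monetary, w_r, w_f, w_m):
    """Weighted RFM score: each property is scaled by its importance weight."""
    return w_r * recency + w_f * frequency + w_m * monetary


# Hypothetical customer with property scores R=3, F=4, M=5.
plain = rfm_score(3, 4, 5)
weighted = wrfm_score(3, 4, 5, w_r=0.5, w_f=0.25, w_m=0.25)
```

With equal weights the weighted score ranks customers exactly as the plain sum does; the weights only matter when the properties differ in importance.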

Recent works
Alvandi et al. (2012) introduced another approach that uses the WRFM method together with clustering. In this approach, besides the three properties of recency, frequency and transaction price, the relationship time span is considered, defined as the time interval between the first and the last transactions. The approach first clusters the customers and then finds the error rate based on the distance of each record. After clustering, the loyalty amount is calculated for each customer using the WRFM method, and the customers are then classified into 16 groups that specify their profitability rate. Danaee et al. (2013) considered an approach for classifying and prioritizing the customers of a company using the concept of CLV, measuring the customers' value according to the importance of their needs. This approach uses the WRFM model and the K-means clustering algorithm: first the R, F and M values are specified for each customer, then their weights (relative importance) are determined using the analytic hierarchy process (AHP) method (Saaty, 1988), and each customer's value is calculated based on the WRFM model. Next, the K-means clustering algorithm places the customers into 8 clusters according to their scores, and the clusters are ranked based on their CLV. Finally, the obtained results are studied along with the strategies the company should apply to the various clusters. Khajvand and Tarokh (2011) presented a model in which the customers' loyalty rate is calculated after studying their history in different periods, and their future behavior is estimated. This framework consists of 7 phases. It first collects the required information over a period of six seasons. The collected data are then divided by season, the RFM parameters are extracted for each customer, clusters are computed with K-means, and customer loyalty is calculated. Chan (2008) introduced a genetic algorithm (GA) for customer segmentation. To apply the GA, the data were first converted into binary strings called chromosomes. In each generation, the GA generated a new population using operators such as crossover and mutation, so that only individuals with higher fitness survive. The algorithm proceeds as follows. First, the input data are converted into binary strings. Second, the chromosomes are generated randomly. Third, each chromosome is evaluated. Fourth, individuals with higher fitness are selected as the parents of the next generation. Fifth, crossover is applied to produce new chromosomes. Finally, mutation takes place in one bit and the new generation is generated.

The proposed method
In this paper, a method is presented for evaluating the customers' loyalty rate. In the suggested method, the properties and attributes that are most important in identifying the profitable and loyal customers of an organization are first selected using the GA, and the customers are then scored in terms of their loyalty rate using the extended RFM model. The weight assigned to each attribute when determining the customers' loyalty is calculated using Spearman's correlation coefficient. The suggested method can distinguish loyal from disloyal customers with good precision. Based on this, organizations can adopt different strategies according to the customers' properties in order to find and keep profitable customers in the future.
After clustering the customers, we select the attributes and properties that are most important for determining the customers' loyalty rate in each cluster using the multi-objective GA. We then determine the customers' loyalty rate based on the extended RFM model. The implementation phases of the suggested method are as follows:
1. Cluster the customers using the K-means clustering algorithm.
2. Calculate Spearman's correlation coefficient for all properties.
3. Select the properties that are effective in determining the customers' loyalty rate using the multi-objective genetic algorithm.
4. Calculate the customers' scores based on the extended RFM method.
Spearman's correlation coefficient is calculated following Balaji and Srivatsa (2012). For each attribute (property) x in the customers' information in the training set, there is a set of values. The attribute y specifies whether the customer is loyal (1) or not (0). In this way, the correlation of each attribute with the customer's loyalty is obtained in each cluster.
Each chromosome represents a set of attributes, i.e. each chromosome consists of 16 genes (attributes). If an attribute belongs to the solution set, its corresponding gene is 1; otherwise it is 0. After the initial population is created, new generations are produced repeatedly by applying the mutation and crossover operators, and each new generation replaces the previous one. This process continues until the solutions converge towards the optimal solution. The objectives of the multi-objective genetic algorithm in the suggested method are a) minimizing the number of attributes in the set and b) maximizing the prediction power of the set. The two objectives are combined into a single objective. To determine the prediction power of a set, Spearman's correlation coefficient is used: the prediction power equals the sum of the Spearman's correlation coefficients of the attributes whose gene value equals 1.

To calculate the customers' scores based on the extended RFM method, the customers are scored on the properties selected as effective in determining the loyalty rate. This score is indicative of the customer's lifetime value. In this phase, each selected property is assigned a weight equal to its Spearman's correlation coefficient. The score of each customer is calculated as follows:

$$\text{Score} = \sum_{i} W_i C_i$$

where $W_i$ is the weight of the i-th attribute (its Spearman's correlation coefficient) and $C_i$ is the value of the i-th attribute. Higher scores indicate a higher customer lifetime value.

To evaluate the presented approach, the Bank Marketing Data Set [http://kdd.ics.uci.edu] is employed, which includes three properties, namely recency, frequency and monetary. The customers' records are clustered into four clusters. For each attribute, Spearman's correlation coefficient is calculated in each of the four clusters. Then, for each cluster, the attributes that determine the customer's loyalty were selected, and the scoring relations for the customers were specified based on the attributes selected in that cluster. To assess the obtained relations accurately, a set of records was used as the test set for each cluster. After clustering the records, each cluster is stored in a separate file. For each cluster, two thirds of the records were used as the training set and the rest as the test set. After evaluating the suggested method on the test set, the mean scores of the loyal and disloyal customers are reported for each cluster in Table 1. To evaluate the obtained results, two criteria were used, namely recall and precision. As Fig. 1 and Fig. 2 show, the suggested method classifies the customers with good precision. Even in its worst case, it outperformed the best case of the other two methods.
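The selection objective, the scoring rule and the two evaluation criteria described in this section can be sketched together as below. This is an assumed illustration rather than the authors' code. The exact form of the combined objective is not given in the text, so the penalty weight `size_weight` is hypothetical, as are the function names and the example values. Precision and recall use their standard definitions, with loyal customers (label 1) as the positive class.

```python
def prediction_power(mask, correlations):
    """Sum of Spearman coefficients of the attributes whose gene equals 1."""
    return sum(rho for rho, bit in zip(correlations, mask) if bit)


def combined_objective(mask, correlations, size_weight=0.1):
    """Assumed scalarization: reward prediction power, penalize attribute count."""
    return prediction_power(mask, correlations) - size_weight * sum(mask)


def customer_score(values, mask, correlations):
    """Score = sum of W_i * C_i over selected attributes, with W_i = Spearman rho."""
    return sum(rho * v for rho, v, bit in zip(correlations, values, mask) if bit)


def precision_recall(predicted, actual):
    """Standard precision and recall for binary labels (1 = loyal)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical cluster with three attributes; the second one is not selected.
correlations = [0.5, 0.1, 0.25]
mask = [1, 0, 1]
score = customer_score([2.0, 5.0, 1.0], mask, correlations)
objective = combined_objective(mask, correlations)

# Hypothetical predicted and true loyalty labels for five test customers.
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

How a score is turned into a loyal or disloyal prediction (for example, by a threshold per cluster) is not specified in the text, so the sketch takes the predicted labels as given.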

Conclusion
In this paper, a method was presented to determine the customers' loyalty rate. The method first groups the customers into clusters based on their characteristics using K-means clustering. It then specifies, in each cluster, the properties that help identify the customers' loyalty rate based on an extended WRFM method. To specify this property set, the multi-objective genetic algorithm was used, with the objectives of reducing the number of properties and increasing the prediction power. To evaluate the suggested method, a valid benchmark was employed. The results obtained from the implementation indicate the high precision of the suggested method in identifying the customers' lifetime value.
For further studies in this field, more precise clustering algorithms can be used. In addition, since many attributes do not play an important role in determining the customers' loyalty, the primary data can be pre-processed to eliminate these attributes. Applying the suggested method to the resulting database may then provide better results.

Table 1
The mean score of customers in each cluster