A new methodology to study customer electrocardiogram using RFM analysis and clustering

Article history: Received October 2


Introduction
During the past two decades, information technology has dramatically changed the relationship between customers and businesses by creating new opportunities for marketing (Ngai et al., 2009). To better understand customers' needs, we can gather all sales transactions and analyze them using different scientific methods. A good analysis of the data helps us understand customers' behavior by transforming the data into classified information. The classified data help us extract the necessary knowledge, which can lead to better marketing planning (Bottcher et al., 2009). Customer behavioral analysis is useful for detecting prospective customers who are wealthy and whom we wish to retain for as long as possible. One of the primary questions in analyzing the data is the choice of an appropriate tool, such as data mining, to obtain the necessary results (Yuantao & Siqin, 2008). Data mining is one of the most popular tools for analyzing a company's data and making predictions from it in order to discover useful trends and patterns. It results from applying sophisticated modeling techniques from the diverse fields of statistics, artificial intelligence, and database management (Yuantao & Siqin, 2008; Han & Kambert, 2001). There is considerable evidence that purchasing records are important for direct marketing. In fact, the demographics and purchasing history of consumers play an important role in the profitability of direct marketing activities and in the ability to target consumers effectively (Kaefer et al., 2005; Kim & Street, 2004).
Customer segmentation groups customers with comparable desires and purchasing behaviors so that members of the same group are consistent and different groups are diverse. Generally, customer segmentation is associated with statistical segmentation, lifestyle segmentation, behavior segmentation and benefit segmentation. Within customer segmentation, the function of behavior segmentation is to classify customers in terms of the behavior patterns of the existing customers in the database (Ying & Feng, 2008; Minghua, 2008; Xiang-Bin & Yi-Jun, 2006). The groups derived from customer behavior analysis contribute to the effectiveness of corporate marketing activities, increase customer satisfaction and loyalty, and help enterprises gain competitive advantages (Minghua, 2008). Many techniques, such as decision trees and clustering, have been used to study behavior segmentation based on purchase behavior. Customer behavior segmentation has been widely used in various industries and areas, including e-commerce, retail markets, tourism, marketing, etc. By segmenting consumer markets according to behavioral patterns, one can target the acquisition and retention of highly profitable and potentially lucrative customers. One of the popular methods for analyzing customer behavior and defining customer and market segmentation is RFM analysis. Cheng and Chen used the RFM method to value customers and clustered them by K-means (Cheng & Chen, 2008). According to Huang et al. (2009), only customers with the same purchasing pattern are clustered together, and RFM is used to calculate the value of each cluster. Tsai and Chiu (2004) proposed a market segmentation methodology based on product-specific variables, such as the items purchased and the associated monetary transactional history of customers, and they used RFM to analyze the relative profitability of each customer cluster. Yeh et al. (2009) extended the traditional RFM model by including two parameters, time since the first purchase and churn probability.
The organization of this paper is as follows. Section 2 briefly introduces short time series distance, K-means clustering, and RFM analysis. Section 3 describes the proposed methodology, Section 4 includes the numerical results for a real-world case study, and Section 5 summarizes the contribution of the paper.

Short Time Series Distance
Möller-Levet et al. (2003) used slope to compute the distance between two short time series. The distance between two time series $x_i$ and $v_j$ is the sum of the squared differences of the slopes of the two series, which is as follows,

$$d_{STS}^{2}(x_i, v_j) = \sum_{k=1}^{n-1} \left( \frac{v_{j(k+1)} - v_{jk}}{t_{k+1} - t_k} - \frac{x_{i(k+1)} - x_{ik}}{t_{k+1} - t_k} \right)^{2}$$

where $n$ is the number of time points and $t_k$ is the time point for data points $x_{ik}$ and $v_{jk}$. To remove the effect of scale, z-standardization of the series is recommended.
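The slope-based distance described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `z_standardize` and `sts_distance` are hypothetical, time points are assumed to be equally spaced when none are given, and the function returns the square root of the summed squared slope differences.

```python
import math

def z_standardize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # a constant series has no shape to compare
    return [(x - mean) / std for x in series]

def sts_distance(x, v, t=None):
    """Square root of the summed squared differences between piecewise slopes."""
    if len(x) != len(v):
        raise ValueError("series must have the same length")
    if t is None:
        t = list(range(len(x)))  # assumption: equally spaced time points
    total = 0.0
    for k in range(len(x) - 1):
        dt = t[k + 1] - t[k]
        slope_x = (x[k + 1] - x[k]) / dt
        slope_v = (v[k + 1] - v[k]) / dt
        total += (slope_v - slope_x) ** 2
    return math.sqrt(total)

a = z_standardize([1, 2, 3, 4, 5])
b = z_standardize([10, 20, 30, 40, 50])  # same shape as a, different scale
c = z_standardize([5, 4, 3, 2, 1])       # mirror image of a

print(sts_distance(a, b))  # close to 0: identical shape after standardization
print(sts_distance(a, c))  # clearly positive: opposite slopes
```

After z-standardization the two rising series coincide, so only the shape of the trend, not its magnitude, contributes to the distance.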

K-Means Clustering
In data mining, K-means clustering is a method that aims to partition N observations into K clusters in which each observation belongs to the cluster with the nearest mean. A proper K is normally evaluated by how well it minimizes the within-cluster variation and maximizes the between-cluster variation. K-means clustering is sensitive to outliers, so outliers must be removed before clustering (Ying & Feng, 2008; Cheng & Chen, 2008; Farvaresh & Sepehri, 2010).
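To make the assignment and update steps concrete, here is a small sketch of the standard Lloyd iteration in plain Python. It is illustrative only: the function `kmeans`, its parameters and the toy points are assumptions of this example, and the initial centers are passed in explicitly so that the run is reproducible.

```python
def kmeans(points, centers, distance, max_iter=100):
    """Plain Lloyd iteration; `centers` are the initial cluster means."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation goes to the cluster with the nearest mean.
        new_labels = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means would not move
        labels = new_labels
        # Update step: each mean becomes the average of its current members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, centers=[points[0], points[3]], distance=euclidean)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The distance function is a parameter, so a different measure can be substituted for the Euclidean one without changing the loop.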

RFM Analysis
RFM analysis is regarded as one of the simplest and most powerful models for extracting the value of customers. It models three dimensions of customer transactional data, recency, frequency and monetary, to classify customers' behavior. RFM analysis is also used to calculate a score for each customer (Cheng & Chen, 2008; Yeh et al., 2009), where customers with high scores are usually the most responsive to promotions, the most likely to repeat a behavior, and the most profitable, and vice versa.
The definitions used in RFM analysis are as follows (Cheng & Chen, 2008; Yeh et al., 2009; Lingras et al., 2005):
• R represents recency, which is the interval between the time of the latest purchase and the present; shorter intervals receive higher values.
• F stands for frequency, which refers to the number of transactions in a particular period, for example, two times per year, two times per quarter or two times per month.
• M stands for monetary value, which is the amount of money spent in a particular period.
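The three quantities defined above can be computed directly from a list of transactions. The following is a minimal sketch under stated assumptions: the record layout `(customer, date, amount)`, the function name `rfm_table` and the sample data are all hypothetical, and recency is returned as a raw number of days (a scoring step would then give shorter intervals higher values).

```python
from collections import defaultdict
from datetime import date

def rfm_table(transactions, as_of):
    """Map customer id -> (recency in days, frequency, monetary) up to `as_of`."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        if day > as_of:
            continue  # ignore purchases made after the analysis date
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((as_of - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }

# Hypothetical sample data.
transactions = [
    ("A", date(2010, 1, 5), 40.0),
    ("A", date(2010, 3, 20), 25.0),
    ("B", date(2010, 2, 11), 90.0),
]
rfm = rfm_table(transactions, as_of=date(2010, 3, 31))
print(rfm)  # {'A': (11, 2, 65.0), 'B': (48, 1, 90.0)}
```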

Research design
In this section, we present a customer behavior analysis methodology based on changes in customer segmentation over a sequence of time intervals. The proposed model has three phases. In the first phase, we derive customers' behavioral trends from past transactional data. The main tasks in this phase are preprocessing, setting time intervals, assigning customers via RFM to pre-determined segments, and extracting behavioral trends. In the second phase, customers are clustered based on their behavioral trends and the main behavioral trends are determined. This phase consists of two steps: clustering behavioral trends and identifying dominant trends. Finally, in the third phase, named cluster analysis, we analyze the clusters by examining the clustering and by labeling and profiling the clusters.

Customer's behavioral trends derivation
There are different ways of gathering customers' information, such as purchases made with credit cards, debit cards, etc. When a customer buys a product, his/her credit information, along with other necessary information such as demographics, lifestyle and product characteristics, is stored in a central database. Any study of a customer's intention to buy a specific product can rely on this database: customer behavior is derived by monitoring the purchase trend over a predefined period of time, and any supplementary data help us achieve more accurate results from the behavior analysis. Since the primary focus of this study is to find the behavioral trends of customers, we need to find out whether such trends exist. The following steps of the first phase are intended to extract customers' behavioral trends.

Preprocessing
One of the main activities in any data mining project is preprocessing the data. The goal of this step is to prepare the purchasing data for analysis. The preprocessing operation includes data integration and data transformation.

Setting time intervals
In analyzing changes in customer behavior, we can divide time into different intervals instead of treating it as a single period. The advantage of dividing time is that every customer can be positioned in each time interval. Therefore, the changes in a customer's position over time are traceable. The problem in this part is to determine how the time intervals are defined and how many are needed. Time is commonly divided by day, week, quarter, half year, or year. The number of intervals depends on factors such as the type of industry, the density of the historical purchases, and the length of the period covered by the data. The analyst chooses one of these divisions according to these conditions.
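As a small illustration of dividing the time axis, the sketch below places each transaction in a fixed-length interval and accumulates the per-interval frequency and monetary value of each customer. The record layout, the function names and the 90-day interval are assumptions made for this example; a calendar-based division (week, quarter, year) would work in the same way.

```python
from collections import defaultdict
from datetime import date

def interval_index(day, start, days_per_interval):
    """Zero-based index of the fixed-length interval that contains `day`."""
    return (day - start).days // days_per_interval

def fm_per_interval(transactions, start, days_per_interval):
    """Map (customer, interval index) -> [frequency, monetary]."""
    cells = defaultdict(lambda: [0, 0.0])
    for customer, day, amount in transactions:
        if day < start:
            continue  # outside the analysis horizon
        k = interval_index(day, start, days_per_interval)
        cells[(customer, k)][0] += 1
        cells[(customer, k)][1] += amount
    return dict(cells)

# Hypothetical sample data.
transactions = [
    ("A", date(2010, 1, 5), 40.0),
    ("A", date(2010, 3, 20), 25.0),
    ("A", date(2010, 4, 15), 30.0),
    ("B", date(2010, 2, 11), 90.0),
]
cells = fm_per_interval(transactions, start=date(2010, 1, 1), days_per_interval=90)
print(cells[("A", 0)], cells[("A", 1)], cells[("B", 0)])
```

Each customer then has one (frequency, monetary) pair per interval, which is the input needed to position the customer in every time interval.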

Segmenting customers via RFM into pre-determined segments
In this step of the methodology, customers are assigned to one of a range of segments in each time interval by RFM scoring. Customers are classified into five segments in terms of average purchase payments (monetary) and purchase frequency. In this type of segmentation, recency is implicit in the time frames. In each interval, we calculate M (monetary) and F (frequency). Since the variables F and M are highly correlated, we calculate two new variables (components) by means of principal component analysis (PCA), and the first component is expected to describe most of the variance of the original space. The customers are then segmented based on the first principal component. Table 1 explains how the customers are assigned to the five segments.

Table 1
Decision rules for segmentation of customers in each time interval
In Table 1, FM is the new dimension that results from PCA on frequency and monetary, and µ and σ are the mean and standard deviation of FM, respectively.
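The following sketch illustrates how a single FM dimension and five segments could be obtained. It rests on two assumptions that are not taken from the paper. First, F and M are standardized and positively correlated, in which case the first principal component of the two variables is (z_F + z_M)/√2. Second, the cut points at µ ± 0.5σ and µ ± 1.5σ are purely hypothetical placeholders, since the actual decision rules of Table 1 are not reproduced here.

```python
import math

def standardize(values):
    """Zero mean, unit population standard deviation (values must not be constant)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / std for x in values]

def first_component(frequency, monetary):
    """First principal component of two standardized, positively correlated variables."""
    zf, zm = standardize(frequency), standardize(monetary)
    r = sum(a * b for a, b in zip(zf, zm)) / len(zf)  # Pearson correlation
    if r <= 0:
        raise ValueError("this sketch assumes F and M are positively correlated")
    return [(a + b) / math.sqrt(2) for a, b in zip(zf, zm)]

def segment(fm, mu, sigma, cuts=(-1.5, -0.5, 0.5, 1.5)):
    """Segment 1..5 from HYPOTHETICAL cut points located at mu + c * sigma."""
    return 1 + sum(fm > mu + c * sigma for c in cuts)

# Hypothetical per-interval values for five customers.
frequency = [1, 2, 3, 4, 10]
monetary = [10, 25, 30, 45, 120]

fm = first_component(frequency, monetary)
mu = sum(fm) / len(fm)
sigma = math.sqrt(sum((x - mu) ** 2 for x in fm) / len(fm))
segments = [segment(x, mu, sigma) for x in fm]
print(segments)  # [2, 2, 3, 3, 5]
```

Repeating this for every time interval yields one segment label per customer per interval.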

Identifying customers' behavioral trends
We now have a sequence of segments for each customer across the time intervals. Changes in a customer's segment from one time interval to the next show the dynamic behavior of that customer. Table 2 shows the sequences of segment labels of three randomly selected customers over five time intervals.

Table 2
Customer segments in five sequential time intervals

Clustering behavioral trends
This research uses the K-means clustering method to cluster customers' extracted behavioral trends. The first outcome of this part of the methodology is the detection of outlier trends and their removal from the analysis. The second is a set of clusters of trends. These clusters represent different groups of behaviors in customers' interactions with the business. The study is founded on the extracted clusters; therefore, the accuracy of the clustering of customer trends has a great effect on the subsequent results of the investigation.

Most data clustering techniques use a distance between observations to analyze the data. In this research, however, we are interested in the similarity of behavior, that is, the shape of a trend, rather than in Euclidean closeness. Therefore, we replace the Euclidean distance function with a more suitable one. One popular option is the short time series distance (STSD) function. Since K-means clustering cannot itself determine the best number of clusters, we select that number with Dunn's index, which is an indicator of clustering quality. Dunn's index is defined as follows,

$$D = \min_{i \neq j} \frac{d(c_i, c_j)}{\max_{k} \operatorname{diam}(c_k)}$$

where $d(c_i, c_j)$ is the distance between two clusters $c_i$ and $c_j$, and diam(c) is the maximum diameter of cluster c. The best number of clusters is found over numerous iterations. In each iteration, a specific number of clusters is tried, and the number of clusters with the highest Dunn's index is used in the K-means clustering.
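A compact way to see how Dunn's index behaves is to compute it for a toy clustering. The sketch below is an illustration under assumptions: the function name `dunn_index` is hypothetical, the distance between two clusters is taken to be the smallest distance between their members, and the diameter of a cluster is its largest within-cluster distance. The text does not say which inter-cluster distance was used, so the first choice in particular is an assumption.

```python
def dunn_index(points, labels, distance):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    if len(groups) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    # Largest distance between two members of the same cluster.
    max_diam = max(distance(a, b) for g in groups for a in g for b in g)
    # Smallest distance between members of two different clusters (assumed linkage).
    min_sep = min(
        distance(a, b)
        for i, g in enumerate(groups)
        for h in groups[i + 1:]
        for a in g
        for b in h
    )
    if max_diam == 0:
        return float("inf")  # every cluster is a single repeated point
    return min_sep / max_diam

def abs_diff(a, b):
    return abs(a[0] - b[0])

points = [(0,), (1,), (10,), (12,)]
good = dunn_index(points, [0, 0, 1, 1], abs_diff)  # compact, well separated
bad = dunn_index(points, [0, 1, 0, 1], abs_diff)   # clusters interleave
print(good, bad)
```

Compact, well-separated clusters give a larger value, which is why the number of clusters with the highest index is preferred.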

Identifying dominant trends
In this study, dominant trends are fluctuation trends that are frequent among many customers. To be regarded as a dominant trend, a cluster needs to have a significant number of customers. In our study, customers' behaviors are diverse and we are faced with a high number of small clusters. Therefore, we merged some of the small clusters into larger ones by means of the cosine similarity measure. We also fixed the final number of clusters in advance, based on visual exploration of the clusters and consultation with domain experts. Cosine similarity can also be used to measure the resemblance of the different trends located in a cluster.
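The merging step can be illustrated with a short sketch that attaches a small cluster to the larger cluster whose mean trend is most similar under cosine similarity. The trend vectors and the cluster names "rising" and "falling" are hypothetical, and the rule of merging into the single most similar cluster is an assumption of this example rather than a procedure stated in the text.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two trend vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical mean segment sequences of two large clusters and one small cluster.
large_clusters = {
    "rising": [2, 3, 4, 5, 5],
    "falling": [5, 4, 3, 2, 1],
}
small_cluster = [1, 2, 3, 4, 5]

# Merge the small cluster into the large cluster with the most similar trend.
target = max(
    large_clusters,
    key=lambda name: cosine_similarity(small_cluster, large_clusters[name]),
)
print(target)  # rising
```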

Analyzing clusters
As explained earlier, a comprehensive study of customers' fluctuations across RFM segments over time can reveal useful information about customer behavior. The dominant trends obtained in the previous step are used in this section as representatives of the accepted clusters. An expansion of desirable clusters is a positive indication for marketing managers, who can further analyze the causes of the changes between successive periods. An expansion of undesirable clusters, on the other hand, can prompt preventive measures. Studying how the cluster memberships of individuals change is useful for target marketing.

Labeling and profiling
In the last step of our methodology, we assign a label to each detected cluster and provide a profile of the cluster's members. The label of each cluster is determined by the shape of its fluctuations (increasing, decreasing, or stable over the periods). Specifying the profile of the clusters is the other aspect of this step. In our study, the profile is behavior-based: it involves what customers are actually doing, such as whether they make multiple visits or multiple purchases and how much money they spend on their regular purchases. These are the questions addressed by looking at the behavior.
The proposed methodology tries to find the patterns of fluctuation in customer purchasing behavior over time. These patterns are the foundation of our targeted customer behavior analysis. In this research, the term electrocardiogram is used for customer behavioral trends. Identifying the dominant fluctuation patterns of customer behavior is a good basis for tuning the company's relationships with its customers. It is evident that there are drivers or reasons behind these fluctuations; for instance, there may be business, customer or even macroeconomic related drivers. Studying the roots of these fluctuations is outside the scope of the current study. Note, however, that such an analysis can reveal hidden factors that have significant effects on customer churn or loyalty. Furthermore, identifying such drivers can help improve customer relationships and promote customers.
For clustering customers' behaviors in this research, we used K-means with the short time series distance function. In K-means clustering, the user has to specify the number of clusters; we used Dunn's index to define the number of clusters in our study. To start the clustering, we selected the preliminary cluster centers randomly. Since the result of clustering depends on the preliminary centers, we repeated the procedure ten times. We identified the best number of clusters from the ten runs by comparing their Dunn's indices. Table 4 shows Dunn's index for the clustering results; the highest Dunn's index was 0.5423, which was associated with eleven clusters.
Three events appear when we analyze the clusters and extract the prominent behavioral trends: customers' behavior remains on a plateau, customers' behavior falls, or customers' behavior rises over time from the monetary and frequency point of view. Companies can make decisions and define suitable strategies based on customers' behavior. In clusters where customers follow a constant behavior over a period of time, suitable promotions can be used to improve their behavior. In clusters with continuously declining frequency and monetary values, we may conclude that these customers are going to churn and switch to competitors, which is considered a red alert. Clusters whose behavior is improving are good candidates for being turned into loyal customers.
Regarding the prominent behavioral trends, clusters 8 and 9 show a constant behavior after time frames 4 and 5. They are good candidates for promotion strategies to improve and grow their value. In clusters 10 and 11, customers clearly follow a declining trend. Comparing the average frequency and monetary values of these customers in the first year with those in the second year shows this decline. There is a steady increase in customer value for customers in clusters 5 and 6 in time frames 3, 4 and 5. This trend declines shortly afterwards and rises again during the last time frame. The company should not only prevent the declining trend for this group of customers but also guide them to become loyal customers. Customers in clusters 4 and 7 fluctuate over the time horizon, and a special promotion for stabilizing these customers is necessary. There is also an intensive growth for customers in cluster 2, but it decreases at the end. Therefore, more analysis is needed to find the reasons for this decrease and to prevent the same events in the future. Table 6 presents the mean of the new dimension of the customers in each time frame according to the clusters. In addition, Table 7 shows the monetary and frequency values for all customers separately in different time frames.

Conclusion
This paper has presented a new methodology to measure customers' behavioral trends, called the customer electrocardiogram. The proposed model uses clustering with RFM analysis to study customers' fluctuations over time. The method was applied to a real-world case study in the food industry, and the results have been discussed in detail. The preliminary results of this implementation indicate that the method can analyze customer behavior based on spending potential and purchase frequency. This can help detect dominant fluctuation trends in customers' behavior over time. It seems that the method can provide further findings about customer behavior fluctuations. Mining transactional data to detect or nominate drivers of the fluctuations is in great demand in customer relationship management, and we leave it for further research. Future research can also include dynamic clustering in segmenting customers for each period. Such research could predict near-future fluctuations and suggest solutions to prevent customer churn.

Fig. 3. Electrocardiogram of customers in eleven clusters
Fig. 3 shows the electrocardiogram of customers in eleven clusters.

Table 3
Distribution of customers in the segments along the time frames
Table 5 also shows the distribution of customers in eleven clusters.

Table 6
Mean of the customers' new dimension in each time frame according to clusters

Table 7
The mean of monetary and frequency for eleven clusters in eight different time frames (TF)