Predicting Customer Turnover Using Recursive Neural Networks

Department of Electrical Engineering, Mahdishahr Branch, Islamic Azad University, Mahdishahr, Iran
Faculty of Information Technology, Faculty of Computer, Payame Noor Assaluyeh, Bushehr, Iran
Department of Electrical and Computer Engineering, Faculty of Molasadra Branch, Technical and Vocational University (TVU), Ramsar, Iran
Department of Computer Engineering, Karaj Branch, Islamic Azad University, Karaj, Iran
Department of Computer Engineering, Ramsar Branch, Islamic Azad University, Ramsar, Iran


Introduction
Not all customers are equally important to a company, so companies seek to identify and analyze customer attributes and to segment and cluster customers according to their value. Identifying customers, analyzing their characteristics, and clustering them by the value they bring to the company provide the basis for the optimal allocation of limited resources, the use of appropriate marketing strategies, and, ultimately, profitability management along with customer relationship management. Customer lifetime value (CLV) is a concept that can help companies in this regard, and it can be estimated using different models.
Customer analysis is a process that uses customer behavior data to support key business decisions through market segmentation and predictive analysis. Companies use this information for direct marketing, site selection, and customer relationship management. Customer analysis plays an important role in predicting customer behavior. Lifetime value models use different strategies for modeling customer behavior. One of the most prominent is the use of recency, frequency, and monetary (RFM) variables [1][2][3]. These variables provide a relative understanding of customer behavior and try to answer the following questions:
(i) When did the customer last buy?
(ii) How often do they buy?
(iii) How much do they spend?
RFM variables are useful statistics for modeling customer behavior and are widely used in industry because they are easy to implement [1,4].
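The three RFM questions above can be illustrated with a minimal Python sketch. The function name `rfm`, the tuple layout, and the toy data are assumptions made for illustration only; here monetary is taken as total spend, which is one common convention and not necessarily the one used in the cited works.

```python
from datetime import date

def rfm(transactions, end_date):
    """Return {customer_id: (recency_days, frequency, monetary_total)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    end_date: last day of the observation period (an assumed parameter).
    """
    summary = {}
    for customer_id, purchase_date, amount in transactions:
        last, count, total = summary.get(customer_id, (purchase_date, 0, 0.0))
        # Track the latest purchase date, the purchase count, and the spend.
        summary[customer_id] = (max(last, purchase_date), count + 1, total + amount)
    return {
        cid: ((end_date - last).days, count, total)
        for cid, (last, count, total) in summary.items()
    }

# Hypothetical toy data, not taken from the study.
toy = [
    ("c1", date(2001, 1, 5), 20.0),
    ("c1", date(2001, 1, 20), 35.0),
    ("c2", date(2001, 1, 10), 12.5),
]
scores = rfm(toy, end_date=date(2001, 1, 31))
```

In this toy case, customer c1 last bought 11 days before the end of the period, bought twice, and spent 55.0 in total.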
With the explosion of big data and access to online and offline transaction data, modeling the true customer lifetime value and predicting customer behavior using RFM factors can lead to higher corporate earnings, market profits, and greater customer loyalty [1,5].
Recent efforts have been made to predict customer behavior. An artificial neural network (ANN) model for predicting restaurant preferences is proposed in [2]. This model accounts for the customer's social media location data, the customer's historical preferences, the influence of the customer's social network, and the customer's mobility characteristics. RFM variables have also been used to find retailer segments from electronic funds transfer at point of sale (EFTPOS) transaction data. Other researchers (9) extracted customer buying behavior by finding relationships among products and taking customer motivation into account. Customer preferences for product attributes are then identified through probabilistic models for matching products with customers. Artificial neural networks have also been used to predict the RFM values of blood donors. This strategy uses a basic version of artificial neural networks that treats time as a separate input variable [3,6].
Turnover (churn) prediction is the process of identifying customers who are turning away. To carry out a turnover forecast, a turned-away customer must first be defined [4].

Review of Literature
Moslehi et al. used the LRFM model to segment customer behavior based on customer lifetime value. Following the CRISP data mining methodology and using the group hierarchical process and diagnostic analysis, they divided customers into 16 groups and 5 main clusters (loyal customers, potential loyal customers, new customers, lost customers, and high-consumption customers). In this study, group 13 (loyal customers), which has the highest lifetime value for the company, was identified as the group that the company should strive to retain [1].
Akhundzadeh Noghabi and colleagues identified customer behavioral groups and the characteristics of each group in the telecommunications industry. They first grouped customers using RFM variables and the K-means method and then used association rules to identify customer behavior patterns in each group. The study identified seven distinct behavioral groups; these results can lead to a better understanding of customer behavior patterns and to improved marketing strategies [5].
Baradaran and Biglari used an improved RFM model to segment customers in the manufacturing and distribution industries of popular goods. They improved the quality of segmentation by substituting the purchase sequence variable (C) for the purchase recency variable in the RFM model. The variable C represents the continuity of the customer's purchases during a particular period and equals the number of months of the year in which the customer made a purchase during that period. The results showed that customer segmentation based on CFM is more accurate than segmentation based on the RFM model [5].
Khodabandeh Lou and Niknafs presented a new method for segmenting the customers of a grocery store based on their loyalty and defined appropriate strategies for each segment. They measured the effect of several additional factors (the number of purchased goods, the number of returned goods, discount, and delay in distribution, along with the RFM variables) on the quality of loyalty evaluation. Based on the results, the researchers divided customers into five loyalty clusters (loyal, potential loyal, new, lost, and turned-away customers) and defined appropriate strategies for managing each cluster. The results showed that the extended RFM model is very accurate in predicting customer loyalty [1].
Chang and Tsay, in their research entitled "Combining SOM and K-means in cluster data mining," proposed the LRFM model, which adds the length of the customer relationship (L) to RFM. After extracting the model data and clustering, they used a customer value matrix (a combination of the F and M indicators) and a customer loyalty matrix (a combination of the L and R indicators) for analysis and classified customers into five types and sixteen categories. These researchers showed that adding the length index improves the accuracy of identifying loyal customers [2].
Chang and Chen proposed a model that combines the RFM and K-means methods with rough set theory. Using their model, they classified customer loyalty for different numbers of clusters (3, 5, and 7), then evaluated and described the characteristics of the customers in each cluster, and used the results to evaluate and implement CRM [3].
Using the RFM model and the K-means clustering method, Wu, Chang, and Lu analyzed the value of the customers of an industrial equipment manufacturing company. After data preparation, the customers were divided into six clusters based on RFM indices. The characteristics of the customers in each cluster were then analyzed by evaluating customer lifetime value. Finally, appropriate suggestions were made for the different groups of customers [6].
In another study, Lee et al. analyzed customer characteristics based on the LRFM model indices using a two-stage clustering method (a method for determining the optimal number of clusters followed by the K-means method) to improve customer relationship management in the textile industry. The results gave the company a better understanding for determining marketing strategies. In the industry surveyed in Taiwan, it was also found that some customers have longer relationships and are more loyal, although their transaction volume and frequency may not be high [4].
Wei, Lin, and Weng applied the LRFM model to the clients of a dental clinic and identified loyal clients based on the model. According to the results, patients were divided into four groups (loyal, active, new, and unknown customers), and appropriate strategies for dealing with each group were determined [7].
Salehinejad et al. [8] analyzed the performance of a recurrent neural network with the ReLU activation function (ReLU-RNN) alongside LSTM-RNN and SRNN. The results show that recurrent neural network models have competitive performance for the RFM recommender system. The ReLU-RNN performs slightly better overall (approximately 80% accuracy) than LSTM and SRNN. ReLU-RNN performs better for recency (78%) and frequency (82%). For the monetary value parameter, ReLU-RNN achieves 79%, while LSTM-RNN achieves 80% and therefore performs slightly better on this parameter.
In another study, Salehinejad et al. [9] systematically surveyed major recent developments in recurrent neural networks (RNNs) and introduced the challenging problems in RNN training. An RNN is an artificial neural network whose units have recurrent connections. Recurrent connections learn dependencies in sequential data or input time series. The ability to learn sequential dependencies has made RNNs popular in applications such as speech recognition, speech synthesis, machine vision, and video description generation. One of the main challenges of training RNNs is learning long-term dependencies in the data. This is generally due to the large number of parameters that must be optimized during RNN training over long sequences [7]. The study discusses RNN architectures and the training methods that can be used to address these problems.

LRFMP Model
The LRFM model is a method used for customer clustering in customer relationship management (CRM). In this model, customers are categorized based on four characteristics: length of the customer relationship, recency of purchase, frequency of purchase, and monetary value of purchases. The name LRFM is derived from these four features: length (L), recency (R), frequency (F), and monetary (M) [8].
Classic LRFM models have mostly performed well in customer segmentation in many different industries [9][10][11]. This study contributes to the prior literature by proposing a new RFM-based model, called LRFMP, for customer segmentation and by providing useful insights for predicting customer turnover. Clustering by LRFMP can be used to perform a more comprehensive analysis of customers who are turning away. The approach provides meaningful and valid categories and indicates that there are patterns by which the data set can be separated.
According to Reinartz and Kumar [12], Chang and Tsay [13], and Lee et al. [10], the RFM model cannot distinguish between customers with long-term relationships and customers with short-term relationships with the organization. They therefore propose the length of the customer relationship as an additional variable and explore its impact on customer loyalty and profitability, arguing that a longer customer relationship improves customer loyalty. The length variable is defined as the time interval between the customer's first and last purchases in the observed interval. The RFM model treats customers who have recently generated high monetary value for the company, and whose short-term purchase patterns exceed the average purchase frequency among repeat customers, as valued customers, while the length of the relationship with the company is ignored. The length of the customer relationship reflects how long it has been since the customer first started dealing with the organization. The cited work states that the length of the customer relationship is positively related to the probability of a future relationship. The LRFM clustering model has often worked well in customer segmentation in various industries; in this article, we add the customer's visit periodicity (P) to the core LRFM model in order to characterize customer behavior and measure its regularity. The model thus consists of the periodicity of customer visits (P), recency (R), length (L), frequency (F), and monetary (M) characteristics [14,15].
Length (L) is the time interval, in days, between the customer's first and last visits. It reflects customer loyalty: the longer the interval, the more loyal the customer. Recency (R) reflects how recently the customer has engaged with the company and carries information about the repeat-purchase trend [4,16]. In the traditional LRFM model, recency is usually calculated as the time interval (usually in days) between the date of the customer's last visit and the last date of the observation period. In our model, recency is instead the average number of days between the dates of the customer's n most recent visits and the last date of the observation period [17,18]. Thus, the recency value in our model is calculated using the following equation:
$$R = \frac{1}{n} \sum_{i=1}^{n} \left( t_{end} - t_{m-i+1} \right) \quad (1)$$
In relation (1), t_m is the date of the customer's last visit, so t_{m-i+1} is the date of the i-th most recent visit; n is the number of recent visits considered; and t_{end} is the last date of the observation period.
Note that for n = 1, this modified variable reduces to the traditional recency; the traditional recency measure is therefore a special case of our definition.
Frequency (F) is the total number of customer visits during the observation period. The higher the frequency, the greater the customer loyalty. The monetary factor (M) is the average amount of money the customer spends per visit and represents the customer's contribution to the company's income. A higher monetary value represents a larger share of the company's revenue [17,19].
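The definitions of length, recency, frequency, and monetary value given in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `lrfm`, the parameter `n_recent` (the n of the averaged recency), and the toy visits are hypothetical, and each visit is assumed to carry a single amount.

```python
from datetime import date

def lrfm(visits, end_date, n_recent=3):
    """Compute (L, R, F, M) for one customer.

    visits: list of (visit_date, amount) pairs, one pair per visit.
    end_date: last date of the observation period.
    n_recent: number of most recent visits averaged into recency (assumed).
    """
    dates = sorted(d for d, _ in visits)
    length = (dates[-1] - dates[0]).days          # first to last visit, in days
    recent = dates[-n_recent:]                    # the n most recent visits
    recency = sum((end_date - d).days for d in recent) / len(recent)
    frequency = len(dates)                        # total number of visits
    monetary = sum(a for _, a in visits) / frequency  # average spend per visit
    return length, recency, frequency, monetary

# Hypothetical toy customer, not taken from the study.
toy_visits = [
    (date(2001, 1, 1), 10.0),
    (date(2001, 1, 11), 20.0),
    (date(2001, 1, 21), 30.0),
]
averaged = lrfm(toy_visits, end_date=date(2001, 1, 31), n_recent=2)
traditional = lrfm(toy_visits, end_date=date(2001, 1, 31), n_recent=1)
```

With `n_recent=1` the recency equals the number of days since the last visit, which matches the traditional definition.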

Proposed Model
The proposed model is based on time series prediction using a combination of clustering and a neural network. In this model, the data are first processed in the preprocessing stage using LRFMP, an extension of the LRFM base model; the resulting data set is then applied to the neural network according to the test scenarios, and the results are reviewed and presented. Figure 1 shows the proposed model.
We explain the proposed model in the following steps. The first step is to clean the data and remove unnecessary details. Some customer records are discarded because they do not fit the proposed method. To obtain the required data, the length (L), recency (R), frequency (F), monetary (M), and periodicity (P) factors are calculated from the full data set. In this article, we use the LRFMP approach, which has only the L, R, F, M, and P metrics.
These factors are computed for all customers in the initial data set (817,741 entries in total), and customers whose value falls below the specified minimum for any criterion are removed. The resulting customer set is reported as output, together with the number of customers dropped at this stage. After the LRFMP stage, 32,268 customers remain, which is the number of inputs to the neural network in the next step.
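The thresholding step can be pictured with the following minimal sketch. The minimum values are not given in the text, so the thresholds, the function name `filter_customers`, and the toy feature values below are purely hypothetical.

```python
def filter_customers(features, minimums):
    """Keep customers whose features meet every specified minimum.

    features: {customer_id: {"L": ..., "R": ..., "F": ..., "M": ..., "P": ...}}
    minimums: {feature_name: minimum_value}; hypothetical thresholds.
    Returns (kept_customers, number_dropped).
    """
    kept = {
        cid: f
        for cid, f in features.items()
        if all(f[name] >= value for name, value in minimums.items())
    }
    return kept, len(features) - len(kept)

# Hypothetical toy features, not taken from the study.
toy_features = {
    "c1": {"L": 30, "R": 5.0, "F": 4, "M": 20.0, "P": 2.0},
    "c2": {"L": 0, "R": 40.0, "F": 1, "M": 5.0, "P": 0.0},
}
kept, dropped = filter_customers(toy_features, minimums={"L": 1, "F": 2})
```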
Next, we apply data clustering based on the LRFMP model. The LRFMP model is derived from the LRFM base model by adding the attribute P, and 9 continuous LRFMP features are calculated for each member from the purchase history. Additionally, the number of days elapsed since the registration date of each member is removed from the demographic information. The new feature, periodicity (P), shows whether a customer visits the stores regularly. It is defined as the standard deviation of the customer's inter-visit times (relations (2) and (3)):
$$P = \operatorname{stdev}(IVT_1, IVT_2, \ldots, IVT_n) \quad (2)$$
$$IVT_i = t_{i+1} - t_i \quad (3)$$
Here, IVT_i is the time elapsed between two consecutive visits of the customer, n is the number of inter-visit time values, and t_i, with i ≥ 1, is the date of the customer's i-th visit. This periodicity measure shows whether a customer's visits occur at regular intervals. A low dispersion means that the customer visits or purchases at relatively regular intervals and can be identified as a regular customer.
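As a small illustration of periodicity, the sketch below computes the standard deviation of the inter-visit times for one customer. The text does not say whether the population or the sample standard deviation is intended; the population form (`statistics.pstdev`) is used here as an assumption, and returning 0.0 for customers with fewer than two visits is likewise an assumed convention.

```python
from datetime import date
from statistics import pstdev

def periodicity(visit_dates):
    """Standard deviation of the inter-visit times (in days) of one customer."""
    dates = sorted(visit_dates)
    # Inter-visit time: days elapsed between consecutive visits.
    ivt = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    if not ivt:
        return 0.0  # assumed convention for customers with a single visit
    return pstdev(ivt)

# Hypothetical toy customers, not taken from the study.
regular = periodicity([date(2001, 1, 1), date(2001, 1, 8), date(2001, 1, 15)])
irregular = periodicity([date(2001, 1, 1), date(2001, 1, 3), date(2001, 1, 13)])
```

The weekly visitor has zero dispersion, while the second customer, with gaps of 2 and 10 days, has a larger value.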
In the second stage, the data obtained from the first stage are applied to the neural network, and results are obtained under different configurations. The neural network is defined as a feedforward network, and the results are reported according to the number of neurons in the hidden layer. The proposed model is evaluated under different scenarios. The LRFMP model is implemented first, and the resulting data are then applied to the neural network based on the following structure.
Neural networks are able to identify relationships in the data and represent those relationships. The inputs of a neural network are variables, and the outputs are the states that we need to predict or control. Neural networks perform two main functions: learning and recall. Learning is the step of adjusting the weights of the connections in the network so that it produces the desired output vector when the input layer is stimulated by an input vector. Recall is the step of accepting an input and generating a response as output using the trained network structure. The system is made up of a large number of highly interconnected processing elements, called neurons, that work together to solve a problem and transmit information through synapses (weighted connections).
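The recall step of a feedforward network can be sketched in plain Python as follows. This is not the network used in the study: the layer sizes (five LRFMP inputs, one hidden layer of four units, one output), the tanh activation, and the random initial weights are all assumptions chosen for illustration, and no training (learning) is performed.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Create (weights, biases) for a fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Recall: propagate an input vector through the layers."""
    for index, (weights, biases) in enumerate(layers):
        # Weighted sum of the inputs for each neuron in the layer.
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        # Hidden layers use tanh; the output layer is left linear.
        x = z if is_output else [math.tanh(v) for v in z]
    return x

rng = random.Random(0)
layers = [init_layer(5, 4, rng), init_layer(4, 1, rng)]
# Hypothetical normalized (L, R, F, M, P) input for one customer.
output = forward([0.2, 0.1, 0.5, 0.3, 0.0], layers)
```

Learning would then adjust the weights so that `output` approaches the desired target for each training sample.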

Simulation of the Proposed Method
The data used in this study were collected from the recsyswiki database. The ta-feng section of this database contains retail market data from an anonymous Belgian vendor, and we analyze a selection of this data in our study.
Before starting the analysis, the collected data must be prepared by applying preprocessing methods. In the first stage, examination of the collected data shows that some records are not suitable as input to the proposed model. It is therefore necessary to use some factors to restrict the data in the data set. The factors considered in this research include age and place of residence.
In Table 1, each record shows the products purchased by a customer on one visit to the store and includes the following fields: the date and time of the transaction; the customer ID; the age (this study uses 10 age categories); the location and area zip code; the product subset (products in the store are categorized into different collections) and the unique identifier assigned to each collection; the amount, which is the number of units of the product purchased by the customer; the inventory amount, which represents the remaining quantity of the product sold; and the selling price, which is the consumer price at which the product is supplied to the customer.
In the simulation performed in this research, a file called train.txt is used as the training data set for the neural network. The training file contains all transactions performed by the users in D except the last transaction of each user. user_tran is a file created from the train.txt data set; each record in this file holds additional information about a user's next transaction.
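The split described above, where each user's last transaction is held out of the training data, could look like the following sketch. The function name `split_last_transaction`, the tuple layout, and the toy rows are assumptions; the actual layout of train.txt and user_tran is not specified here.

```python
from collections import defaultdict

def split_last_transaction(transactions):
    """Split transactions into training rows and one held-out row per user.

    transactions: list of (user_id, timestamp, payload) tuples in any order.
    Returns (train_rows, held_out), where held_out maps user_id to that
    user's most recent transaction.
    """
    by_user = defaultdict(list)
    for row in transactions:
        by_user[row[0]].append(row)
    train_rows, held_out = [], {}
    for user_id, rows in by_user.items():
        rows.sort(key=lambda r: r[1])   # oldest first
        train_rows.extend(rows[:-1])    # everything except the last transaction
        held_out[user_id] = rows[-1]    # the last transaction is held out
    return train_rows, held_out

# Hypothetical toy transactions, not taken from the data set.
toy_rows = [
    ("u1", 3, "basket-c"),
    ("u1", 1, "basket-a"),
    ("u2", 2, "basket-b"),
    ("u1", 2, "basket-d"),
]
train_rows, held_out = split_last_transaction(toy_rows)
```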

Wireless Communications and Mobile Computing
To identify deviant customers (those turning away), we follow these steps to discover the pattern:
(i) Collect real data on customer characteristics and the transactions made by customers, and store this data in an appropriate database.
(ii) Reduce the volume of collected data by applying preprocessing methods and selecting optimal features. As a result, the dimensionality and the number of data attributes are reduced.
(iii) Apply different classification or clustering algorithms to discover the pattern, using classification algorithms to identify deviant customers.
The neural network is implemented in four models, with 4, 8, 10, and 20 hidden layers; the LRFMP output set is applied to each model separately, and the results are presented. Table 2 shows an example of the output of the LRFMP clustering calculations. For each customer with a unique identifier, the length (L), recency (R), frequency (F), monetary (M), and periodicity (P) characteristics are calculated and stored as a matrix.
The results of the neural network models are described below. Figure 2 shows the proposed model with 4 hidden layers. Figure 3 and Table 3 show the performance of the model 1 neural network for the length (L), recency (R), frequency (F), monetary (M), and periodicity (P) outputs, separately for the training, testing, and evaluation data.

Second Model
In this model, 8 hidden layers are used, and Figure 4 shows the neural network model. Figure 5 shows the performance of the model 2 neural network for the length (L), recency (R), frequency (F), monetary (M), and periodicity (P) outputs, separately for the training, testing, and evaluation data.
Table 4 lists the performance outputs of the model 2 neural network that correspond to the diagrams in Figure 5.

Third Model
This model uses 10 hidden layers, and Figure 6 illustrates the neural network model. Figure 7 shows the performance of the model 3 neural network for the length (L), recency (R), frequency (F), monetary (M), and periodicity (P) outputs, separately for the training, testing, and evaluation data.
The graphs in Figure 7 and Table 5 show the performance outputs of the model 3 neural network.

Fourth Model
In this model, 20 hidden layers are used, and Figure 8 shows the neural network model. Figure 9 shows the performance of the model 4 neural network for the length (L), recency (R), frequency (F), monetary (M), and periodicity (P) outputs, separately for the training, testing, and evaluation data.
The performance outputs of the model 4 neural network can be observed in the diagrams in Figure 9 and in Table 6.

The Performance of the Proposed Method
To assess the accuracy of the proposed algorithm, we use the test set data. The purpose of calculating accuracy is to measure the quality of the results obtained by the proposed algorithm and method in comparison with the actual results. Equation (4) is used to calculate the accuracy of the algorithms used:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (4)$$
In relation (4), TP is the number of samples correctly identified as positive, that is, the number of customers who turned away and were correctly identified as turning-away customers.
TN is the number of samples correctly identified as negative, that is, the number of customers who were not identified as turning away and who in practice did not turn away.
FP is the number of samples wrongly identified as positive, that is, the number of customers who did not turn away but were identified as turning-away customers.
FN is the number of samples wrongly identified as negative, that is, the number of customers who turned away but were not recognized as turning away.
The following equations can also be used to evaluate the performance of the proposed algorithm and method.
Equation (6) is used to calculate accuracy plus sensitivity.
Kappa in Equation (7) measures the agreement between the predicted and actual classes relative to the agreement expected by chance.
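Because the measures are all derived from the four counts TP, TN, FP, and FN, they can be computed together. The sketch below uses the standard textbook definitions of accuracy, sensitivity, precision, F1, and Cohen's kappa; which of these correspond to the paper's Equations (5) to (7) is an assumption, and the counts in the example are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard measures derived from binary confusion-matrix counts.

    Assumes tp + fn > 0, tp + fp > 0, and that chance agreement is below 1;
    otherwise a division by zero occurs.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # share of turning-away customers detected
    precision = tp / (tp + fp)     # share of flagged customers who turned away
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }

# Hypothetical counts, not results from the study.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```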
6.1. Integrated Model of the LRFMP Neural Network. To integrate the proposed model, the complete LRFMP data set is applied to the neural network simultaneously. The purpose is to calculate the final value of each customer and the validity of the model. Figure 10 shows the performance of the neural network on the three types of data: training, testing, and evaluation.
Figure 11 shows the confusion matrix obtained by applying the neural network model with 20 hidden layers.
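For readers unfamiliar with confusion matrices, the following sketch shows how one can be built from actual and predicted cluster labels and how the overall share of correct classifications is read from its diagonal. The labels and predictions are hypothetical and unrelated to Figure 11.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels, not results from the study.
actual = ["a", "a", "b", "b", "c"]
predicted = ["a", "b", "b", "b", "c"]
matrix = confusion_matrix(actual, predicted, labels=["a", "b", "c"])
share_correct = overall_accuracy(matrix)
```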

Performance Comparison
To evaluate the proposed model, in addition to the evaluation measures described above, the mean squared error (MSE) obtained from the neural network is used. For each sample, the squared error is the squared difference between the desired output and the actual output; these values are then averaged. The lower the MSE, the better and more acceptable the results. The neural network is implemented in four models, with 4, 8, 10, and 20 hidden layers; the LRFMP output set is applied to the neural network both separately and in integrated form, and the results are presented.
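The mean squared error described above can be written as a short Python function. The function name and the example values are hypothetical and serve only to make the calculation concrete.

```python
def mean_squared_error(targets, outputs):
    """Average of the squared differences between desired and actual outputs."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equal in length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

# Hypothetical desired and actual network outputs.
mse = mean_squared_error([1.0, 0.0, 1.0], [0.9, 0.2, 0.6])
```

Here the squared errors are 0.01, 0.04, and 0.16, so the MSE is 0.07.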
Table 7 compares the outputs of the neural network models with different numbers of hidden layers.
As shown in Table 7, the 20-layer model performed better than the other models for L, and the 4-layer model performed best for R. For F, the 20-layer model performed better than the other models. For M, the 10-layer model performed better, and the 10-layer model also performed better for P. In the integrated model, for which only the 20-layer model was used, the confusion matrix shows that the neural network correctly classifies 99.2% of the resulting clusters.

Conclusion
Turnover (churn) prediction is an important tool for companies to keep up with competition in the market. In retail, a dynamic definition of turnover is needed to identify the right customers. This study examines the use of a neural network as a new strategy for turnover prediction. It was found that the neural network was able to identify the trend among all members, which is in fact the key to turnover prediction. These results suggest that a neural network designed for CLV time series prediction is a good strategy for turnover prediction. The hyperparameter optimization strategy did not prove problematic, but further optimization may be valuable and may also improve the performance of the neural network. It should be pointed out that although the proposed model is used to identify trends, it is also used to identify members who deviate from the particular trend of a specified group. It was also found that clustering by LRFMP can be used to conduct a more comprehensive analysis of customer turnover. The approach provides meaningful and valid categories and indicates that there are patterns by which the data set can be divided. Due to the limited time frame, only one clustering solution was investigated. The evaluation results indicate that LRFMP can be used for turnover prediction without the need for a rule extraction algorithm. The result is a shorter and easier implementation. The results of this research provide different models that can be used in customer evaluation.
The proposed algorithm does not depend on the type of characteristics; what matters is the convergence of the relationships. Moreover, because the population is constant, the calculation time is acceptable. The results show a high accuracy of 90%, which indicates that the proposed algorithm is efficient in classifying and clustering customers. In future development, the results can be optimized further.
There are numerous potential strategies for using the neural network together with customer lifetime value.

Conflicts of Interest
The author(s) declare(s) that they have no conflicts of interest.