Applying decision tree models to SMEs: A statistics-based model for customer relationship management

Article history: Received February 5, 2016; Received in revised format April 15, 2016; Accepted May 7, 2016; Available online May 9, 2016

Customer Relationship Management (CRM) has become an important part of enterprise decision-making and management. In this regard, Decision Tree (DT) models are the most common tools for investigating CRM and providing appropriate support for the implementation of CRM systems. Yet, this method does not by itself yield any estimate of the degree of separation of the different subgroups involved in the analysis. In this research, we compute three decision-making models for SMEs, analyzing different decision tree methods (C&RT, C4.5 and ID3). Mean Errors (ME) and Variance of Errors (VoE) estimates were then calculated for the models to investigate the predictive power of these methods. The decision tree methods were used to analyze small and medium-sized enterprise (SME) datasets. The paper proposes a powerful technical support for better directing market trends and mining in CRM. According to the findings, C&RT shows a better degree of separation. As a result, we recommend using decision tree methods together with ME and VoE to determine CRM factors. © 2016 Growing Science Ltd. All rights reserved.


Introduction
In recent decades, Customer Relationship Management (CRM) has reflected the crucial role of the customer as a factor in a company's profitability and operation. The idea behind CRM is to learn the real requirements of the customer and to leverage this knowledge to increase the firm's profitability in the long term (Stringfellow et al., 2004). In a rapidly changing business culture, the economics of customer relationships is changing, and firms face the need to apply new solutions to address these changes (Ritter & Geersbro, 2011). In this context, the advent of information technology (IT) has transformed the way marketing is conducted and how firms manage information about their customers (Stringfellow et al., 2004).
Customer Relationship Management (CRM) is an enterprise management strategy which concentrates on customers (Stringfellow et al., 2004). CRM applies modern IT to increase the capability of enterprises to recognize and retain their customers through Business Process Reengineering (BPR), and to maximize profitability. With the development of the economy and competition among organizations, there is an increased awareness of the fact that, to acquire customers, having good products and wide distribution networks alone is not sufficient. In fact, enterprises can attract faithful customers and dominate fierce market competition only by considering customers' requirements, increasing response speed, and providing customers with constant one-to-one services.
CRM can help enterprises better manage the different activities concerning customers and render routine business processes programmed and automated. In a small firm, to improve the customer relationship process, a working group forms a committee to apply data mining algorithms to improve customer relationships, increase process efficiency, and take the necessary steps toward organizational goals.
Considering the increasing complexity and accumulation of customer information, most organizations can succeed only by performing critical activities such as analyzing complex customer data, identifying customers' values, detecting trends in customers' behaviors, appreciating the real value of customers, and analyzing customers from a lifecycle marketing perspective. All of these activities, however, depend on data mining (DM), and the more data are accumulated in a database, the more useful DM techniques become. DM is the process of extracting necessary information and knowledge hidden in a large reservoir of data which are incomplete, noisy, fuzzy, and random; the extracted knowledge takes the form of concepts, rules, patterns, and so on. Compared with common database queries, the strength of DM lies in how it can discover connections between businesses as well as latent trends and patterns. For example, the problem of customer attrition is sometimes out of control because there are no early warnings, but DM can set up a model disclosing the rate of lost clients in advance. Using this model, enterprises can predict which clients they may lose in the near future. Armed with this information, sellers, relying on dynamic marketing activities, can keep the clients who are considering leaving; even if clients are lost, the model helps sellers overcome the passivity that led to the attrition.
In general, by applying DM technology to analyze the data and determine the related knowledge and rules, the whole CRM system may become a closed loop and work better. Customer data and information technology tools form the foundation on which a successful strategy is built (Ngai et al., 2009). With this trend, technologies such as data mining and data warehousing have turned CRM into a new area in which most organizations may gain competitive advantage. Data mining offers a sophisticated set of tools to excavate customer data in an analytical CRM framework; data mining methods are indeed considered a leading CRM tool deeply influenced by IT.
There is no doubt that upcoming advances in IT call for business organizations to reshape the way CRM functions in their business culture. In the literature, many researchers have studied the potential advantages of data mining systems in managing customers (Ngai et al., 2009), but few among them have applied the principles of such strategic tools to small organizations. The interesting point here is that, although Small and Medium sized Enterprises (SMEs) have a strong focus on customers, suppliers, employees and other stakeholders, they are considered business enterprises that have not well implemented the principles of e-business in their infrastructural marketing strategies. Data mining is one of those technology-driven tools of e-business that has gained less attention in smaller firms, compared to larger ones.
Considering prior research, one can argue that most studies did not propose a well-formulated technique to be used in SMEs. The present study applies three decision tree models and analyzes them to find the best one. Furthermore, relying on a case study, the study tries to select the best method for SMEs in Fars province, Iran. The major contribution of the study is answering two critical questions: (a) how can data mining algorithms be used for CRM in SMEs? and (b) which of the widely used decision tree methods is best suited for SMEs?
The organization of this paper is as follows: Section 2 is the literature review. Section 3 describes CRM in SMEs. Section 4 briefly introduces the decision tree methods (C&RT, C4.5, ID3). In Section 5, the statistical methods for validation are presented and the best of C&RT, C4.5 and ID3 is chosen. In Section 6, we present how to use decision tree methods for CRM with a case study in Small and Medium sized Enterprises in Fars Province. Finally, concluding remarks are given in Section 7.

Literature review
Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments (Moon et al., 2012). These segments construct an inverted decision tree that originates with a root node at the top of the tree. The object of analysis is reflected in this root node as a simple, one-dimensional display in the decision tree interface (Nefeslioglu et al., 2010). The name of the data field that is the object of analysis is usually displayed, along with the spread or distribution of the values contained in that field. A sample decision tree is illustrated in Fig. 1, which shows that a decision tree can reflect both a continuous and a categorical object of analysis. The display of this node reflects all the data set records, fields, and field values found in the object of analysis (Wu et al., 2009; Ture et al., 2009). The discovery of the decision rule that forms the branches or segments underneath the root node is based on a method that extracts the relationship between the object of analysis (which serves as the target field in the data) and one or more fields that serve as input fields to create the branches or segments. The values in the input fields are used to estimate the likely value in the target field. The target field is also called an outcome, response, or dependent field or variable (Nefeslioglu et al., 2010; Sindhu et al., 2010).

Fig. 1. Illustration of the Decision Tree
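The mapping from input fields to a target field described above can be sketched in a few lines of code. The following is a minimal illustration (field names such as "price" and "service_score", the threshold values, and the target labels are invented for this sketch and are not taken from the study's questionnaire):

```python
# A minimal sketch of a fitted decision tree: each internal node tests one
# input field against a threshold; each leaf holds a predicted value of the
# target (outcome) field. Field names and thresholds here are hypothetical.

def classify(node, record):
    """Walk the tree from the root node down to a leaf and return its label."""
    while isinstance(node, dict):          # internal node: a decision rule
        branch = "left" if record[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node                            # leaf: predicted target value

# Root node splits on "price"; the right branch splits again on "service_score".
tree = {"field": "price", "threshold": 50,
        "left": "satisfied",
        "right": {"field": "service_score", "threshold": 3,
                  "left": "unsatisfied", "right": "satisfied"}}

print(classify(tree, {"price": 40, "service_score": 5}))   # satisfied
print(classify(tree, {"price": 80, "service_score": 2}))   # unsatisfied
```

The nested-dictionary representation mirrors the inverted-tree picture in Fig. 1: the root is the outermost node, and each branch narrows the set of records until a single target value remains.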
Many decision tree algorithms have been developed. The most famous are C&RT, C4.5 and ID3. ID3 is a simple decision tree learning algorithm developed by Ross Quinlan (T'sou et al., 2000; Ture et al., 2009), whose choice of split attribute is based on information entropy. The basic idea of the ID3 algorithm is to build the decision tree by employing a top-down, greedy search through the given sets to test each attribute at every tree node. C4.5 is an extension of ID3 (Prather et al., 1997; Ture et al., 2009; Yıldız, 2011). It improves computing efficiency, deals with continuous values, handles attributes with missing values, avoids over-fitting, and performs other functions. To deal with continuous data, the CART (classification and regression tree) algorithm has been proposed. CART is a data-exploration and prediction algorithm similar to C4.5 (Martínez-Muñoz & Suárez, 2004). Breiman et al. (1984) summarized classification and regression trees. Instead of information entropy, CART introduces measures of node impurity. It has been applied to a variety of problems, such as the detection of chlorine from the data contained in a mass spectrum (Berson & Smith, 1997). CHAID (Chi-square automatic interaction detector) is similar to CART, but differs in how it chooses a split node: it depends on a Chi-square test used in contingency tables to determine which categorical predictor is farthest from independence with the prediction values (Bittencourt & Clarke, 2003). It also has an extended version, Exhaustive CHAID.

CRM in SMEs
Numerous studies have been carried out on data mining in larger firms (Treacy & Wiersema, 1997), but the literature is underdeveloped in the case of SMEs. The reason for this underdeveloped research trend is easily justified: in many cases, smaller firms cannot afford the initial capital for installing a systematic data mining framework. In fact, data mining is used when there is a huge expanse of data with important implied information, and managing this information can help companies succeed in market competition.
Nowadays, in the competitive e-business environment, business organizations are switching from product-oriented business strategies to customer-oriented ones. Globalization, increasing competition, and advances in information and communication technology have forced firms to concentrate on managing customer relationships to efficiently maximize revenues (Azad & Ahmadi, 2015). As a strategy to optimize the lifetime value of customers, CRM can help firms succeed in the world of e-business. Rygielski et al. (2002) and Hsu (2009) define CRM as a cross-functional procedure to attain a continuous exchange of ideas with customers, across all their contacts and access points, with a personalized treatment of the most valuable customers to increase customer retention and the effectiveness of marketing networks.
Both large and multinational enterprises and SMEs are rigorously seeking to deploy CRM to gain competitive advantage and build long-term profitability. Customer retention is particularly important to SMEs because of their limited resources (Özgener & İraz, 2006; Reddick, 2011). Skaates and Seppänen (2002) point out the major contribution of CRM to the competence development of small firms. Furthermore, SMEs are embracing CRM as a major element of business strategy, while technological applications allow for precise segmentation, provide profiling and targeting of customers, and, under competitive pressures, bring about a customer-centric culture (Day et al., 2002; Sulaiman, 2011; Skaates & Seppänen, 2002; Gurău et al., 2003).

Decision Tree
During the late 1970s and early 1980s, Quinlan, a researcher in machine learning, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser). This work expanded on earlier findings about concept learning systems described by others, and Quinlan later presented C4.5 (a successor of ID3), which became a benchmark to which newer supervised learning algorithms are often compared. In 1984, a group of statisticians published the book Classification and Regression Trees (CART), which described the generation of binary decision trees (Breiman et al., 1984). ID3 and CART were invented independently of one another at around the same time, and they follow a similar approach to learning decision trees from training tuples. These two cornerstone algorithms spawned a flurry of work on decision tree induction (Gurău et al., 2003). ID3, C4.5, and CART adopt a greedy (i.e., non-backtracking) approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner. Most algorithms for decision tree induction follow such a top-down approach, which starts with a training set of tuples and their associated class labels; the training set is recursively partitioned into smaller subsets as the tree is being built. In the following sections, we briefly introduce the C&RT, C4.5 and ID3 methods.

Classification and regression tree
Classification and regression tree (C&RT) is a recursive partitioning technique used for both regression and classification. C&RT partitions the data into two subsets so that the records within each subset are more homogeneous than those in the parent set. The process is recursive: each of the two subsets is then split again, and splitting is repeated until the homogeneity criterion or some other stopping criterion is met (as with all of the tree-growing methods). The objective is to generate subsets of the data which are as homogeneous as possible with respect to the target variable (Breiman et al., 1984). In this study, the Gini impurity measure was used for categorical target variables.

Gini impurity measure: the Gini index at node t, g(t), is defined as

g(t) = Σ_{i≠j} p(i|t) p(j|t), (1)

where i and j are categories of the target variable. The equation for the Gini index can also be written as

g(t) = 1 − Σ_j p²(j|t). (2)

Thus, when the cases in a node are evenly dispersed across the categories, the Gini index takes its maximum value of 1 − (1/k), where k is the number of categories of the target variable. When all cases in the node belong to the same category, the Gini index equals 0. If costs of misclassification are specified, the Gini index is computed via

g(t) = Σ_{i≠j} C(i|j) p(i|t) p(j|t), (3)

where C(i|j) is the probability of misclassifying a category j case as category i. The Gini criterion function for split s at node t is defined as

Φ(s, t) = g(t) − p_L g(t_L) − p_R g(t_R), (4)

where p_L is the proportion of cases in t sent to the left child node and p_R is the proportion sent to the right child node. Split s is chosen to maximize the value of Φ(s, t). This value is reported along with the progression in the tree (Breiman et al., 1984).
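The Gini index and the split criterion above can be sketched directly from their definitions. The following is a minimal illustration, not the C&RT implementation used in this study; the example labels are invented:

```python
from collections import Counter

def gini(labels):
    """Gini index g(t) = 1 - sum_j p(j|t)^2 over the class labels at node t."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_criterion(parent, left, right):
    """Phi(s, t) = g(t) - p_L * g(t_L) - p_R * g(t_R) for a candidate split s."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))

labels = ["buy", "buy", "leave", "leave"]        # hypothetical target values
print(gini(labels))                              # 0.5 (its maximum for k = 2)
print(gini_criterion(labels, labels[:2], labels[2:]))   # 0.5 (a pure split)
```

The split that maximizes `gini_criterion` is the one C&RT would select at that node; here a perfectly pure split recovers the full impurity of the parent.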

C4.5 algorithm
C4.5 is a supervised learning classification algorithm used to build decision trees from data (Treacy & Wiersema, 1997). Most empirical learning systems are given a set of pre-classified cases, each described by a vector of attribute values, and construct from them a mapping of attribute values to classes. C4.5 is such a system that learns decision tree classifiers; it applies a divide-and-conquer approach to growing decision trees (Azad & Ahmadi, 2015). The main difference between C4.5 and other similar decision tree building algorithms lies in the test selection and evaluation process.
Given a set D of training cases, C4.5 computes the entropy

info(D) = −Σ_i p_i log₂(p_i),

where p_i denotes the proportion of cases in D that belong to the i-th class. C4.5 selects the test that maximizes the gain ratio value (Quinlan, 1996). Once the initial decision tree is constructed, a pruning procedure is initiated to decrease the overall tree size and decrease the estimated error rate of the tree (Bradley, 1998).
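The gain ratio test selection can be sketched from the entropy definition above. This is an illustrative sketch rather than the C4.5 implementation used in the study; the example labels and the "region" attribute are invented:

```python
import math
from collections import Counter

def entropy(labels):
    """info(D) = -sum_i p_i * log2(p_i) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, attr_values):
    """C4.5 test score: information gain divided by the split information."""
    n = len(labels)
    subsets = {}
    for lab, val in zip(labels, attr_values):
        subsets.setdefault(val, []).append(lab)
    info_after = sum(len(s) / n * entropy(s) for s in subsets.values())
    gain = entropy(labels) - info_after
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    return gain / split_info if split_info > 0 else 0.0

labels = ["yes", "yes", "no", "no"]              # hypothetical class labels
region = ["north", "north", "south", "south"]    # hypothetical attribute
print(gain_ratio(labels, region))                # 1.0 (the attribute is perfectly informative)
```

Dividing the gain by the split information penalizes attributes with many distinct values, which is the main refinement C4.5 makes over ID3's plain information gain.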

ID3 algorithm
Among decision tree analysis approaches, the most influential is the ID3 algorithm proposed by Quinlan (1996). It uses the rate of entropy decrease as the criterion for selecting the attribute to be tested. The method uses samples with known categories to obtain an ordered set of testing attributes, until all of the known samples have been classified. In the process of forming the decision tree, an information-theoretic method is used which provides the largest information increase at any time, i.e. the largest entropy decrease (Zhuge, 2011).
Suppose N samples belong to classes C_i (i = 1, 2, ..., m); there are N_i samples in class C_i. Every sample has K attributes, and attribute A_k has J_k values. The process of forming the decision tree involves:

(1) Computing the initial entropy. For the given training sample aggregate, all the sample classes are known, so the sample aggregate determines the initial entropy of the system:

H = −Σ_{i=1}^{m} (N_i/N) log₂(N_i/N).

(2) Selecting an attribute as the root node of the decision tree. Attribute A_k divides the samples into J_k branches. For the n_kj samples of branch j, assuming that the number of samples belonging to class C_i is n_kj(i), we can use the following formula to get the entropy of this branch:

H_kj = −Σ_{i=1}^{m} (n_kj(i)/n_kj) log₂(n_kj(i)/n_kj).

The expected entropy after splitting on A_k is

H(A_k) = Σ_{j=1}^{J_k} (n_kj/N) H_kj,

and the entropy decrease due to attribute A_k is

G(A_k) = H − H(A_k). (13)

The attribute A_k0 which causes the maximum entropy decrease is selected as the root node of the decision tree.

(3) Attribute A_k0 has J_k0 values and divides the sample into J_k0 subsets. For every subset, we use the above method in turn and select an attribute A_k' as the next lower-grade node of the decision tree, in which the maximum entropy decrease is gained.

(4) Following step 3, we continuously construct the next lower-grade nodes until all subsets contain only one class. At that point the entropies are zero and the decision tree is complete.
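The root selection in step (2) can be sketched as follows. This is a minimal illustration of the entropy-decrease criterion, not the implementation used in the study; the sample records, attribute names, and class labels are invented:

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum_i (N_i/N) * log2(N_i/N) over the class counts in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def entropy_decrease(samples, labels, attr):
    """G(A_k) = H - H(A_k): the entropy decrease from splitting on `attr`."""
    n = len(labels)
    branches = {}
    for sample, lab in zip(samples, labels):
        branches.setdefault(sample[attr], []).append(lab)
    expected = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - expected

# Hypothetical customer records with two categorical attributes.
samples = [{"price": "low",  "quality": "high"},
           {"price": "low",  "quality": "low"},
           {"price": "high", "quality": "high"},
           {"price": "high", "quality": "low"}]
labels = ["loyal", "loyal", "lost", "lost"]

# ID3 picks as root the attribute with the maximum entropy decrease.
root = max(["price", "quality"], key=lambda a: entropy_decrease(samples, labels, a))
print(root)   # price
```

Here "price" separates the classes perfectly (entropy decrease 1.0) while "quality" gives no decrease, so ID3 would place "price" at the root and recurse on its branches as described in steps (3) and (4).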

Model Validation
Validating the model is a critical step in the process. It allows us to determine whether we have successfully performed all the prior steps. If a model does not validate well, this can be due to data problems, poorly fitting variables, or problematic techniques. There are several methods for validating models. In this study, following recent work on statistical methods for validation, we used two statistical measures: Mean Errors (ME) and Variance of Errors (VoE) (Montgomery & Runger, 2010). To make a statistical comparison of the methods, ME and VoE are adopted as defined by Eq. (14) and Eq. (15), respectively.
ME = (1/n) Σ_{h=1}^{n} (x_h − x̂_h), (14)

VoE = (1/(n−1)) Σ_{h=1}^{n} (x_h − x̂_h − ME)², (15)

where n is the total number of datasets, and x_h and x̂_h are the observed and predicted values for the h-th dataset, respectively.
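Eqs. (14) and (15) can be computed directly from the errors. The following sketch uses invented observed/predicted values purely for illustration:

```python
def mean_error(observed, predicted):
    """ME (Eq. 14): the average of the errors x_h - x_hat_h."""
    errors = [x - xh for x, xh in zip(observed, predicted)]
    return sum(errors) / len(errors)

def variance_of_errors(observed, predicted):
    """VoE (Eq. 15): the sample variance of the errors around the ME."""
    errors = [x - xh for x, xh in zip(observed, predicted)]
    me = sum(errors) / len(errors)
    return sum((e - me) ** 2 for e in errors) / (len(errors) - 1)

observed  = [3.0, 5.0, 4.0, 6.0]   # hypothetical observed values x_h
predicted = [2.5, 5.5, 4.0, 5.0]   # hypothetical model predictions x_hat_h
print(mean_error(observed, predicted))         # 0.25
print(variance_of_errors(observed, predicted)) # ~0.4167
```

A model whose ME and VoE are both smallest among the candidates would be preferred under the validation rule used in this study.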

Case Study
Considering the significance of the decision tree and its application in unfolding relationships for managing communication with customers, in this section the three widely used decision tree algorithms explained in the previous sections are investigated in a case study. This study uses the algorithms to discover the internal relationships in the data and to select the best method for SMEs. Next, through the above-mentioned decision tree methods, customer relationship management was evaluated. To do this, the decision tree methods (C&RT, C4.5 and ID3) were implemented in C++, and after running them, the accuracy of the evaluation methods was calculated. Table 3 illustrates the values of the statistics related to each of the attributes; in Table 3 we report the mean error for each of the four evaluation criteria discussed in this study. These criteria were applied to both the training and the test sets, and indicate the error rate of each algorithm on each set. The parameters of each algorithm were compared on the basis of the least error, using the values specified in Table 6. According to the output, C&RT shows the maximum linear correlation.
According to the validation method, from among C&RT, C4.5 and ID3, C&RT was found to be the best, because both ME and VoE showed the minimum value for this method. In this research, to justify the application of the appropriate method, we used a bar chart and scatter graphs; the result of each of these experiments is depicted in Fig. 3. Finally, among the decision tree algorithms, C&RT, compared to the other two, had a better performance in developing a CRM model for SMEs; therefore, the C&RT algorithm is useful for predicting the behavior of customers, while C4.5 and ID3, in that order, are the alternatives for predicting the behavior of customers in the long term.

Conclusion
CRM involves important measures and necessary steps for enterprises to keep their competitiveness in a market economy. The development and application of CRM, as a result, are taken seriously by enterprises, and there are presently a number of well-established CRM products. DM technology is a rapidly developing key technology which, through development and refinement, can help accomplish the aims of CRM. DM can fully activate CRM and lay a good foundation for it, and further development in DM technology will produce more extensive applications and greater market value for CRM. Recently, decision tree models have incorporated novel ideas and interpretations into the solution of such problems. In this study, we found that C&RT performed better than the C4.5 and ID3 techniques. As a result, we recommend applying decision tree methods together, especially C&RT, with statistical comparisons (Mean Errors and Variance of Errors) to select the best approach for Small and Medium sized Enterprises.
Then, the proposed models were meticulously compared by using a real data set in order to provide helpful information on the general tendency of data, assess the effect of specific variables on survival in data sets, and help Small and Medium sized Enterprises to select the best method for solving CRM problems.There are limited data on the sufficiency of classification efforts by applying just one approach.Based on our observations, we suggest that data should be better explored and processed through high performance modeling procedures.Future studies can be conducted on applying hybrid robust models for this purpose.


Fig. 3. Validation of decision tree methods with ME and VoE comparison

T_i is a split of D derived from attribute a_t; it splits D into mutually exclusive subsets D_1, ..., D_s. If a test T is chosen, the decision tree for D consists of a node identifying the test T and one branch for each possible subset D_i. For each subset D_i, a new test is then chosen for further splits, until the subsets are single-class collections of cases.

In the validation formulas, x_h and x̂_h denote the observed and predicted values (from the decision tree approach or statistical regression) for the h-th dataset, respectively. The smaller the ME and VoE values, the more accurate the model. According to the validation method, a method is optimal if both mean errors and variance of errors reach their minimum value for it. If, in special cases, both are not minimal for the same method, we rely on the variance of errors, since the computation of the mean errors is contained in that of the variance of errors. For more details regarding ME and VoE, the reader may consult Montgomery and Runger (2010).

Table 2
Statistical analysis of the data set

This case study included 2000 individuals, customers of SMEs in Fars Province, Iran (specifically in the food industry). For this purpose, a questionnaire was designed which includes questions about product quality, price, services, and so on (a list of the questions is shown in Table 1).

Table 4
Comparison of the decision tree methods

Table 4 compares the outputs of the methods under study. Considering Table 4, one can observe that C&RT, compared to the two other methods, showed the minimum value of error in both statistical tests. Therefore, for SMEs' customer communication management, C&RT is more appropriate than the two other methods.