Loss profit estimation using association rule mining with clustering

Article history: Received September 28, 2014 Accepted 28 December 2014 Available online January 1


Introduction
Data mining is the process of discovering previously unknown relationships among data, especially when the data come from different databases. Businesses can use these new relationships to develop new advertising campaigns or to predict how well a product will sell. Data mining techniques such as classification, association rule mining, sequential pattern mining, and clustering have attracted the attention of several researchers (Zhao & Bhowmick, 2003). Association rules have been broadly used in many application domains for finding patterns in data. Such a pattern reveals combinations of events that occur at the same time. One of the most prominent domains is business, where discovering patterns or associations supports effective decision making and marketing. The most widely used algorithm for finding association rules is the Apriori algorithm (Agrawal & Srikant, 1994). Moreover, clustering is the process of organizing objects into groups whose members are similar in some way; the behavior of the objects can then be studied by examining the resulting clusters. Broder et al. (1997) defined clusters as maximal connected components of a pairwise-similarity graph of transactions; their approach therefore suffers from the breakdown of transitivity in pairwise similarity. Guha et al. (2000) proposed the number of common neighbors of two transactions as a measure of pairwise similarity. The method of Wang et al. (1999) does not use any notion of pairwise similarity; instead, it clusters transactions that contain similar items. The difference from association rule mining is that this clustering also emphasizes the dissimilarity between clusters. Both association rule mining and clustering techniques can be used for effective inventory management.
Further, inventory management is mainly about identifying the amount and the location of the goods that a firm holds as inventory. Inventory management is imperative because it helps to protect the planned course of production against the risk of running out of important materials or goods. It also involves establishing essential connections among the replenishment lead time of goods, asset management, the carrying costs of inventory, future inventory price forecasting, physical inventory, available space for inventory, etc. By balancing these competing requirements, a company can determine its optimal inventory levels. For inventory management, many researchers have devoted a great deal of effort to developing inventory models. Porteus (1986) incorporated the effect of imperfect-quality items into the basic economic order quantity model. Rosenblatt and Lee (1986) assumed that the time from the beginning of the production run (i.e., the in-control state) until the process goes out of control is exponentially distributed and that defective items can be reworked instantaneously at a cost; they concluded that the presence of defective products motivates smaller lot sizes. Later, Lee and Rosenblatt (1987) considered using process inspection during the production run so that a shift to the out-of-control state can be detected and restoration made earlier. Salameh and Jaber (2000) developed an inventory model in which each order contains a random fraction of imperfect-quality items with a known probability distribution. Papachristos and Konstantaras (2006) examined Salameh and Jaber's (2000) work closely and rectified the proposed conditions to ensure that shortages do not occur. Maddah and Jaber (2008) corrected Salameh and Jaber's (2000) method of evaluating the expected profit per unit time. Jaggi et al. (2011, 2012, 2013) formulated inventory models for deteriorating items. Jaggi and Mittal (2011, 2012) developed an inventory model with the joint effect of inspection, deterioration, time-dependent demand, inflation, and the time value of money. Mittal et al. (2014) extended the inventory model by incorporating time expressions into association rules. Inventory management can become more effective if inventory is classified into categories based on criteria such as ABC classification, loss profit, and the cross-selling effect.
Further, for some inventory items, the criteria (such as the price of an item) are derived not only from the items themselves but also from their influence on the criteria of other items, usually called the "cross-selling effect" as defined by Anand et al. (1997). Thus, items should be classified while taking such relationships into account. ABC classification ranks all inventory items on the basis of profit in historical transactions. However, the cross-selling effect is not considered when ranking items in traditional ABC classification. Brijs et al. (1999, 2000) developed the PROFSET model, which considers the cross-selling effect among items and calculates the profit of a frequent item-set. However, the PROFSET model does not consider the strength of the relationship between items, and it does not provide a relative ranking of the selected items, which is important in inventory classification. Moreover, the maximal frequent item-set was used to calculate the profit of a frequent item-set, yet the maximal frequent item-set often does not occur as frequently as its subsets. Therefore, the PROFSET model cannot be used to classify inventory items. Kaku (2004) classified inventory items based on the strength of the relationship between items. Kaku and Xiao (2008) further extended inventory classification by considering the cross-selling effect together with ABC classification. Their experiments showed that a considerably large part of the inventory items change their positions in the importance ranking. However, they did not consider whether and how the strength of the relationship with correlated items influences such a ranking approach. Xiao et al. (2011) classified mutually correlated inventory items using the concept of the cross-selling effect together with ABC classification and loss profit. They classified items based on the loss rule (Wong et al., 2003, 2005). The loss profit of an item or item-set is defined as the criterion for evaluating the importance of the item, and inventory items are classified on that basis. They explained that the importance of an item (set) should be judged not only by the profit it brings in when it is on the shelf, but also by the profit it may take away when it is absent or out of stock. However, they did not classify items into particular clusters.
In this paper, a transactional clustering algorithm is used to partition the transactional database into different clusters. The Apriori algorithm is then applied to each cluster to mine association rules and find frequent items. Next, the loss profit is calculated for each frequent item, and the frequent items in each cluster are ranked in decreasing order of loss profit. This ranking helps the inventory manager recognize the most profitable items in each cluster. Finally, an illustrative example is presented to validate the results.

Proposed Work
This paper proposes to calculate the loss profit of the frequent items in each cluster, where the frequent items are found by applying the Apriori algorithm together with clustering.
For some inventory items, the importance of an item comes not only from its own value but also from its influence on other items, i.e., the "cross-selling effect" (Anand et al., 1997). Thus, the larger the cross-selling effect among items, the greater the chance of losing sales. The cross-selling effect among items can be determined by using association rules. Association rule mining aims to find rules of the form A → B, where A and B are two sets of items. The rule means that if the left-hand side A occurs, then the right-hand side B is also very likely to occur. The interestingness of the rules is often measured using support and confidence. The support of a rule is defined as the number of records in the dataset that contain both A and B. The confidence of a rule is defined as the proportion of records containing B among those records containing A. Association rule mining outputs rules with support no less than min_sup and confidence no less than min_conf, where min_sup is called the minimum support threshold and min_conf is called the minimum confidence threshold. Both thresholds are specified by the user.
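The two measures and the threshold test can be made concrete with a short sketch. It is only an illustration: the function names and the four toy transactions are hypothetical, and support is kept as a count (as in the definition above) with a fractional variant alongside.

```python
# Support and confidence of an association rule A -> B (minimal sketch).


def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(itemset, transactions):
    """Support as a fraction of all transactions."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Share of the transactions containing `lhs` that also contain `rhs`."""
    lhs_count = support_count(lhs, transactions)
    if lhs_count == 0:
        return 0.0
    return support_count(set(lhs) | set(rhs), transactions) / lhs_count


def is_strong(lhs, rhs, transactions, min_sup, min_conf):
    """Keep a rule only if support (as a count) >= min_sup and confidence >= min_conf."""
    return (support_count(set(lhs) | set(rhs), transactions) >= min_sup
            and confidence(lhs, rhs, transactions) >= min_conf)


# Hypothetical toy transactions, for illustration only.
transactions = [{"p", "q", "r"}, {"p", "q"}, {"p", "r"}, {"q", "r"}]
print(support_count({"p", "q"}, transactions))           # 2
print(round(confidence({"p"}, {"q"}, transactions), 3))  # 0.667
print(is_strong({"p"}, {"q"}, transactions, min_sup=2, min_conf=0.6))  # True
```

Here p appears in three transactions and p together with q in two, so the rule p → q has confidence 2/3 and passes thresholds of min_sup = 2 and min_conf = 0.6.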
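Let I = {i1, i2, i3, i4, …, im} be a set of items. The support of item i1 is defined as the frequency of its occurrence relative to the total number of transactions, and the confidence of the rule i1 → i2 is defined as the conditional probability of purchasing i2 given that i1 is purchased. They are given by:

$$\mathrm{Support}(i_1) = \frac{\text{Frequency of } i_1}{\text{Total number of transactions}} \qquad (1)$$

$$\mathrm{Confidence}(i_1 \rightarrow i_2) = \frac{\mathrm{Support}(\{i_1, i_2\})}{\mathrm{Support}(i_1)} \qquad (2)$$

The Apriori algorithm, which finds frequent item-sets and association rules using these measures, was proposed by Agrawal and Srikant (1994). Its flowchart is depicted in Fig. 1.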

Fig. 1. Flowchart of the Apriori algorithm
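The level-wise search that the Apriori flowchart describes can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `min_sup` is taken as an absolute count, and the toy data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support count} for every frequent item-set.

    `min_sup` is an absolute count here (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def keep_frequent(candidates):
        # Scan the database once and keep candidates meeting min_sup.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_sup}

    # Frequent 1-item-sets.
    current = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: merge frequent (k-1)-item-sets into k-item-set candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data.
data = [{"p", "q", "r"}, {"p", "q", "r"}, {"p", "q"}, {"r", "t"}]
result = apriori(data, min_sup=2)
largest = max(result, key=len)
print(sorted(largest), result[largest])  # ['p', 'q', 'r'] 2
```

The prune step is what makes Apriori efficient: any item-set with an infrequent subset cannot be frequent, so it is discarded before the database is scanned again.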
Further, clustering is an important data mining technique that groups similar transactions together. Fast and accurate clustering of transactional data has many potential applications in the retail industry, e-commerce intelligence, etc. Here, the notion of "large items", i.e., items contained in at least some minimum fraction of the transactions in a cluster, is used as the similarity measure for a cluster of transactions. The support of an item in cluster Ci is the number of transactions in Ci that contain the item. Thus, for a minimum support s, an item is large in cluster Ci if its support is at least s × |Ci|; otherwise, the item is small. Large items therefore measure similarity within a cluster, while small items measure dissimilarity. The cost of a clustering 𝒞, which is to be minimized, consists of two components: the intra-cluster cost and the inter-cluster cost. The intra-cluster cost is the total number of small items, and the inter-cluster cost measures the duplication of large items across different clusters. The clustering algorithm thus seeks the clustering that minimizes the combined cost of small items and duplicated large items.
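A small sketch helps make the two cost components concrete. It follows our reading of Wang et al. (1999): the intra-cluster cost counts the distinct items that are small in some cluster, the inter-cluster cost is the sum of the large-item counts minus the number of distinct large items, and the two are combined as w × Intra + Inter with a weight w (assumed to be 1 here). The function names and data are hypothetical.

```python
from collections import Counter


def large_items(cluster, s):
    """Items contained in at least s * |cluster| transactions of the cluster."""
    counts = Counter(item for t in cluster for item in set(t))
    return {i for i, n in counts.items() if n >= s * len(cluster)}


def small_items(cluster, s):
    """Items that occur in the cluster but are not large."""
    present = {item for t in cluster for item in t}
    return present - large_items(cluster, s)


def clustering_cost(clusters, s, w=1.0):
    """w * Intra + Inter for a clustering given as a list of clusters.

    Intra: number of distinct items that are small in some cluster.
    Inter: duplication of large items, sum(|Large_i|) - |union of Large_i|.
    The weight w (default 1) is an assumption of this sketch.
    """
    clusters = [c for c in clusters if c]  # ignore empty clusters
    larges = [large_items(c, s) for c in clusters]
    smalls = [small_items(c, s) for c in clusters]
    intra = len(set().union(*smalls))
    inter = sum(len(lg) for lg in larges) - len(set().union(*larges))
    return w * intra + inter


# Hypothetical transactions, split two ways.
grouped = [[{"p", "q", "r"}, {"p", "q", "r"}, {"p", "q"}], [{"t", "x"}, {"t", "x"}]]
merged = [[t for cluster in grouped for t in cluster]]
print(clustering_cost(grouped, s=0.5))  # 0.0
print(clustering_cost(merged, s=0.5))   # 3.0
```

Keeping the {p, q, r} purchases apart from the {t, x} purchases leaves no small items and no duplicated large items, whereas merging them turns r, t and x into small items and raises the cost to 3.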
The overview of the clustering algorithm as described by Wang et al. (1999) is shown in Fig. 2. Further, Xiao et al. (2011) ranked items according to their loss profit. The importance of an item is evaluated by considering both the profit it brings in and the profit it may take away when it is absent or out of stock. The algorithm proposed by Xiao et al. (2011) can be explained in three steps:
Step 1: Generate the cross-selling profit matrix according to the formula of Xiao et al. (2011), where M_BA indicates the profit loss caused by the cross-selling relationship B → A, which can be read as the cross-selling profit loss of item B from item A when item B is absent (or out of stock).
Step 2: Calculate the loss profit of every item according to the loss-profit formula of Xiao et al. (2011).
Step 3: Rank all items in descending order of loss profit and perform ABC classification.
In the numerical example, clustering 𝒞2 is selected because it has the minimum cost compared with clusterings 𝒞1 and 𝒞3. Hence, the transaction database of Table 1 is partitioned into two clusters, C1 = {TID1, TID2, TID3, TID4} and C2 = {TID5, TID6}. The Apriori algorithm is then applied to both clusters. The item-set {p, q, r} is found to be the most frequent item-set in cluster C1, and the item-set {t, x} is the most frequent item-set in cluster C2. Next, the confidence of the rules generated from the frequent item-set {p, q, r} of cluster C1 and from {t, x} of cluster C2 is calculated using equation (2), as shown in Table 3.
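The selection of the minimum-cost clustering and the subsequent per-cluster mining can be sketched as follows. Several assumptions apply: the six-transaction database and the three candidate clusterings are hypothetical (they are not the paper's Table 1), the cost follows our reading of Wang et al. (1999) with s = 0.6 and weight w = 1, a brute-force search stands in for Apriori, and "most frequent item-set" is read as the largest frequent item-set, with ties broken by support.

```python
from collections import Counter
from itertools import combinations


def large_and_small(cluster, s):
    """Split the items of a cluster into large (support >= s*|cluster|) and small."""
    counts = Counter(i for t in cluster for i in set(t))
    large = {i for i, n in counts.items() if n >= s * len(cluster)}
    return large, set(counts) - large


def cost(clusters, s=0.6, w=1.0):
    """w * Intra + Inter, following our reading of Wang et al. (1999)."""
    parts = [large_and_small(c, s) for c in clusters if c]
    larges = [lg for lg, _ in parts]
    smalls = [sm for _, sm in parts]
    intra = len(set().union(*smalls))
    inter = sum(len(lg) for lg in larges) - len(set().union(*larges))
    return w * intra + inter


def frequent_itemsets(cluster, min_frac=0.5):
    """Brute-force stand-in for Apriori; only practical for a few items."""
    items = sorted({i for t in cluster for i in t})
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            n = sum(1 for t in cluster if set(combo) <= t)
            if n >= min_frac * len(cluster):
                found[combo] = n
    return found


# Hypothetical transaction database (not the paper's Table 1).
db = {
    "TID1": {"p", "q", "r"}, "TID2": {"p", "q", "r"}, "TID3": {"p", "q", "r", "x"},
    "TID4": {"p", "q"}, "TID5": {"t", "x"}, "TID6": {"t", "x"},
}
# Hypothetical candidate clusterings, each a list of clusters of TIDs.
candidates = {
    "clustering_1": [["TID1", "TID2", "TID3", "TID4", "TID5", "TID6"]],
    "clustering_2": [["TID1", "TID2", "TID3", "TID4"], ["TID5", "TID6"]],
    "clustering_3": [["TID1", "TID5"], ["TID2", "TID3", "TID4", "TID6"]],
}


def as_transactions(clustering):
    """Replace each TID by its item set."""
    return [[db[tid] for tid in cluster] for cluster in clustering]


costs = {name: cost(as_transactions(c)) for name, c in candidates.items()}
best = min(costs, key=costs.get)
print(costs)  # {'clustering_1': 3.0, 'clustering_2': 1.0, 'clustering_3': 5.0}
for i, cluster in enumerate(as_transactions(candidates[best]), start=1):
    freq = frequent_itemsets(cluster)
    top = max(freq, key=lambda c: (len(c), freq[c]))
    print(f"C{i}: largest frequent item-set {top}")
```

With this toy data the second candidate wins, and its clusters yield ('p', 'q', 'r') and ('t', 'x') as their largest frequent item-sets. For realistic item counts the brute-force search should be replaced by a proper Apriori implementation.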
Using equation (4), the loss profit of item p is $46. Similarly, applying the rules and conditions described above gives the loss profit of the frequent items in the different clusters, as shown in Table 4. Ranking the items in descending order of loss profit gives the ranking list (p r q) in cluster C1 and (t x) in cluster C2. Item p is ranked ahead of item r because it has a larger loss profit in cluster C1; similarly, item t is ranked ahead of item x because it has a larger loss profit in cluster C2. For comparison, under the traditional ABC classification the profits of the items are p = $20, q = $12, r = $8, t = $4, and x = $2.
Thus, a clustering algorithm has been applied to find different clusters of transactions, and the Apriori algorithm has then been applied to each cluster to find frequent items. The frequent items in each cluster have been classified according to loss profit. Ranking the frequent items in each cluster helps the manager identify the most profitable items in each cluster.
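The ranking step (sorting by a value in descending order, then assigning ABC groups) can be sketched generically. The cut-offs are assumptions: an item is placed in group A while the cumulative share of the items ranked before it is below 80%, in group B while it is below 95%, and in group C otherwise; other conventions exist. The example uses the item profits quoted above, which reproduce the traditional ABC ranking (p q r t x). Applying the same function to the loss-profit values of each cluster would give the per-cluster ranking, but those values (Table 4) are not reproduced here.

```python
def abc_classify(values, a_cut=0.80, b_cut=0.95):
    """Rank items by value (descending) and assign ABC groups.

    An item's group is decided by the cumulative share of the total value
    held by the items ranked before it: below a_cut -> A, below b_cut -> B,
    otherwise C. The 80 % / 95 % cut-offs are assumed, not taken from the paper.
    """
    ranked = sorted(values, key=values.get, reverse=True)
    total = sum(values.values())
    classes, cumulative = {}, 0.0
    for item in ranked:
        share_before = cumulative / total
        classes[item] = "A" if share_before < a_cut else "B" if share_before < b_cut else "C"
        cumulative += values[item]
    return ranked, classes


profit = {"p": 20, "q": 12, "r": 8, "t": 4, "x": 2}
ranked, classes = abc_classify(profit)
print(ranked)   # ['p', 'q', 'r', 't', 'x']
print(classes)  # {'p': 'A', 'q': 'A', 'r': 'A', 't': 'B', 'x': 'C'}
```

Basing the group on the share held *before* an item guarantees that the top-ranked item always lands in group A, even when it alone accounts for most of the total value.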

Conclusion and Future Research
In this paper, a clustering algorithm has been applied to a transactional database to find different clusters of transactions. The Apriori algorithm has then been applied to each cluster to find frequent items, and the frequent items in each cluster have been classified according to loss profit. The loss profit of an item is the total profit that the item may take away when it is out of stock. A numerical example has been presented to illustrate the utility of the new approach. The inventories can be classified according to three cases: traditional ABC classification ranks the frequent items as (p q r t x); ranking by loss profit gives (p r q t x); and ranking by loss profit within each cluster gives (p r q) for cluster C1 and (t x) for cluster C2.
In case 1, inventory items are classified using ABC classification, but the loss rule is not considered. In case 2, inventory items are classified using the loss rule, but different clusters are not considered. In case 3, inventory items are classified within different clusters using the loss rule. The results indicate that a considerably large part of the inventory items change their positions when they are ranked according to loss profit rather than by traditional ABC classification in each cluster. Because of the cross-selling effect, some items that traditionally do not belong to group A in a cluster are moved into group A, which calls for their inventory policies to be reconfigured; likewise, some items that traditionally belong to group C are promoted into a higher group because of their high loss profits and should no longer be neglected as they were before. This approach helps the inventory manager find the most profitable items in each cluster, so that the manager can earn more profit and manage stock more easily. For future study, it is desirable to extend the proposed model by considering time-varying aspects of databases. Further, an approach based on a data mining technique such as temporal association rule mining can be proposed to obtain a new ranking list based on loss profit.

Fig. 2. The overview of the clustering algorithm

Table 4
Loss-profit of frequent items in different clusters