EOQ estimation for imperfect quality items using association rule mining with clustering

Article history: Received February 9, 2015 Received in revised format: May 12, 2015 Accepted May 28, 2015 Available online May 3


Introduction
As information technology (IT) progresses rapidly, its capacity to store and manage data in databases is becoming important. Though IT development facilitates data processing and eases demands on storage media, extracting the available implicit information to aid decision making has become a new and challenging task. Data mining has thus been emerging as a powerful new technology to analyze and extract hidden, potentially useful information from huge volumes of data. Data mining techniques, such as clustering, association rule mining, and classification, have attracted remarkable attention during the past few years (Zhao & Bhowmick, 2003).
Further, association rule mining is an important component of data mining. It helps in finding regularities and patterns in data. Extracting association rules is the core of data mining (AL-Zawaidah et al., 2011). Mining association rules between items in a database of sales transactions is an important field of research (Han & Kamber, 2000). The benefits of these rules are that they detect unknown relationships and produce results that can form a basis for decision making and prediction. The apriori algorithm is a very popular algorithm for mining association rules. Its goal is to find all rules satisfying some basic requirements, such as the minimum support and the minimum confidence.
Moreover, clustering is a technique for grouping objects on the basis of similarity. Broder et al. (1997) defined clusters as maximal connected components of some pair-wise similarity of transactions, an approach that suffers from the breakdown of the transitivity of pair-wise similarity. Guha et al. (2000) proposed the common neighbors of two transactions as a measure of pair-wise similarity. Wang et al. (1999) did not use any notion of pair-wise similarity; they clustered transactions that contain similar items. The difference is that clustering emphasizes the dissimilarity of clusters. The rationale behind clustering transactions prior to mining association rules is that the latter is then performed on partitions that are essentially distinct from each other. Both association rule mining and clustering techniques help in effective inventory management.
Inventory management is a system used to oversee the flow of products and services in and out of an organization. It usually involves monitoring the transfer of units in a company to prevent the inventory from rising too high or dwindling to levels that could put the operation in jeopardy. In other words, effective inventory management keeps a product in the right place, at the right time, and in the right quantity. However, defective items sometimes occur and should also be considered for a more realistic situation. Defective items can arise from manual handling or from technical faults. As a result, to properly ascertain the role of defective items in inventory management, researchers have devoted a great amount of effort to developing EOQ models. Porteus (1986) incorporated the effect of imperfect quality items into the basic economic order quantity model. He assumed that there was some probability that the process would go out of control while producing one unit of the product. Rosenblatt and Lee (1986) proposed an EOQ model for a production system with defective production and concluded that the presence of defective products motivates smaller lot sizes. Later, Lee and Rosenblatt (1987) considered using process inspection during the production run so that the shift to the out-of-control state can be detected and restoration made earlier. Salameh and Jaber (2000) developed an economic order quantity model where each order contains a random fraction of imperfect quality items with a known probability distribution. Papachristos and Konstantaras (2006) examined the work of Salameh and Jaber (2000) and rectified the proposed conditions to ensure that shortages would not occur. Maddah and Jaber (2008) corrected the method used by Salameh and Jaber (2000) to evaluate the expected profit per unit time. Jaggi et al. (2011, 2012, 2013) formulated an inventory model for deteriorating items. They assumed that the screening rate is higher than the demand rate. This assumption allows the demand to be fulfilled, while screening is in progress, from the products found to be of perfect quality. Jaggi and Mittal (2011, 2012) developed an inventory model with the joint effect of inspection, deterioration, time-dependent demand, inflation and time value of money.
However, for some inventory items, the criteria (such as the price of an item) are derived not only from the items themselves but also from their influence on the criteria of other items, usually called the "cross-selling effect" (Anand et al., 1997). Thus, items should be classified while considering such relationships. Kaku (2004, 2008) extended the economic order quantity model for perfect items by considering the cross-selling effect. Mittal et al. (2014) extended the economic order quantity model for imperfect quality items by incorporating time expressions into association rules.
However, no researcher has so far explored the joint effect of cross-selling, association rule mining and clustering. Motivated by this gap, this paper focuses on EOQ estimation using the apriori algorithm with and without clustering the transactions. The study shows that the combined use of association rules and clustering is more relevant. This approach brings an important increase in the number of rules produced, which eventually results in higher expected profit for the retailer. The results have been validated with the help of a numerical example.
The remaining parts of this paper are organized as follows. Notations used in this paper are defined in Section 2. EOQ estimation considering imperfect quality items and the cross-selling effect is proposed in Section 3. An example to illustrate the proposed approach is given in Section 4. Observations are stated in Section 5. Conclusions are finally given in Section 6.

Notations
The present worth of expected profit, TID inventory transaction set.

Proposed work
This paper compares the order quantities for imperfect quality items in frequent item-sets, considering the cross-selling effect, with and without clustering the transactions.
Cross-selling is the technique of suggesting related products or services to a customer. The profit of a product comes not only from the product itself but also from other products that influence its sale. Hence, the stronger the cross-selling among items, the greater the chance of losing sales. The cross-selling effect among items can be determined by using association rules. Association rule mining finds interesting associations and/or correlations among a large set of data items. Let I = {i1, i2, i3, i4, …, im} be a set of items. The support of item i1 is defined as the frequency of its occurrence in the total transactions:

$$\text{support}(i_1) = \frac{\text{number of transactions containing } i_1}{\text{total number of transactions}} \tag{1}$$

Confidence is defined as the conditional probability of purchasing i2 when i1 is purchased and is given by the formula:

$$\text{confidence}(i_1 \rightarrow i_2) = \frac{\text{support}(i_1 \cup i_2)}{\text{support}(i_1)} \tag{2}$$

The apriori algorithm is used to generate association rules whose support and confidence are greater than the user-defined minimum support and minimum confidence (Agrawal & Srikant, 1994). The frequent item-set is determined on the basis of minimum support, and association rules are generated on the basis of minimum confidence. The flowchart of the apriori algorithm is depicted in Fig. 1.
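The support and confidence definitions above, together with the level-wise search that apriori performs, can be sketched in a few lines of Python. This is a minimal illustration only: the transactions are hypothetical (they are not the paper's Table 1), and the function names `support`, `confidence` and `apriori` are chosen here for readability rather than taken from any library.

```python
from itertools import combinations

# Hypothetical transactions, used only for illustration.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"c", "d"},
    {"a", "b", "c"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A u B) / support(A); assumes the antecedent occurs at least once."""
    joint = set(antecedent) | set(consequent)
    return support(joint, transactions) / support(antecedent, transactions)

def apriori(transactions, min_support):
    """Level-wise search: a k-item-set is kept only if all of its
    (k-1)-subsets are frequent and its own support reaches min_support."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    frequent = list(level)
    k = 2
    while level:
        prev = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
            and support(c, transactions) >= min_support
        ]
        frequent.extend(level)
        k += 1
    return frequent

frequent_itemsets = apriori(transactions, min_support=0.5)
rule_confidence = confidence({"a"}, {"b"}, transactions)
```

With these sample transactions, {a, b} is the only frequent item-set of size two, and the rule a → b holds in every transaction that contains a. Rule generation would then keep only the rules whose confidence reaches the minimum confidence.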

Fig. 1. Flowchart of apriori algorithm
Further, clustering refers to the partitioning of a collection of transactions into clusters such that similar transactions are in the same cluster and dissimilar transactions are in different clusters. Here, the term "large items" refers to the items contained in some minimum fraction of transactions in a cluster. Large items are used as a similarity measure of a cluster of transactions. The support of an item in cluster Ci is the number of transactions in Ci that contain the item. Thus, for a user-specified minimum support s, an item is large in cluster Ci if its support is at least s × |Ci|; otherwise the item is small. Large items therefore contribute to similarity in a cluster, while small items contribute to dissimilarity. The cost to be minimized consists of two components: the intra-cluster cost and the inter-cluster cost. The intra-cluster cost is measured by the total number of small items, and the inter-cluster cost measures the duplication of large items in different clusters. This clustering algorithm aims to minimize the cost due to large items and small items (Wang et al., 1999). The overview of the clustering algorithm is described in Fig. 2.
Further, the opportunity lost can be calculated as the sum of the possibilities that related items lose their sales when one of them is out of stock. Let Gr,i denote the possibility that item r, when out of stock, influences another item i. Gr,i can be written as follows:

$$G_{r,i} = \text{confidence}\big(r \rightarrow I(r,i)\big) \tag{3}$$

where r = 1, 2, …, n are the items in a frequent item-set, n is the number of items in the frequent item-set, and I(r, i) is the subset of item i except item r in the frequent item-set. In the case of i = r, I(i, i) = i and confidence(i → i) = 1. We can define the opportunity cost (OCr) by the formula:

$$OC_r = \sum_{i=1}^{n} c_i \, G_{r,i} \tag{4}$$

where ci is the unit cost of item i.
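The large-item clustering cost described above can be illustrated with a short Python sketch. It follows one reading of Wang et al. (1999): the intra-cluster cost counts the distinct small items over all clusters, and the inter-cluster cost counts large items that are duplicated across clusters. The `weight` parameter and the sample clusters are assumptions made for this example, not values from the paper.

```python
from collections import Counter

def large_and_small(cluster, min_support):
    """Split the items of one cluster into large and small sets.

    An item is large if it appears in at least min_support * |cluster|
    transactions of that cluster; otherwise it is small.
    """
    counts = Counter(item for t in cluster for item in set(t))
    threshold = min_support * len(cluster)
    large = {i for i, n in counts.items() if n >= threshold}
    small = set(counts) - large
    return large, small

def clustering_cost(clusters, min_support, weight=1.0):
    """Cost = weight * intra-cluster cost + inter-cluster cost (assumed form)."""
    larges, smalls = [], []
    for cluster in clusters:
        large, small = large_and_small(cluster, min_support)
        larges.append(large)
        smalls.append(small)
    intra = len(set().union(*smalls))                         # distinct small items
    inter = sum(len(l) for l in larges) - len(set().union(*larges))  # duplicated large items
    return weight * intra + inter

# Hypothetical clusterings of the same six transactions.
well_separated = [
    [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}],
    [{"d", "e"}, {"d", "e", "f"}, {"d", "e"}],
]
mixed = [
    [{"a", "b"}, {"d", "e"}, {"a", "b", "c"}],
    [{"a", "b"}, {"d", "e", "f"}, {"d", "e"}],
]

cost_separated = clustering_cost(well_separated, min_support=0.6)
cost_mixed = clustering_cost(mixed, min_support=0.6)
```

In this example the well-separated clustering has the lower cost, because mixing dissimilar transactions turns otherwise large items into small ones.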
The index Zr is used in both deterministic and probabilistic classical inventory policies and is given below (Kaku, 2004):

$$Z_r = \frac{H_r + OC_r}{OC_r} \tag{5}$$

where Hr is the holding cost of item r per unit.
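As a worked illustration of the opportunity cost and the index Zr, the sketch below uses the unit costs and confidences that appear later in the numerical example (30 and 20.50, with both confidences equal to 1). Two points are assumptions rather than facts from the paper: the index is taken to have the form Zr = (Hr + OCr) / OCr, and the holding cost of 10 is a hypothetical value chosen because it is consistent with the reported Zb ≈ 1.1980. For a two-item frequent item-set, the pairwise confidence used here coincides with Gr,i.

```python
def opportunity_cost(r, itemset, unit_cost, conf):
    """OC_r = sum over items i of c_i * confidence(r -> i), with confidence(r -> r) = 1."""
    return sum(unit_cost[i] * (1.0 if i == r else conf[(r, i)]) for i in itemset)

def z_index(oc, holding_cost):
    """Assumed form of the index: Z_r = (H_r + OC_r) / OC_r."""
    return (holding_cost + oc) / oc

unit_cost = {"a": 30.0, "b": 20.50}           # unit costs from the numerical example
conf = {("a", "b"): 1.0, ("b", "a"): 1.0}     # confidences from the numerical example
itemset = ["a", "b"]

oc_a = opportunity_cost("a", itemset, unit_cost, conf)
oc_b = opportunity_cost("b", itemset, unit_cost, conf)
z_b = z_index(oc_b, holding_cost=10.0)        # holding cost is a hypothetical value
```

Both items have the same opportunity cost here because each rule has confidence 1, so a stock-out of either item is assumed to put the sale of the other at risk as well.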
Further, Zr is used to implement the concept of opportunity cost. We now consider the EOQ model described by Maddah and Jaber (2008). In this model, items are delivered in lots of size y, with a purchasing price of w per unit and an ordering cost of K. The assumption is that each lot contains a fraction of defectives, α, with a known probability density function, f(α). The selling price of a good-quality item is p per unit. A 100% screening process of the lot is conducted at a rate of λ units per unit time. Further, defective items are sold after inspection at a discounted price, cs per unit. After inspection, αy represents the number of defective items, so the number of perfect items is (1 − α)y. To avoid shortages, it is assumed that the number of perfect quality items is at least equal to the demand during the screening time t, that is

$$(1-\alpha)\,y \ge D\,t \tag{6}$$

where D is the demand per year. Replacing the screening time t by y/λ in equation (6), the value of α is restricted to

$$\alpha \le 1 - \frac{D}{\lambda} \tag{7}$$

The revenue (Rev) earned by selling all the items is the sum of the revenue from perfect quality items and from imperfect quality items:

$$Rev = p\,(1-\alpha)\,y + c_s\,\alpha\,y \tag{8}$$
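The no-shortage condition and the revenue expression can be checked numerically with a small Python sketch. The selling prices are those quoted for item a in the numerical example and the screening rate is the one computed there; the demand, lot size and defective fractions are hypothetical values chosen only for illustration.

```python
def max_defective_fraction(demand, screening_rate):
    """Upper bound on alpha implied by the no-shortage condition: alpha <= 1 - D / lambda."""
    return 1.0 - demand / screening_rate

def no_shortage(y, alpha, demand, screening_rate):
    """Check (1 - alpha) * y >= D * t, with screening time t = y / lambda."""
    return (1.0 - alpha) * y >= demand * (y / screening_rate)

def revenue(y, alpha, p, cs):
    """Revenue per cycle: good items at price p plus defective items at discounted price cs."""
    return p * (1.0 - alpha) * y + cs * alpha * y

D = 50_000.0      # hypothetical annual demand
lam = 131_400.0   # annual screening rate from the numerical example

alpha_limit = max_defective_fraction(D, lam)
feasible = no_shortage(1_000, 0.02, D, lam)       # small defective fraction
infeasible = no_shortage(1_000, 0.70, D, lam)     # defective fraction above the limit
rev = revenue(1_000, 0.02, p=60.0, cs=25.0)
```

The lot size cancels out of the condition, which is why the restriction can be stated on α alone.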
Total cost (TC) is the sum of cost of all items, ordering cost, screening cost, and holding cost of all items.
TC= cost of items + ordering cost + screening cost + holding cost.
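The total cost per cycle of length T is given as:

$$TC(y) = w\,y + K + \beta\,y + h\left(\frac{y\,(1-\alpha)\,T}{2} + \frac{\alpha\,y^2}{\lambda}\right) \tag{9}$$

where β is the screening cost per unit and h is the holding cost per unit per unit time. The total profit per cycle, P(y), is given as total revenue (Rev) minus total cost (TC):

$$P(y) = p\,(1-\alpha)\,y + c_s\,\alpha\,y - \left[w\,y + K + \beta\,y + h\left(\frac{y\,(1-\alpha)\,T}{2} + \frac{\alpha\,y^2}{\lambda}\right)\right] \tag{10}$$

The expected profit per unit time is given by dividing the expected profit per cycle by the expected cycle length, where T = (1 − α)y/D, i.e.

$$E[PU(y)] = \frac{E[P(y)]}{E[T]} \tag{11}$$

Since α is a random variable with known probability density function f(α), the expected profit per unit time, E[PU(y)], is given as:

$$E[PU(y)] = \frac{D\left(p\,(1-E[\alpha]) + c_s\,E[\alpha] - w - \beta\right)}{1-E[\alpha]} - \frac{K D}{(1-E[\alpha])\,y} - \frac{h\,y\left(E[(1-\alpha)^2] + 2E[\alpha]\,D/\lambda\right)}{2\,(1-E[\alpha])} \tag{12}$$

To maximize the total expected profit with respect to y, we determine the first and second derivatives of Eq. (12), which are given below:

$$\frac{dE[PU(y)]}{dy} = \frac{K D}{(1-E[\alpha])\,y^2} - \frac{h\left(E[(1-\alpha)^2] + 2E[\alpha]\,D/\lambda\right)}{2\,(1-E[\alpha])}$$

$$\frac{d^2E[PU(y)]}{dy^2} = -\frac{2 K D}{(1-E[\alpha])\,y^3}$$

The second-order derivative is negative for all y > 0, which implies that there exists a unique value of y, namely y*, that maximizes the profit. It is given as:

$$y^* = \sqrt{\frac{2 K D}{h\left(E[(1-\alpha)^2] + 2E[\alpha]\,D/\lambda\right)}} \tag{13}$$

The value of y* gives the optimal order quantity for the item-set. Further, the order quantity is modified to obtain the optimum order quantity considering cross-selling effects, following Mittal et al. (2014). Eq. (13) is multiplied by the square root of Eq. (5) to get the modified order quantity for an imperfect quality frequent item-set with cross-selling effects, which is given as:

$$y_r^* = \sqrt{\frac{2 K D\,Z_r}{h\left(E[(1-\alpha)^2] + 2E[\alpha]\,D/\lambda\right)}} \tag{14}$$

Finally, the proposed work can be summarized in Fig. 3.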

Fig. 3. Proposed research work
Fig. 3 explains the ordering policy for two distinct cases:
• The apriori algorithm is applied on the inventory transaction database to calculate the opportunity cost of frequent items. Further, the opportunity cost is used to estimate the EOQ and revenue of frequent items.
• A clustering algorithm is applied on the inventory transaction database to obtain homogeneous clusters. Further, the apriori algorithm is applied on each cluster to calculate the opportunity cost of frequent items. This opportunity cost is used to estimate the EOQ and revenue of frequent items in each cluster.
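The order quantity calculation used in both cases can be sketched in Python as follows. The sketch assumes the renewal-reward form of Maddah and Jaber (2008), in which the expected profit per unit time is E[P(y)] / E[T], and it takes the cross-selling adjustment to be multiplication by the square root of Zr, as described in the text. The prices w, p and cs and the screening rate are those of the numerical example; K, h, β, D, the moments of α and the value of Z are hypothetical.

```python
import math

def expected_profit_per_unit_time(y, p, cs, w, beta, K, h, D, lam, m, s):
    """E[P(y)] / E[T], where m = E[alpha] and s = E[(1 - alpha)^2] (assumed form)."""
    expected_profit = ((p * (1 - m) + cs * m - w - beta) * y
                       - K
                       - h * y ** 2 * (s / (2 * D) + m / lam))
    expected_cycle = (1 - m) * y / D
    return expected_profit / expected_cycle

def eoq_imperfect(K, h, D, lam, m, s):
    """Order quantity that maximizes the expected profit per unit time above."""
    return math.sqrt(2 * K * D / (h * (s + 2 * m * D / lam)))

def eoq_cross_selling(y_star, z):
    """Order quantity adjusted for cross-selling: y* multiplied by sqrt(Z_r)."""
    return y_star * math.sqrt(z)

params = dict(
    p=60.0, cs=25.0, w=30.0,       # prices from the numerical example (item a)
    lam=131_400.0,                 # screening rate from the numerical example
    beta=0.5, K=100.0, h=5.0,      # hypothetical cost parameters
    D=50_000.0,                    # hypothetical annual demand
    m=0.02,                        # hypothetical E[alpha]
    s=0.04 ** 2 / 12 + 0.98 ** 2,  # hypothetical E[(1 - alpha)^2]
)

y_star = eoq_imperfect(params["K"], params["h"], params["D"],
                       params["lam"], params["m"], params["s"])
y_cross = eoq_cross_selling(y_star, z=1.198)   # hypothetical Z value

profit_at_optimum = expected_profit_per_unit_time(y_star, **params)
profit_below = expected_profit_per_unit_time(0.9 * y_star, **params)
profit_above = expected_profit_per_unit_time(1.1 * y_star, **params)
```

Comparing the profit at y* with the profit at nearby lot sizes is a simple numerical check on the concavity argument. Because Z is greater than 1 whenever the holding cost is positive, the cross-selling adjustment always raises the order quantity under this assumed form.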

Numerical Example
A numerical example is solved to calculate EOQ by using rules obtained by apriori algorithm without clustering the data.First, opportunity cost is calculated and then it is used to determine EOQ for imperfect quality items.
Consider the database set D and the inventory item-set I = {a, b, c, d, e, f, g, h, i}. The inventory transaction set is TID = {TID1, TID2, TID3, TID4, TID5, TID6}, as shown in Table 1. Each row in Table 1 can be taken as an inventory transaction. The association rules can be identified from these inventory transactions using the apriori algorithm. Table 2 shows the inventory policy that will be used to calculate the opportunity cost of various items. Further, the apriori algorithm is applied on the database of Table 1. The frequent item-set found by applying the apriori algorithm is {a, b}. Hence, it should be treated as a special item-set in the ranked list of items. Now, we calculate confidence and opportunity cost by using equations (2) and (4): Confidence (a → b) = 100% and Confidence (b → a) = 100%.
Opportunity cost of item a (OCa) = Ca · confidence(a→a) + Cb · confidence(a→b) = 30 × 1 + 20.50 × 1 = 50.50. Substituting the opportunity cost of item 'a' in equation (5) gives Z for item 'a'. Similarly, after applying the rules and conditions described above, the opportunity cost of item 'b' is 50.50 and Zb ≈ 1.1980. Consider the following parameters for item 'a': purchase cost, w = $30/unit; selling price of good quality items, p = $60/unit; selling price of imperfect quality items, cs = $25/unit.
It is assumed that the inventory operation runs 6 hours/day for 365 days a year and that one unit is screened per minute, so the annual screening rate is λ = 1 × 60 × 6 × 365 = 131,400 units/year.
Further, it is also assumed that the percentage defective random variable, α, is uniformly distributed with probability density function f(α). Note: To avoid shortages, Eq. (7) must be satisfied.
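For a uniformly distributed α, the two moments that enter the order quantity, E[α] and E[(1 − α)²], have closed forms. The Python sketch below computes them together with the annual screening rate. The bounds 0 and 0.04 are hypothetical placeholders, since the bounds of the density are not reproduced in this text; the screening rate assumes one unit screened per minute, as the product 1 × 60 × 6 × 365 suggests.

```python
def uniform_moments(lower, upper):
    """Return E[alpha] and E[(1 - alpha)^2] for alpha ~ Uniform(lower, upper)."""
    e_alpha = (lower + upper) / 2
    variance = (upper - lower) ** 2 / 12
    # E[(1 - alpha)^2] = Var(alpha) + (1 - E[alpha])^2
    e_one_minus_alpha_sq = variance + (1 - e_alpha) ** 2
    return e_alpha, e_one_minus_alpha_sq

# One unit per minute, 60 minutes/hour, 6 hours/day, 365 days/year.
screening_rate = 1 * 60 * 6 * 365

# Hypothetical bounds for the uniform density of alpha.
e_alpha, e_sq = uniform_moments(0.0, 0.04)
```

These two moments are the only properties of the distribution that the order quantity depends on, so any other density could be substituted by replacing `uniform_moments`.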
Table 3 shows the opportunity cost, EOQ and revenue earned for various items. Now, we calculate the EOQ by using the rules obtained by the apriori algorithm after clustering the data. First, the transaction database is clustered, and then the opportunity cost is calculated, which is used to determine the EOQ for imperfect quality items.
Opportunity cost of item a (Oa) = Ca · confidence(a→a) + Cb · …
Finally, the results are summarized in Fig. 4.
• The apriori algorithm is applied on the inventory transaction database to calculate the opportunity cost of frequent items. Further, the opportunity cost is used to estimate the EOQ and revenue of frequent items. Since fewer rules are selected, this results in lower profit for the retailer.
• A clustering algorithm is applied on the inventory transaction database to obtain homogeneous clusters. Further, the apriori algorithm is applied on each cluster to calculate the opportunity cost of frequent items. This opportunity cost is used to estimate the EOQ and revenue earned from frequent items. Since more rules are selected in each cluster owing to the cross-selling effect among items, this results in higher profit for the retailer.

Observations
The impact of the defective parameter E[α] on the EOQ has been determined for item a in two cases. Similarly, the graphs for the other frequent items can be drawn. Results are displayed in Fig. 5.
• Case 1: When items are defective and the cross-selling effect is not considered.
• Case 2: When items are defective and the cross-selling effect is considered.
On the basis of the computational results shown in Fig. 5, we observe the following:
• As the percentage of defective items decreases, the optimal order quantity increases, which results in higher expected profit for the retailer.
• Further, if items are defective and the cross-selling effect is taken into consideration, the optimal order quantity increases more than in Case 1, which eventually results in higher expected profit.

Conclusion
This paper determined the EOQ of imperfect quality items for two cases. Case 1 calculated the EOQ using the apriori algorithm without clustering the transactions, while Case 2 calculated it after clustering the transactions. The results show that more association rules were selected when the transactions were clustered, which eventually resulted in higher expected profit. Additionally, the impact of the defective parameter E[α] on the EOQ has been determined for both cases, namely with defectives but without the cross-selling effect, and with both defectives and the cross-selling effect. An increase in the percentage of defective items alerts the retailer to look into the source of supply and take corrective measures to improve the quality of supply. A numerical example has been solved to validate the result. For future study, it is desirable to extend the proposed model for imperfect quality items by using a combination of data mining techniques, such as association rule mining and classification, to obtain an improved value of the EOQ.

Table 3
Values for opportunity cost, EOQ, and revenue for items a and b

Table 4
Confidence of frequent item-set in Cluster C1 and Cluster C2

Table 5 shows the opportunity cost, EOQ and revenue earned for various items after clustering the transactions.

Table 5
Values for opportunity cost, EOQ, and revenue for frequent items of cluster C1 and C2