Situation reactive approach to Vendor Managed Inventory problem
Introduction
Fierce competition in today’s global markets, the introduction of products with shorter life cycles, and the heightened expectations of customers have forced business enterprises to focus attention on and invest in their supply chain management (Simchi-Levi, Kaminsky, & Simchi-Levi, 2000). Supply chain management is an effective means for enterprises to improve their customer service levels at minimum cost. One of the key factors in improving service levels is to efficiently manage the inventory level of each participant within the supply chain.
The traditional “order-and-supply” based inventory control policy suffers from an inordinate amount of surplus stock at suppliers. In general, suppliers do not know the order quantities of retailers in advance and have to maintain more safety stock than they actually need for on-time replenishment. This causes the magnification of demand fluctuations as orders move to upstream sites in a supply chain, which is called the “bullwhip effect” (Lee, Padmanabhan, & Whang, 1997). To resolve this problem, the VMI (Vendor Managed Inventory) model has been developed in industry.
VMI is a successful inventory control model for a two-stage supply chain in which a supplier directly manages the inventory level of a retailer (Achabal, McIntyre, Smith, & Kalyanam, 2000). Within the VMI model, the retailer provides the supplier with information on its sales and inventory level, and the supplier determines the replenishment quantity at each period based on this information. Through the VMI model, the supplier can set up efficient replenishment plans, while the retailer receives appropriate amounts of replenishment on time (Kaipia et al., 2002, Lee et al., 2000).
Customer demands have recently become more and more unstable with the widespread introduction of e-commerce, because they fluctuate easily even with minor on-line price changes. The advent of products with a variety of qualities and functions is another source of instability in customer demands, which in turn increases the uncertainty of demand forecasting and leads to higher inventory costs due to unnecessary inventory surplus or shortage.
In fact, inventory control has been studied for several decades as a means of cost saving for enterprises (Axsäter, 2000, Axsäter, 2001, Moinzadeh, 2002, Zipkin, 2000). Enterprises have tried to maintain appropriate inventory levels to cope with stochastic customer demands and to boost their image through customer satisfaction. However, most theoretical inventory models require that the statistical characteristics of customer demand are known, or can be estimated through sophisticated time series models when demands show nonstationary behavior. These prerequisites are not practical in terms of analysis time and effort, especially when a supplier deals with hundreds of different items whose demands fluctuate over time in different ways. As a consequence, the importance of situation reactive models has surfaced, with the necessity of adaptively controlling the parameters of inventory control models according to changes in customer demand (Alstrøm and Madsen, 1996, Gavirneni and Tayur, 2001, Graves, 1999).
Action-reward learning is one of the reinforcement learning techniques. It progressively finds the best among several possible actions in a non-static environment (Sutton & Barto, 1998) through exploitation and exploration. The basic principle of action-reward learning is as follows. When an agent chooses a certain action, the state of the non-static environment changes and the reward for the action is determined. The reward is a numerical value that serves as input to the performance measure of the action. Through the repetitive process of applying actions, the agent continuously updates the performance measures of all actions and can choose the best action based on these updated measures. Conventional action-reward learning generally involves a trade-off between exploitation and exploration. Exploitation chooses the action with the best value of the performance measure and applies it to the non-static control system, while exploration chooses an action whose learning is still immature, in order to boost the reliability of its performance measure.
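The exploitation–exploration mechanism described above can be sketched as an ε-greedy action-value learner. The action set, the rewards, and the running-average performance measure below are illustrative assumptions for a minimal sketch, not the paper’s exact formulation.

```python
import random

class ActionRewardLearner:
    """Tracks a performance measure (running-average reward) per action."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = list(actions)
        self.epsilon = epsilon                        # exploration probability
        self.counts = {a: 0 for a in self.actions}    # times each action was tried
        self.values = {a: 0.0 for a in self.actions}  # performance measures

    def choose(self):
        # Exploration: occasionally try an action whose learning is immature.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        # Exploitation: pick the action with the best performance measure.
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental running average of the rewards observed for `action`.
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n
```

With ε = 0 the learner is purely exploitative, which illustrates why conventional learning needs the exploration term: an action never tried keeps its immature estimate forever.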
Kim, Jun, Baek, Smith, and Kim (2005) proposed an adaptive inventory control model for supply chains with unstable customer demands by applying the conventional action-reward learning method. The model dealt with an inventory control problem in which the decision variable is the order interval between a supplier and a retailer, with the replenishment quantity assumed to be fixed. They proposed a method that adaptively controls both the supplier’s safety lead time and the retailer’s safety stock according to variation in the customer demand stream. The objective of the model is to satisfy a target service level predefined for each retailer. In their approach, actions are selected probabilistically in order to balance exploitation and exploration. However, this probabilistic action selection rule causes the learning rate to slow as the number of actions increases, because many explorations are required. It therefore takes a very long time to find a good decision policy, particularly in on-line learning.
In this paper, we propose a situation reactive VMI approach that adapts the replenishment quantity over time according to changes in the customer demand stream. To cope with the nonstationary demand situation, we develop a retrospective action-reward learning model that learns faster than conventional action-reward learning and is more suitable for control domains where the rewards for actions vary over time. The retrospective-analysis-based model improves the learning rate of action-reward learning by eliminating exploration.
The objective of the inventory control is to minimize the long-run average of the inventory shortage and holding costs incurred at each replenishment period. This approach does not assume that the customer demand process follows a specific stochastic model such as a Markov chain (Gavirneni & Tayur, 2001) or an autoregressive time series (Graves, 1999). In other words, no statistical assumption about customer demand is required to compute the replenishment quantity. The replenishment quantity is a function of a compensation factor (CF) that increases or decreases the replenishment amount, and at each replenishment period a cost-minimizing CF value is automatically chosen from the candidate set by using the retrospective action-reward learning.
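A minimal sketch of how a CF could adjust the replenishment quantity, and how a cost-minimizing CF could be identified retrospectively once the period’s demand is realized. The additive form `forecast + cf * sigma` and the cost parameters are assumptions for illustration, not the paper’s exact model.

```python
def replenishment_quantity(forecast, sigma, cf):
    # Assumed form: the CF scales a safety term around the demand forecast.
    return max(0.0, forecast + cf * sigma)

def period_cost(q, demand, h, p):
    # Holding cost h on leftover stock; shortage cost p on unmet demand.
    leftover = q - demand
    return h * leftover if leftover >= 0 else p * (-leftover)

def best_cf_retrospective(cf_set, forecast, sigma, realized_demand, h, p):
    # Retrospective analysis: after demand is realized, evaluate what each
    # candidate CF *would have* cost, and return the cost-minimizing one.
    costs = {cf: period_cost(replenishment_quantity(forecast, sigma, cf),
                             realized_demand, h, p)
             for cf in cf_set}
    return min(costs, key=costs.get)
```

Because every candidate CF is evaluated against the realized demand, no period has to be “spent” exploring a possibly poor action, which is the intuition behind eliminating exploration.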
The remainder of this paper is organized as follows. Section 2 introduces the basic concepts of action-reward learning and explains its application to the nonstationary VMI situation. In Section 3, the situation reactive algorithm is presented in detail with formal definitions. In Section 4, the simulation environment is explained and the results of the simulation-based experiments are presented and discussed. Finally, conclusions are provided in Section 5.
Section snippets
Action-reward learning
The integral component of the situation reactive approach is action-reward learning. Popular reinforcement learning methods, such as Q-learning and temporal difference learning (Mitchell, 1997, Sutton and Barto, 1998), were developed for Markov decision processes with incomplete information on state transitions, where the goal is to determine the best action for each visited state. In the inventory control case, the state corresponds to the amount of inventory remaining before the replenishment decision is made,
Notations
The following notations are used to describe the algorithm of the situation reactive approach formally.

- t: replenishment period (t = 0, 1, 2, …)
- Dt: actual customer demand realized during [t, t + 1)
- D̂t: one-step look-ahead forecasted customer demand at period t
- σ̂t: estimated standard deviation of customer demand at period t
- It: inventory level at the beginning of period t
- Qt: replenishment quantity at the beginning of period t
- ρ: a CF value
- Θ: set of CF values (Θ = {ρ1, ρ2, …, ρn})
- h: inventory holding
Comparison model
If a problem with only one replenishment period is considered, with the aim of minimizing inventory cost at the end of the period, it can be formulated as the newsvendor model (Scarf, 1958). When customer demand follows a stationary probability distribution, the newsvendor model provides an optimal solution that minimizes inventory cost. With many replenishment periods, the newsvendor model can also be applied repetitively at each period to derive the replenishment quantity with the consideration of
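For reference, under normally distributed demand the newsvendor solution reduces to ordering up to the critical-fractile quantile of the demand distribution. This is the standard textbook form (not taken from this paper), sketched here with assumed per-unit holding cost h and shortage cost p:

```python
from statistics import NormalDist

def newsvendor_quantity(mu, sigma, h, p):
    # Critical fractile: order up to the p / (p + h) quantile of demand,
    # balancing the per-unit shortage cost p against the holding cost h.
    fractile = p / (p + h)
    return NormalDist(mu, sigma).inv_cdf(fractile)
```

When p = h the fractile is 0.5 and the model simply orders the mean demand; as the shortage cost grows relative to the holding cost, the order quantity rises above the mean.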
Conclusion
In this paper, we proposed an adaptive VMI (Vendor Managed Inventory) model that controls the replenishment quantity adaptively, depending on the change in customer demand at each replenishment period, in a two-echelon supply chain with unstable customer demands. This research provides two main contributions. First, action-reward learning incorporating retrospective analysis was newly proposed to resolve the slow learning of conventional learning by eliminating exploration. Second,
Acknowledgement
This work was supported by Yonsei University Research Fund of 2002.
References (25)
- Achabal, D. D., McIntyre, S. H., Smith, S. H., & Kalyanam, K. (2000). A decision support system for vendor managed inventory. Journal of Retailing.
- Alstrøm, P., & Madsen, P. (1996). Tracking signals in inventory control systems: A simulation study. International Journal of Production Economics.
- Glover, F. (1986). Future paths for integer programming and links to artificial intelligence. Computers and Operations Research.
- Axsäter, S. (2000). Inventory control.
- Axsäter, S. (2001). A framework for decentralized multi-echelon inventory control. IIE Transactions.
- Brown, R. G. (1959). Statistical forecasting for inventory control.
- Cachon, G., & Fisher, M. (1997). Campbell Soup’s continuous replenishment program: Evaluation and enhanced inventory decision rules. Production and Operations Management.
- Gavirneni, S., & Tayur, S. (2001). An efficient procedure for non-stationary inventory control. IIE Transactions.
- Graves, S. C. (1999). A single-item inventory model for a non-stationary demand process. Manufacturing and Service Operations Management.
- Holland, J. H. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence.