Policies for inventory models with product returns forecast from past demands and past sales

Finite horizon periodic review backlog models are considered in this paper for an inventory system that remanufactures two types of cores: buyback cores and normal cores. Returns of used products as buyback cores are modelled to depend on past demands and past sales. We derive an optimal inventory policy for the model in which returns are forecast to depend on past demands, and analyze properties of the optimal cost and optimal policy we derived. As the structure of the optimal inventory policy for the model in which returns are forecast from past sales is unlikely to be tractable, we instead consider a feasible inventory policy with a nice structure for this model. We investigate how close this policy is to optimality and find that in the worst case, the difference in system costs between the feasible policy and the optimal inventory policy is bounded by a constant that is dependent only on cost parameters, mean demands and a discount factor, and is independent of the planning horizon and initial inventories. We also perform numerical experiments to study the difference between system costs under the feasible policy and those under the optimal policy.


Introduction
Remanufacturing, an advanced form of recycling, has become an increasing concern for companies as sustainability gains importance. The remanufacturing process, which restores a collection of cores to excellent condition, consists of procedures that may involve advanced technology. Such procedures include disassembly, cleaning, testing, parts replacement/repairs, and reassembly operations. Examples of remanufactured products are engines, photocopiers, toner cartridges, and the like.
The remanufacturing industry is large, comprising many market sectors and providing significant economic, environmental, and societal benefits (Akçali and Çetinkaya 2011). For some manufacturers, such as Eaton Corporation, backed by Roadranger support (http://www.roadranger.com/rr/Aftermarket/CoreBuyback/index.htm), products sold to and used by consumers are actively sought back for remanufacturing. Such returned products are called buyback cores. Financial incentives are often used to encourage returns of these products for remanufacturing or traditional recycling. On the other hand, consumers also often return products that are more significantly worn out or even damaged. We call these normal cores. A normal core is distinguished from a buyback core in that it has a lower yield. After undergoing the remanufacturing process, remanufactured products, which are then in as-good-as-new condition, can be sold to consumers. A remanufactured product and a manufactured product are treated as indistinguishable.
In this paper, we consider an inventory system that remanufactures returned products, and in which products returned as buyback cores are modelled to depend on past demands and past sales. We propose periodic review, finite horizon backlog models for the system. We consider two types of cores in our models: buyback cores, which the remanufacturer purchases at a cost, and normal cores, which are likely to be damaged and are returned by consumers. The remanufacturing cost for a buyback core is lower than that for a normal core because a buyback core is in better condition than a normal core is. Products are not manufactured from raw materials in our models, so all serviceable products come from remanufacturing. We consider a situation commonly encountered in practice, in which buyback cores are collected for products sold in the immediately previous period and earlier, and products sold too long ago, that is, before a certain time, are not eligible for return. In other words, products can only be returned as buyback cores within a certain period of time after they are sold. For example, the remanufacturing facilities at Caterpillar Singapore (http://www.caterpillar.com) follow a practice whereby there is an entitlement period during which sold products can be returned, and products beyond the entitlement period are not eligible for return. To be more specific, when an end-customer buys a remanufactured product from a Caterpillar dealer, he pays a price (composed of the actual selling price of the product and a deposit) that is the same as the price he would pay for a new product. The customer is also given an entitlement period of eight months during which he can return a used product to the dealer and get back some of his deposit. The exact percentage of the deposit refunded depends on the quality of the returned product, ranging from "full" to "partial" to "none".
A major assumption of many papers on managing dynamic remanufacturing inventory systems is that product returns and demands/sales across different periods are independent. This assumption can be justified when the product is widely spread out in the market or when a common component/material is recovered from different products (e.g., remanufacturing of consumer electronics); see Tao and Zhou (2014). Nevertheless, it is plausible that demands/sales and returns are correlated in many remanufacturing systems. If such a correlation can be identified and used to forecast returns as part of managing a dynamic remanufacturing inventory system, it can potentially reduce system costs through better deployment of returned products. We provide empirical evidence of the dependence of product returns on past sales in a remanufacturing system. In Sect. 3, we introduce a way to forecast returns of buyback cores from past demands and sales, and we then study inventory policies for the resulting models. We first derive a simple, explicit remanufacturing and disposal policy for our backlog model in which returns are forecast from past demands. We show how that policy is affected by changes in the returns forecast when those changes are caused by changes in past demands. Then, we consider a model in which returns are forecast from past sales, and we study a feasible inventory policy for that model that is based on the optimal policy for the earlier model. We analyze how much this feasible policy differs from the optimal policy in terms of system costs, and we also provide numerical evidence suggesting that the difference tends to be small.
Fig. 1 Pearson's R against Lag X

Data analysis
We describe and analyze a data set from a remanufacturing-based company with an international presence in order to illustrate the dependence of returns on past sales and the returns policy offered to customers. This forms the basis for incorporating core returns forecast from past demands/sales into an inventory model. The data set covers the sales and returns of seven different core types from two of the company's distribution centers, for the period January 2010 to January 2014. The company offers a returns policy that allows customers to return their cores within eight months. In our data set, a total of 3084 sales transactions occurred, and 2447 of the cores sold were returned to the company. Of the remaining cores that were not returned, 232 had been purchased within eight months of the data being retrieved and were considered to be active cores. The other 405 were cores that were not returned and were considered to be attrition cores.
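The reported counts partition the sales transactions. A minimal arithmetic check, using only the figures above, confirms the partition and computes the share of returned cores among those not classified as active; this ratio is derived here for illustration and is not a figure reported by the company.

```python
# Counts reported for the data set (January 2010 to January 2014).
sales = 3084
returned = 2447
active = 232      # not returned, bought within eight months of data retrieval
attrition = 405   # not returned, considered attrition cores

# The three groups account for every sales transaction.
assert returned + active + attrition == sales

# Share returned among cores not classified as active (a derived ratio).
closed = returned + attrition
rate_closed = returned / closed
print(f"{closed} cores, return rate {rate_closed:.3f}")  # 2852 cores, return rate 0.858
```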
To examine the relationship between the number of returns in the current month and sales in previous months (lagged sales), we define Lag X sales as the sales quantity X months before the month in which the returns are counted.
In Fig. 1, we have picked one of the seven core types and show the correlation between the monthly buyback cores and their respective monthly lagged sales, together with the upper and lower limits at the 95% confidence level. In the figure, the y-axis shows Pearson's R (also known as the Pearson correlation coefficient), and the x-axis shows the lag X of the sales against which the return data were measured. The figure shows that, at the 95% confidence level, returns are positively correlated with the sales X months ago for X = 0, 1, 2, . . . , 8, but the existence of such a correlation is not clear for X = 9 or 10. This observation is interesting because the company offers a returns policy of eight months, and it indicates that the returns policy set by a company can indeed affect the return time of cores. The other six core types display similar patterns. This observation suggests the potential of using returns forecast from past demands/sales in managing a remanufacturing-based system.
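The Lag X analysis can be reproduced along the following lines. This is a sketch under stated assumptions: the monthly series are hypothetical, the function names are ours, and the 95% limits are computed with the Fisher z-transformation, a standard choice that the text does not name.

```python
import math
from typing import List, Tuple


def pearson_r(a: List[float], b: List[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def lagged_correlation(returns: List[float], sales: List[float], lag: int,
                       z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Correlate returns in month t with Lag X sales, i.e. sales in month t - lag.

    Returns (r, lower, upper); the 95% limits use the Fisher z-transformation.
    """
    if len(returns) != len(sales):
        raise ValueError("series must cover the same months")
    ret = returns[lag:]                 # months t = lag, ..., T-1
    sal = sales[:len(sales) - lag]      # months t - lag (avoids the [:-0] pitfall)
    n = len(ret)
    if n < 4:
        raise ValueError("need at least 4 overlapping months")
    r = pearson_r(ret, sal)
    z = math.atanh(r)                   # undefined when |r| == 1
    half_width = z_crit / math.sqrt(n - 3)
    return r, math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical monthly data in which returns track the previous month's sales.
sales = [10, 14, 9, 16, 11, 15, 8, 13, 12, 17, 10, 14]
returns = [12] + [sales[t - 1] + (1 if t % 2 else -1) for t in range(1, 12)]
for lag in range(4):
    r, lo, hi = lagged_correlation(returns, sales, lag)
    print(f"Lag {lag}: r = {r:+.2f}, 95% limits [{lo:+.2f}, {hi:+.2f}]")
```

Running `lagged_correlation` for X = 0, 1, ..., 10 on real monthly series would give the points and limits of a plot like Fig. 1.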

Literature review
The literature on closed-loop supply chains is vast. Akçali and Çetinkaya (2011) presented a review of the subject that includes a comprehensive list of references. Recently, Souza (2013) provided a review of the literature and a tutorial on closed-loop supply chains, in which he discussed a wide range of topics that include results on a base model with underlying assumptions, comments on extensions, and potential research areas. Among Souza's various topics, he discussed end-of-use returns with remanufacturing.
The literature on remanufacturing-based inventory systems includes papers by de Brito and ), DeCroix (2006, DeCroix and Zipkin (2005), Guo et al. (2014), Simpson (1978), van der Laan and Salomon (1997), and van der . A major assumption of these papers is that product returns and demands from different periods are independent. On the other hand, a case studied in Bayiz and Tang (2004) described correlated demand and return processes of a company that sells thermoluminescent badges and then, in subsequent periods, collects them back for refurbishment. The number of badges returned in a particular period is forecast using a linear combination of historical demands for the badge. Using actual data, Bayiz and Tang (2004) found that the forecast was rather accurate, with an average error of 24%. Work on stochastic and correlated demands and returns is rather limited because of the subject's complexity. In  (also see Li et al. 2009), the authors studied product returns for a periodic review finite horizon inventory model with backlogged demand. Those authors considered K types of cores that can be remanufactured, with conditions ranging from slightly used to significantly damaged. The system also has a manufacturing capability.  offered an optimal policy for deciding the optimal quantity of serviceable products to be made available to consumers, and the optimal quantity of each type of core to remanufacture and to dispose of in each period, whereas Li et al. (2009) did not provide an optimal policy. The methodology used was stochastic dynamic programming. In the main model in , the authors assumed that product returns and previous demands are independent.  then briefly considered the dependence of returns on past sales in an extension to their main model. 
That dependence was modelled as a Markov process, and to capture the case in which only sufficiently old products can be returned, the authors considered only returns of products sold at least τ periods previously. The authors postulated the optimal policy for the extension in Theorem 5 of their paper. Our modelling of returns forecast from past demands/sales complements that of , in that we consider the case in which current returns depend only on immediately preceding demands/sales, and products sold too long ago are not eligible for return. Our analysis of a data set from a remanufacturing system, given above, motivates this assumption. Tao and Zhou (2014) recently considered a single-product, periodic-review inventory system with remanufacturable returned products, assuming that demands and returns follow general stochastic processes and may be correlated. Those authors provided an efficient approximation algorithm, based on cost-balancing techniques, to compute manufacturing and remanufacturing quantities in each period, and they showed that the expected cost under that remanufacturing balancing policy was at most twice the optimal cost. Because demand data are usually harder to obtain than sales data, in addition to a model in which returns depend on past demands, as considered in Tao and Zhou (2014), we develop a model in which returns depend on past sales. We formulate the two models considered in this paper in Sect. 3.
Kiesmüller and van der Laan (2001) considered a discrete-time system in which product returns in a period depend explicitly on the demand that existed some periods ago. Those authors assumed that returned products are directly added to the serviceable inventory, and that manufacturing follows a base-stock policy. We consider a different model setting from theirs, motivated by our empirical study. Among various results, Kiesmüller and van der Laan (2001) showed numerically that the dependence on past demands has a positive effect on optimal cost, compared with a situation in which product returns are independent of previous demands.
Models with returns that depend on past sales are considered in Kelle and Silver (1989b), Ketzenberg et al. (2006), Khawan et al. (2007), Toktay et al. (2000), and Hsueh (2011). Kelle and Silver (1989b) modelled the dependence of returns on sales by specifying deterministic probabilities for a sold product to be returned in the next period, the period after that, and so on [also see Goh and Varaprasad (1986), and Kelle and Silver 1989a]. The dependence on past sales in our paper differs from theirs; however, the two coincide when the maximum returns period for our model is 1, or under certain assumptions about the parameters of our model (see Remark 1). Kelle and Silver (1989b) reduced their stochastic inventory model to a deterministic, dynamic lot-sizing problem for which there are known solution methods. In our paper, we use stochastic dynamic programming in our analysis of inventory models. Ketzenberg et al. (2006) focused on the value of information in a closed-loop supply chain. In their paper, the dependence of returns on past sales followed that of Kelle and Silver (1989b), simplified so that a sold product could only be returned in the next period with a certain probability, or not at all. That approach is similar to the way we forecast returns when returns in the current period depend only on sales in the immediately previous period. Khawan et al. (2007) considered an inventory system with warranty returns, but did not explicitly specify how returns depend on past sales. Toktay et al. (2000) considered a closed queueing network, wherein returns were modelled to depend on sales through an unknown return probability and delay distribution. Their dependence of returns on past sales was similar to that in Kelle and Silver (1989b); however, instead of a deterministic probability for a sold product to be returned in a future period, Toktay et al. (2000) considered the product of the probability that the product will be returned and a discrete delay density. Hsueh (2011) considered an inventory system with manufacturing and remanufacturing, taking into account different demand and return rates in different phases of the product life cycle. Those demand and return rates were normally distributed, with a different mean for each phase of the product life cycle. In addition, the mean of the demand rate and that of the return rate were related. Hsueh provided formulae for the optimal production lot size, reorder point, and safety stock of the product for each phase of the product life cycle. Unlike Hsueh's (2011) model, ours does not assume a particular distribution for demands and returns. Relevant literature on inventory models with remanufacturing in which optimal policies are studied includes Zhou and Yu (2011), Gong and Chao (2013), and Tao et al. (2012). In those papers, product returns and previous demands are independent. Jia et al. (2016) explored a remanufacturing periodic review finite horizon inventory system with lost sales. They considered a switching mechanism whereby a push mode for remanufacturing is employed to satisfy demands in the first half of the planning horizon, and a pull mode is employed in the second half. Their paper provided an optimal policy for the switching strategy, which possesses a simple, multi-dimensional base-stock structure. However, the sequence of events in Jia et al. (2016) differs from that in this paper. In our paper, we make remanufacturing decisions before products are returned in the current period (just as in the model of ), whereas in Jia et al. (2016), remanufacturing decisions are made after products are returned in the current period. Both situations can arise in practice.
Another stream of research on correlated demand and returns focuses on how to forecast returns by using appropriate statistical methods (e.g., Clottey et al. 2012; Toktay et al. 2004). The impacts of information, inventory decisions, pricing, and the use of a warranty on product-returns management have also been studied (e.g., Jing and Huang 2013; Koppius et al. 2004; Pourakbar et al. 2014; van der Laan and de Brito 2009; Xie and Ye 2016; Ye et al. 2013). More recently, Ovchinnikov et al. (2014) provided a data-driven assessment of the economic and environmental aspects of remanufacturing for product and service firms, and they presented an analytical model and a behavioral study that together incorporate demand cannibalization from multiple customer segments across a firm's product line. Ovchinnikov et al. showed that remanufacturing frequently aligns firms' economic and environmental goals by increasing profits and decreasing total environmental impact.
Our paper considers data-driven models, and it provides analytical results for those models that potentially can be used to analyze the impact of information on product inventory management with returns. In the next section, we shall describe our backlog models.

Remanufacturing models: returns forecast from past demands and past sales
In this section, we describe our periodic review finite horizon inventory models, with one model forecasting returns from past demands (Model A), and the other model forecasting returns from past sales (Model B). The second model is more realistic, as sales data are usually easier to obtain than demand data, whereas with the first model, we are able to obtain a nice structure for its optimal inventory policy. Using results derived from the first model, we then analyze the second model. Two types of cores are considered in these models: buyback cores and normal cores. Buyback cores have better quality and usability than normal cores do. A characteristic of a buyback core is that its yield (i.e., its percentage of reusable parts) is higher than that of a normal core. On the other hand, a normal core has greater variety in its quality and usability. Unsatisfied demand is backlogged in our models, and we forecast returns of buyback cores from past demands in one model and from past sales in the second model. Returns and past demands/sales are not related in the case of normal cores.
We show in this section that the optimal policies for our models can be found by solving dynamic programs. We observe that our forecasts of returns for buyback cores affect the optimal policy only through past demands/sales, even though returns of those cores are modelled to depend on other (random) factors as well.
We now describe our backlog models, beginning with the cost parameters used in those models:
$h$ = unit holding cost for serviceable products per period.
$p$ = unit penalty cost for serviceable products per period.
By serviceable products, we mean products that are ready to be sold to consumers.
$b$ = unit purchasing price of buyback cores.
A buyback core is purchased back from a consumer at cost $b$. Such a core is usually usable, but has suffered wear and tear due to usage. It is in better condition than a normal core is.
$c$ = unit purchasing price of normal cores.
A normal core can be purchased from a consumer at cost $c$. The value of $c$ is much smaller than the value of $b$, because a normal core is usually in worse condition than a buyback core is. For simplicity, we set $c = 0$.
$r_0$ = unit remanufacturing cost of buyback cores.
$r_1$ = unit remanufacturing cost of normal cores.
Let $r_0 < r_1$. This relationship reflects that a buyback core is in better condition than a normal core.
$s_0$ = unit stocking cost of buyback cores.
$s_1$ = unit stocking cost of normal cores.
Let $s_1 \le s_0 \le h$.
$u$ = unit disposal cost of normal cores.
We assume in this paper that only normal cores can be disposed of, and that buyback cores are either stocked or remanufactured. This assumption is reasonable because buyback cores are usually in better condition than normal cores are.
Note that we consider a finite horizon in this paper, where $N$ is the number of periods in the planning horizon. In our models, only products purchased at most $K$ periods before the current period, up to and including the immediately previous period, are eligible for return as buyback cores. Hence, $K$ is the maximum returns period.
The variables in these models are:
$x_{0,n}$ = inventory level of serviceable products at the beginning of the $n$th period.
$x_{1,n}$ = aggregate inventory level of serviceable products and buyback cores at the beginning of the $n$th period.
$x_{2,n}$ = aggregate inventory level of serviceable products, buyback cores and normal cores at the beginning of the $n$th period.
$\mathbf{x}_n = (x_{0,n}, x_{1,n}, x_{2,n})$, with $x_{0,n} \le x_{1,n} \le x_{2,n}$.
$y_{0,n}$ = inventory level of serviceable products in the $n$th period after remanufacturing, but before demand and returns occur.
$y_{1,n}$ = aggregate inventory level of serviceable products and buyback cores in the $n$th period after remanufacturing, but before demand and returns occur.
$y_{2,n}$ = aggregate inventory level of serviceable products, buyback cores and normal cores in the $n$th period after remanufacturing and disposal, but before demand and returns occur.
$\mathbf{y}_n = (y_{0,n}, y_{1,n}, y_{2,n})$, with $y_{0,n} \le y_{1,n} \le y_{2,n}$.
The variables given above are aggregated, and actual inventories are easily obtained from them. For example, $x_{1,n} - x_{0,n}$ is the number of buyback cores on hand at the beginning of the $n$th period.
$w_{1,n}$ = quantity of buyback cores remanufactured in the $n$th period.
$w_{2,n}$ = quantity of normal cores remanufactured in the $n$th period.
$\mathbf{w}_n = (w_{1,n}, w_{2,n})$.
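Because the state variables are cumulative, on-hand quantities follow by differencing. A minimal sketch of that step; the names `OnHand` and `on_hand` are hypothetical, and a negative serviceable level is read as backlogged demand.

```python
from typing import NamedTuple


class OnHand(NamedTuple):
    """On-hand quantities implied by aggregated inventory levels."""
    serviceable: float  # negative values represent backlogged demand
    buyback: float      # buyback cores in stock
    normal: float       # normal cores in stock


def on_hand(v0: float, v1: float, v2: float) -> OnHand:
    """Difference aggregated levels v0 <= v1 <= v2 into on-hand stocks.

    Applies to the beginning-of-period state (x_{0,n}, x_{1,n}, x_{2,n})
    and to the post-decision levels (y_{0,n}, y_{1,n}, y_{2,n}) alike.
    """
    if not v0 <= v1 <= v2:
        raise ValueError("aggregated levels must satisfy v0 <= v1 <= v2")
    return OnHand(serviceable=v0, buyback=v1 - v0, normal=v2 - v1)


# Example: 3 units backlogged, 8 buyback cores and 7 normal cores on hand.
print(on_hand(-3, 5, 12))  # OnHand(serviceable=-3, buyback=8, normal=7)
```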
Randomness in the models comes from the following:
$D_n$ = consumer demand for serviceable products in the $n$th period, $n = 1, \ldots, N$. $D_n$ is a continuous nonnegative random variable with probability density function $f_{D_n}(\xi)$, $\xi \ge 0$, and realization $d_n$, $n = 1, \ldots, N$. We denote by $\mu_{D_n}$ the finite mean of $D_n$.
$R^j_n = \sum_{i=1}^{k(n)} \sigma_{n,i} z^j_{n-i} + \epsilon_n$ = quantity of products returned as buyback cores in the $n$th period, $n = 2, \ldots, N$. We have
$$z^A_{n-i} := d_{n-i}, \qquad z^B_{n-i} := \max\{\min\{d_{n-i}, y_{0,n-i}\}, 0\}, \qquad (1)$$
which are, respectively, the realized demand and the realized sales $i$ periods before the current period, that is, in the $(n-i)$th period. Note that $\sigma_{n,i}$, $i = 1, \ldots, k(n)$, are random variables taking values between 0 and 1. The returns are therefore determined by previous demand/sales not in a deterministic manner but in a random way, owing to the random $\sigma_{n,i}$ and a random noise term $\epsilon_n$. $R^j_n$ represents the returns forecast for buyback cores and is modelled to depend explicitly on past demands/sales: the forecast in the $n$th period depends on demand/sales from the immediately previous period back to $k(n)$ periods earlier. When $j = A$, returns are forecast from past demands, which makes the analysis tractable. We also consider the more realistic situation, $j = B$, in which returns are forecast from past sales.
In the literature (for example, Kelle and Silver 1989b; Toktay et al. 2000), returns forecasting is modelled in a "forward" manner: given a product sold, the probabilities that it is returned in the next period, the period after next, and so on, are specified. In our case, we model returns forecasting in a "backward" manner, whereby returns in the current period are modelled in terms of demands/sales in previous periods.
$B_n$ = quantity of products returned as normal cores in the $n$th period, $n = 1, \ldots, N$. $B_n$ is a continuous nonnegative random variable with realization $b_n$, $n = 1, \ldots, N$.
$D_n$, $B_n$, $\epsilon_n$ and $\sigma_{n,i}$, $1 \le i \le k(n)$, may be correlated within the $n$th period, but they are independent across periods. This assumption is needed to formulate the inventory problems as dynamic programs, as discussed later in this section.
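The returns forecast can be illustrated with a short sketch. Here `sigma` and `eps` are passed in as already-drawn realizations of $\sigma_{n,i}$ and of the noise term (written $\epsilon_n$), since the text does not fix their joint distribution; all function names are hypothetical.

```python
from typing import Sequence


def realized_sales(demand: float, y0: float) -> float:
    """Realized sales max{min{d, y0}, 0}; y0 < 0 means a backlog, so no sales."""
    return max(min(demand, y0), 0.0)


def buyback_returns(sigma: Sequence[float], z_past: Sequence[float],
                    eps: float) -> float:
    """R_n = sum_{i=1}^{k(n)} sigma_{n,i} * z_{n-i} + eps_n.

    sigma[i-1] and z_past[i-1] belong to lag i (i = 1 is the previous period).
    z_past holds past demands (Model A) or past realized sales (Model B).
    """
    if len(sigma) != len(z_past):
        raise ValueError("one sigma value per lag is required")
    if any(not 0.0 <= s <= 1.0 for s in sigma):
        raise ValueError("each sigma_{n,i} must lie in [0, 1]")
    return sum(s * z for s, z in zip(sigma, z_past)) + eps


demands = [20.0, 35.0]   # d_{n-1}, d_{n-2}
stock = [25.0, 30.0]     # y_{0,n-1}, y_{0,n-2}
sales = [realized_sales(d, y0) for d, y0 in zip(demands, stock)]
print(sales)             # [20.0, 30.0]
r_a = buyback_returns([0.4, 0.2], demands, eps=1.5)   # Model A
r_b = buyback_returns([0.4, 0.2], sales, eps=1.5)     # Model B
print(r_a, r_b)          # 16.5 15.5
```

Since sales never exceed demand, the Model B forecast here is at most the Model A forecast for the same draws of $\sigma_{n,i}$ and the noise term.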

Remark 1
If we view $\sigma_{n,i} z^j_{n-i}$ as the number of units returned as buyback cores in the $n$th period out of the demand/sales of these products $i$ periods earlier (which is $z^j_{n-i}$), then $\sigma_{n,i}$, $1 \le i \le k(n)$, are unlikely to be independent across periods, since we must have
$$\sigma_{n-i+1,1} + \cdots + \sigma_{n,i} + \cdots + \sigma_{n-i+K,K} \le 1.$$
However, independence across periods still holds if $\sigma_{n,i}$, $1 \le i \le k(n)$, $2 \le n \le N$, are fixed numbers. Also, when $K = 1$, the above independence assumption across periods can be enforced under this interpretation of $\sigma_{n,i} z^j_{n-i}$. Furthermore, when $K = 1$, $\sigma_{n,1} z^B_{n-1}$ is binomially distributed with success probability $p_0$ and $z^B_{n-1}$ trials, and $\epsilon_n \equiv 0$, our returns forecasting model is the same as that of Ketzenberg et al. (2006), in which a sold product can only be returned in the next period with probability $p_0$, or not at all.
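The binomial special case of Remark 1 ($K = 1$, zero noise) can be simulated with the standard library alone. A sketch under assumptions: sales are taken to be whole units, and the function name and seed are illustrative.

```python
import random


def binomial_buyback_returns(prev_sales: int, p0: float,
                             rng: random.Random) -> int:
    """Draw sigma_{n,1} * z^B_{n-1} ~ Binomial(z^B_{n-1}, p0), with eps_n = 0.

    Each unit sold in the previous period is returned this period with
    probability p0, independently; otherwise it is never returned.
    """
    if prev_sales < 0 or not 0.0 <= p0 <= 1.0:
        raise ValueError("need prev_sales >= 0 and 0 <= p0 <= 1")
    return sum(1 for _ in range(prev_sales) if rng.random() < p0)


rng = random.Random(2024)  # fixed seed for reproducibility
draws = [binomial_buyback_returns(40, 0.3, rng) for _ in range(10_000)]
mean_returns = sum(draws) / len(draws)
print(mean_returns)  # near the binomial mean 40 * 0.3 = 12
```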
The sequence of events for our models follows that of . At the beginning of each period, the remanufacturer decides how many units of buyback and normal cores to remanufacture, and then how many units of normal cores to dispose of. Next, consumer demands and product returns are realized, and unsatisfied demands are fully backlogged. Finally, all costs are calculated. All lead times are assumed to be zero.
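The sequence of events can be traced for a single period. Only $x_{0,n+1} = y_{0,n} - D_n$ is stated explicitly in the text; the updates for $x_{1,n+1}$ and $x_{2,n+1}$ below are our assumption, chosen to agree with the stocking-cost term $s_0(y_{1,n} - y_{0,n} + E(R^j_n))$ in the expected period cost, so that returned buyback cores join the buyback stock and returned normal cores join the normal-core stock. The function name is hypothetical.

```python
from typing import Tuple

State = Tuple[float, float, float]  # aggregated levels (x0, x1, x2)


def next_state(y: State, demand: float, buyback_returns: float,
               normal_returns: float) -> State:
    """State at the start of period n+1, given decisions y_n taken in period n.

    Order of events: remanufacture and dispose (already reflected in y),
    then demand and returns are realized, with unmet demand backlogged.
    Lead times are zero, so returns are on hand at the start of period n+1.
    """
    y0, y1, y2 = y
    x0 = y0 - demand                                     # stated in the text
    x1 = y1 - demand + buyback_returns                   # assumed update
    x2 = y2 - demand + buyback_returns + normal_returns  # assumed update
    return (x0, x1, x2)


print(next_state((10, 14, 20), demand=12, buyback_returns=5, normal_returns=3))
# (-2, 7, 16): 2 units backlogged, 9 buyback cores, 9 normal cores
```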
From now on, it is understood that the demand $D_i$ in the $i$th period can also be written as $Z^A_i$, with realized demand denoted by $d_i$ or $z^A_i$. On the other hand, $Z^B_i$ stands for the sales in the $i$th period, that is, $Z^B_i = \max\{\min\{D_i, y_{0,i}\}, 0\}$, with realized sales in the $i$th period denoted by $z^B_i$. We have the following straightforward observation on $Z^j_i$:
We now write down the expected cost, due to holding/stocking, remanufacturing, disposal, purchasing and penalty, in the $n$th period, given $z^j_{n-i}$, $1 \le i \le k(n)$. We use the same notation for the expected cost in the $n$th period when returns are forecast from past demands and when returns are forecast from past sales.
Note that in the above expected cost expression:
• $s_0(y_{1,n} - y_{0,n} + E(R^j_n)) + s_1(y_{2,n} - y_{1,n} + E(B_n))$ = total stocking cost of cores in the $n$th period.
• $r_0 w_{1,n} + r_1 w_{2,n}$ = total remanufacturing cost of cores in the $n$th period.
• $x_{2,n} - x_{1,n} - y_{2,n} + y_{1,n} - w_{2,n}$ = number of units of normal cores disposed of in the $n$th period; hence, $u(x_{2,n} - x_{1,n} - y_{2,n} + y_{1,n} - w_{2,n})$ = total disposal cost of normal cores in the $n$th period.
• $bE(R^j_n)$ = expected total cost of purchasing buyback cores in the $n$th period.
• $hE(y_{0,n} - D_n)^+$ = expected holding cost of serviceable products in the $n$th period.
• $pE(D_n - y_{0,n})^+$ = expected penalty cost of serviceable products in the $n$th period.
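The listed components add up to the expected cost of the $n$th period. As a sketch only: the function below sums them for given $\mathbf{x}_n$, $\mathbf{y}_n$, $\mathbf{w}_n$, takes $E(R^j_n)$ and $E(B_n)$ as known inputs, and approximates the two expectations involving $D_n$ by averages over demand samples, which is our illustrative choice rather than the paper's method. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Costs:
    h: float   # holding cost, serviceable products
    p: float   # penalty (backlog) cost, serviceable products
    b: float   # purchasing price of a buyback core (c = 0 for normal cores)
    r0: float  # remanufacturing cost, buyback core
    r1: float  # remanufacturing cost, normal core
    s0: float  # stocking cost, buyback core
    s1: float  # stocking cost, normal core
    u: float   # disposal cost, normal core


def period_cost(c: Costs, x: Tuple[float, float, float],
                y: Tuple[float, float, float], w: Tuple[float, float],
                mean_R: float, mean_B: float,
                demand_samples: Sequence[float]) -> float:
    """Sum of the listed cost components for one period.

    E(y0 - D)^+ and E(D - y0)^+ are approximated by sample averages.
    """
    x0, x1, x2 = x   # x0 is unused: the listed terms involve only x1, x2
    y0, y1, y2 = y
    w1, w2 = w
    m = len(demand_samples)
    exp_on_hand = sum(max(y0 - d, 0.0) for d in demand_samples) / m
    exp_backlog = sum(max(d - y0, 0.0) for d in demand_samples) / m
    stocking = c.s0 * (y1 - y0 + mean_R) + c.s1 * (y2 - y1 + mean_B)
    remanufacturing = c.r0 * w1 + c.r1 * w2
    disposal = c.u * (x2 - x1 - y2 + y1 - w2)
    purchasing = c.b * mean_R
    return (stocking + remanufacturing + disposal + purchasing
            + c.h * exp_on_hand + c.p * exp_backlog)


costs = Costs(h=1.0, p=4.0, b=2.0, r0=1.0, r1=3.0, s0=0.5, s1=0.25, u=0.5)
print(period_cost(costs, x=(0, 4, 10), y=(6, 6, 9), w=(4, 2),
                  mean_R=2.0, mean_B=1.0, demand_samples=[4, 8]))  # 21.5
```

The example parameters respect $r_0 < r_1$ and $s_1 \le s_0 \le h$; with them, stocking contributes 2, remanufacturing 10, disposal 0.5, purchasing 4, holding 1 and penalty 4.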
Therefore, by eliminating $\mathbf{w}_n$, the expected cost in the $n$th period given $z^j_{n-i}$, $1 \le i \le k(n)$, can be rewritten in terms of $\mathbf{x}_n$ and $\mathbf{y}_n$ alone.
Before we continue, we let $K = 1$ from now on; that is, we consider returns only from products purchased in the immediately previous period, so the maximum returns period for products returned as buyback cores is 1. As discussed in Remark 1, setting $K = 1$ allows us to interpret $\sigma_{n,1} z^B_{n-1}$ as returns of buyback cores from sales in the previous period without violating the independence assumption on $\sigma_{n,1}$ across periods. Results derived in this paper for $K = 1$ also apply for $K \ge 2$, provided that this independence assumption holds, for example when $\sigma_{n,i}$ is a fixed number for all $1 \le i \le k(n)$, $2 \le n \le N$.
Now, a policy $\pi^j = (\pi^j_1, \ldots, \pi^j_N)$ for our model, with returns forecast from past demands when $j = A$ and from past sales when $j = B$, is such that $\pi^j_1(\mathbf{x}_1) = \mathbf{y}_1$, $\pi^j_2(\mathbf{x}_2, z^j_1, b_1) = \mathbf{y}_2$ and, for $3 \le n \le N$, $\pi^j_n(\mathbf{x}_n, z^j_{n-1}, b_{n-1}, \sigma_{2,1}, \ldots, \sigma_{n-1,1}, \epsilon_2, \ldots, \epsilon_{n-1}) = \mathbf{y}_n$, where $\mathbf{y}_n$ is constrained to satisfy
$$y_{0,n} \le y_{1,n} \le y_{2,n}, \qquad y_{1,n} - x_{1,n} \le y_{0,n} - x_{0,n}, \qquad y_{2,n} \le x_{2,n}, \qquad y_{1,n} \ge x_{1,n}.$$
For a given policy $\pi^j = (\pi^j_1, \ldots, \pi^j_N)$ and $1 \le n \le N$, the expected total cost from the $n$th period to the $N$th period given $(\mathbf{x}_n, z^j_{n-1}, b_{n-1}, \sigma_{2,1}, \ldots, \sigma_{n-1,1}, \epsilon_2, \ldots, \epsilon_{n-1})$ is given by (4), where $x_{0,n+1} = y_{0,n} - D_n$, with $R^j_n = \sigma_{n,1} z^j_{n-1} + \epsilon_n$, and, for $n + 2 \le i \le N$, $z^j_{i-2}$ stands for the demand in the $(i-2)$th period when $j = A$, and the sales in the $(i-2)$th period, defined by (1), when $j = B$. Following Bertsekas (2005), an optimal policy $\pi^{j,*}$ is a policy that minimizes the above expected cost from the 1st period to the $N$th period over all feasible policies $\pi^j$. Here, $\pi^{A,*}$ is the optimal policy for Model A, while $\pi^{B,*}$ is the optimal policy for Model B.
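The constraints on $\mathbf{y}_n$ and the elimination of $\mathbf{w}_n$ can be checked mechanically. The identities $w_{2,n} = y_{1,n} - x_{1,n}$ and $w_{1,n} = (y_{0,n} - x_{0,n}) - (y_{1,n} - x_{1,n})$ used below are our derivation from the aggregated definitions (buyback cores kept in stock equal $x_{1,n} - x_{0,n} - w_{1,n}$), not formulas quoted from the text; with them, the disposed quantity $x_{2,n} - x_{1,n} - y_{2,n} + y_{1,n} - w_{2,n}$ reduces to $x_{2,n} - y_{2,n}$. Function names are hypothetical.

```python
from typing import Tuple

Vec = Tuple[float, float, float]  # aggregated levels (v0, v1, v2)


def is_feasible(x: Vec, y: Vec, tol: float = 1e-9) -> bool:
    """Check the constraints on y_n given x_n stated for a policy pi^j."""
    x0, x1, x2 = x
    y0, y1, y2 = y
    return (y0 <= y1 + tol and y1 <= y2 + tol   # y0 <= y1 <= y2
            and y1 - x1 <= y0 - x0 + tol        # buyback cores used >= 0
            and y2 <= x2 + tol                  # disposal >= 0
            and y1 >= x1 - tol)                 # normal cores used >= 0


def decisions(x: Vec, y: Vec) -> Tuple[float, float, float]:
    """Recover (w1, w2, disposed) from a feasible pair (x_n, y_n)."""
    if not is_feasible(x, y):
        raise ValueError("y is not feasible for x")
    x0, x1, x2 = x
    y0, y1, y2 = y
    w2 = y1 - x1                         # normal cores remanufactured
    w1 = (y0 - x0) - w2                  # buyback cores remanufactured
    disposed = x2 - x1 - y2 + y1 - w2    # simplifies to x2 - y2
    return w1, w2, disposed


print(decisions((0, 4, 10), (6, 6, 9)))  # (4, 2, 1)
```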
The optimal policy $\pi^{j,*}$ can be found by dynamic programming, that is, by solving a dynamic program as follows. Define $V^j_1(\mathbf{x}_1)$ to be the minimization problem $\min_{\mathbf{y}_1}$ … subject to $y_{0,1} \le y_{1,1} \le y_{2,1}$, …, where $\mathbf{x}_2 = (x_{0,2}, x_{1,2}, x_{2,2})$ in the 2nd period is given by …; for subsequent periods, the corresponding minimization is subject to $y_{0,n} \le y_{1,n} \le y_{2,n}$, …, where $\mathbf{x}_{n+1} = (x_{0,n+1}, x_{1,n+1}, x_{2,n+1})$ in the $(n+1)$th period is given by …, although it is easy to observe from (8) … subject to constraints (9) for all $z_{N-1} \ge 0$. Using our dynamic programming formulations, we have the following proposition: … are obtained by solving the above dynamic program for each $j = A, B$.
By the above proposition, to find the optimal policy $\pi^{j,*}$, we only need to find the optimal decisions $\mathbf{y}^{j,*}$. The returns forecast for buyback cores is determined by past demands/sales together with some random factors. We see from the above proposition that the returns forecast affects the optimal policy for the two models only through past demands in Model A and past sales in Model B.
In Sect. 4, we provide a nice structure for the optimal inventory policy for Model A, the model where returns are forecast from past demands. Based on our results in the section, in Sect. 5, we propose a feasible policy for Model B, the model where returns are forecast from past sales, and analyze the extent to which this feasible policy is close to optimality. In Sect. 5.1, we provide numerical results.

An optimal inventory policy for Model A
We proceed in this section to state the explicit form of the optimal policy $\pi^{A,*}$ for Model A, the backlog model formulated in Sect. 3 in which returns are forecast from past demands. In each period, this policy can be described neatly in terms of optimal control parameters that do not depend on inventories at the beginning of the period.
Remanufacturing:
• If $x_{0,n} < \xi_{0,n} \le x_{1,n}$, we remanufacture up to $\xi_{0,n}$ using only buyback cores, without using any normal cores, in the nth period, and stock the remaining $x_{1,n} - \xi_{0,n}$ buyback cores for the next period.
• If $\xi_{1,n}(z^A_{n-1}) \le x_{1,n} < \xi_{0,n}$, we remanufacture all available buyback cores, without using any normal cores, in the nth period.
• If $x_{1,n} < \xi_{1,n}(z^A_{n-1}) \le x_{2,n}$, we remanufacture up to $\xi_{1,n}(z^A_{n-1})$ using all available buyback cores and additional normal cores in the nth period.
• If $x_{0,n} \le x_{1,n} \le x_{2,n} < \xi_{1,n}(z^A_{n-1}) \le \xi_{0,n}$, we remanufacture all available buyback cores and normal cores in the nth period.

Disposal:
• If $\eta_{2,n}(z^A_{n-1}) \le x_{1,n} \le x_{2,n}$, we dispose of all available normal cores in the nth period.
The above rules, for $n = 1$ and for $2 \le n \le N$, constitute the optimal policy $\pi^{A,*}$ for our backlog model when returns are forecast from past demands.
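The rules above can be encoded compactly. The sketch below is a minimal, hypothetical encoding, not the authors' code. It treats $x_0 \le x_1 \le x_2$ (and the post-decision $y_0 \le y_1 \le y_2$) as aggregate levels: serviceable products; plus buyback cores; plus normal cores. It assumes $\xi_1 \le \xi_0$ (as in the last remanufacturing case) and $\xi_1 \le \eta_2$, and it fills in by assumption the two situations not spelled out above: no remanufacturing when $\xi_0 \le x_0$, and disposal down to $\eta_2$ when $x_1 < \eta_2 < x_2$.

```python
def theorem1_decision(x0, x1, x2, xi0, xi1, eta2):
    """Hypothetical sketch of the remanufacture-up-to / dispose-down-to rule.

    x0 <= x1 <= x2 are aggregate levels: serviceable products; plus buyback
    cores; plus normal cores. Returns the aggregate post-decision levels
    (y0, y1, y2). ASSUMES xi1 <= xi0 and xi1 <= eta2; parameters may be +/-inf.
    """
    if not (x0 <= x1 <= x2):
        raise ValueError("expected x0 <= x1 <= x2")
    # Remanufacturing: buyback cores first (target xi0); normal cores are used
    # only beyond x1 (target xi1). The result lies between x0 and x2.
    y0 = max(x0, min(xi0, max(x1, min(xi1, x2))))
    # Buyback cores are not disposed of: unremanufactured ones stay in stock.
    y1 = max(y0, x1)
    # Disposal: bring the aggregate level down towards eta2, but never below
    # y1, so only normal cores are disposed of.
    y2 = max(y1, min(x2, eta2))
    return y0, y1, y2
```

For instance, `theorem1_decision(5, 10, 15, 8, 3, 12)` returns `(8, 10, 12)`: serviceable stock is raised to 8 using buyback cores, 2 buyback cores remain in stock, and 3 of the 5 normal cores are disposed of.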
Theorem 1 thus gives a simply stated optimal policy for our model: it is essentially a "remanufacture-up-to" and "dispose-down-to" policy whose remanufacturing and disposal levels are characterized by optimal control parameters that depend on past demand but not on the inventories at the beginning of each period.
In the next subsection, we show how the policy is obtained by solving the minimization problem (11) subject to constraints (12), which are given there.
In the following, we describe properties of the optimal cost and of the optimal policy we derived. Structural properties of optimal policies are often investigated in the literature; see, for example, Puranam and Katehakis (2014). First, we investigate how returns forecast from past demands affect the optimal policy for Model A. For $n = 1$, the optimal control parameters of our optimal policy are clearly independent of past demands. For $2 \le n \le N$, the following theorem describes how the optimal control parameters $\xi_{1,n}(z^A_{n-1})$ and $\eta_{2,n}(z^A_{n-1})$ vary with $z^A_{n-1}$.
In our model, returns of buyback cores are forecast from past demand for serviceable products ($z^A_{n-1} = d_{n-1}$): the larger (smaller) $d_{n-1}$ is, the larger (smaller) the forecast number of returned buyback cores. By Theorem 2, as $d_{n-1}$ increases, $\xi_{0,n}$ is unchanged while $\xi_{1,n}(d_{n-1})$ and $\eta_{2,n}(d_{n-1})$ are nonincreasing. Theorem 1 then implies that, when past realized demand increases and hence more buyback-core returns are forecast, we are less likely in the current period to remanufacture normal cores and more likely to dispose of them, while the remanufacturing decision for buyback cores is unchanged.
Next, we observe the following property of $\xi_{0,n}$ and of the optimal cost $V^A_1(x_1)$. It is clear from the above theorem that, to keep the system cost down over the planning horizon, the initial inventory of serviceable products $x_{0,1}$ should not be too small; in particular, it should not be smaller than $\xi_{0,1}$. Furthermore, by the above theorem, when the demands $D_n$ are identically distributed, the optimal control parameter $\xi_{0,n}$ is greater than or equal to $\xi_{0,N}$ for $1 \le n \le N$. A natural question is whether $\xi_{0,n}$ is monotone in $n$. The following example shows that $\xi_{0,n}$ need not be monotone in $n$.

Verification of Theorem 1
In this subsection, we proceed in an abstract manner, analyzing a minimization problem that is an abstraction of the optimality equation in our dynamic programming formulation in Sect. 3. We obtain results by analyzing this minimization problem, and these results enable us to arrive at the optimal policy for our backlog model, Model A, in Theorem 1.
First, we abstract the expected one-period cost function $U_n(x_n, \mathbf{y}_n, z^A_{n-1})$ by a function $C(\mathbf{y}, z)$, where $\mathbf{y} = (y_0, y_1, y_2)$, $\beta$ is a given constant and $D$ is a continuous nonnegative random variable.
It is easy to see that $C(\mathbf{y}, z)$ is a continuously differentiable convex function of $(\mathbf{y}, z)$ and is additively separable in $(\mathbf{y}, z)$.
We consider the following minimization problem (11) subject to constraints (12), which is an abstraction of our dynamic program in Sect. 3. Here $K(\mathbf{y}, z)$ stands for the expected cost-to-go in the dynamic program ((8) subject to constraints (9)) that we use to find the policy for our model, Model A. We list below the essential properties that $K(\mathbf{y}, z)$ is assumed to satisfy. They reflect the term $E_{D_n, B_n, \sigma_{n,1}, \epsilon_n}\big(V^A_{n+1}(x_{n+1}, Z^A_n)\big)$ that $K$ represents, and they are satisfied by $K_n(\mathbf{y}_n, z^A_{n-1})$, as shown in the proof of Theorem 1. The properties are as follows:
1. $K(\mathbf{y}, z)$ is a continuously differentiable convex function of $(\mathbf{y}, z)$.
2. $K(\mathbf{y}, z)$ is additively separable in $\mathbf{y} = (y_0, y_1, y_2)$, that is, $K(\mathbf{y}, z) = K_0(y_0, z) + K_1(y_1, z) + K_2(y_2, z)$ for some functions $K_i(y_i, z)$, $i = 0, 1, 2$.
3. $K(\mathbf{y}, z)$ is additively separable in $y_0$ and $z$, that is, $K(\mathbf{y}, z)$ can be written as the sum of two functions $\tilde{K}_0(y_0, y_1, y_2)$ and $\tilde{K}_1(z, y_1, y_2)$.
With the above, we then obtain in Theorem 4 (given below) the optimal solution to the minimization problem (11) subject to constraints (12). Theorem 4 allows us to obtain the explicit form of the optimal policy π A, * for our model in Theorem 1.
For convenience, we denote the objective function $C(\mathbf{y}, z) + \alpha K(\mathbf{y}, z)$ of the minimization problem (11) subject to constraints (12) by $\Phi(\mathbf{y}, z)$.
Following , let $\xi_0(z) \in \operatorname{argmin}_{y_0} \Phi(y_0, y_1, y_2, z)$. The above parameters are used to solve the minimization problem (11) subject to constraints (12), and they then define the optimal control parameters of our optimal policy $\pi^{A,*}$. By Remark 2, $\xi_0(z)$ does not depend on $z$; hence, we write $\xi_0$ for $\xi_0(z)$ from now on. Our proof that $\xi_0$, $\xi_1(z)$ and $\eta_2(z)$ are optimal control parameters (Theorem 4, which leads to Theorem 1) is not identical to that in . We rely on the Karush-Kuhn-Tucker (KKT) conditions to prove this.
The parameters $\xi_0$, $\xi_1(z)$ and $\eta_2(z)$ do not depend on $y_0$, $y_1$ or $y_2$, owing to the additive separability of $\Phi(\mathbf{y}, z)$ in $\mathbf{y}$. They may, however, equal $+\infty$ or $-\infty$.

Remark 3
In the case $\eta^p_2(z) < \xi^p_1(z)$, $\xi_1(z)$ and $\eta_2(z)$ are defined by $\xi_1(z) = \eta_2(z) \in \operatorname{argmin}_{y_0} \Phi(y_0, y_0, y_0, z)$. The way we define the above parameters, which eventually give rise to the optimal policy $\pi^{A,*}$ for our backlog model when returns are forecast from past demands, is similar to that in . These parameters are used in Theorem 4 to define the optimal solution to the minimization problem (11) subject to constraints (12), and Theorem 4 is proved using the KKT conditions. $\xi_0$ and $\xi_1(z)$ may be thought of as "remanufacture-up-to" parameters, while $\eta_2(z)$ is the "dispose-down-to" parameter. Depending on the values of $x_0$, $x_1$, $x_2$, the system may remanufacture up to $\xi_1(z)$ or up to $\xi_0$. Similarly, depending on $x_0$, $x_1$, $x_2$, some normal cores may be disposed of so that the aggregate inventory level of serviceable products, buyback cores and normal cores is brought down to $\eta_2(z)$.
Let $\xi_{-1}(z) = \infty$. This definition is needed in the statement of Theorem 4 below, which gives an optimal solution to the minimization problem (11) subject to constraints (12).
The above theorem is proved by verifying that $(y^*_0(x, z), y^*_1(x, z), y^*_2(x, z))$ as defined there satisfies the KKT conditions for (11) subject to constraints (12), checking every possible ordering of $x_0$, $x_1$, $x_2$, $\xi_0$, $\xi_1(z)$ and $\eta_2(z)$. Since the minimization problem is a convex program and Slater's condition holds trivially, the KKT conditions are necessary and sufficient for optimality.
Alternatively, the policy in Theorem 1 can be expressed in terms of $y^{A,*}_1(x_1)$ and $y^{A,*}_n(x_n, z^A_{n-1})$, $2 \le n \le N$, whose expressions are similar to those of $(y^*_0(x, z), y^*_1(x, z), y^*_2(x, z))$ in the above theorem.
We end this subsection with the following two propositions on $K(x, z)$, which are needed when Theorem 4 is applied in the proof of Theorem 1 by induction.
Proof Consider the set $\mathcal{C} \subset \mathbb{R}^5$ of points $(x, z, v)$ used to define $f$ below. It is easy to show that $\mathcal{C}$ is convex, since $C(\mathbf{y}, z)$ and $K(\mathbf{y}, z)$ are convex functions of $(\mathbf{y}, z)$. Therefore, by Theorem 5.3 of Rockafellar (1970), $f(x, z) = \inf\{v : (x, z, v) \in \mathcal{C}\}$ is a convex function of $(x, z)$. Since $f(x, z) = K(x, z)$, $K(x, z)$ is a convex function of $(x, z)$. The remaining assertion of the proposition follows from Theorem 25.5 of Rockafellar (1970).
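The convexity step can be spelled out with the standard partial-minimization argument that underlies results such as Theorem 5.3 of Rockafellar (1970). In this generic sketch, $G(x, y, z)$ stands for the objective (here $C(\mathbf{y}, z) + \alpha K(\mathbf{y}, z)$, which does not depend on $x$ directly), and we assume that the set $F$ of feasible pairs $(x, y)$ defined by the constraints is convex, as it is for linear inequality constraints such as $y_0 \le y_1 \le y_2$, and that $f$ is finite. For $\theta \in [0, 1]$ and $\varepsilon > 0$, pick feasible $y$, $y'$ with $G(x, y, z) \le f(x, z) + \varepsilon$ and $G(x', y', z') \le f(x', z') + \varepsilon$; then

```latex
f(x, z) = \inf_{y \,:\, (x, y) \in F} G(x, y, z),
\qquad
\begin{aligned}
f\bigl(\theta x + (1-\theta)x',\ \theta z + (1-\theta)z'\bigr)
  &\le G\bigl(\theta x + (1-\theta)x',\ \theta y + (1-\theta)y',\ \theta z + (1-\theta)z'\bigr) \\
  &\le \theta\, G(x, y, z) + (1-\theta)\, G(x', y', z') \\
  &\le \theta\, f(x, z) + (1-\theta)\, f(x', z') + \varepsilon .
\end{aligned}
```

The first inequality uses the convexity of $F$ (a convex combination of feasible pairs is feasible), the second the joint convexity of $G$, and the third the choice of $y$ and $y'$; letting $\varepsilon \to 0$ gives the convexity of $f$.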

Proposition 5 $K(x, z)$ is additively separable in $x$ and is also additively separable in $x_0$ and $z$.
The idea of the proof of the above proposition is to use Theorem 4 to express $K(x, z)$ explicitly in terms of functions $K_0(x_0)$ and $K_i(x_i, z)$, $i = 1, 2$.

A feasible inventory policy for Model B
Since $Z^B_n$ in (6) subject to constraints (7), and in (8) subject to constraints (9) with $j = B$, is given by (1), the optimal decisions $y^{B,*}_n$, and hence the optimal policy $\pi^{B,*}$, are unlikely to have an easily tractable structure. Given that we have obtained a simple structure for the optimal policy for Model A in Sect. 4, we can use that policy as a feasible policy for Model B by defining $\pi = (\pi_1, \ldots, \pi_N)$ as follows: let $\pi_1(x_1) := y^{A,*}_1(x_1)$, $\pi_2(x_2, z^B_1, b_1) := y^{A,*}_2(x_2, z^B_1)$ and, for $3 \le n \le N$, $\pi_n(x_n, z^B_{n-1}, b_{n-1}, \sigma_{2,1}, \ldots, \sigma_{n-1,1}, \epsilon_2, \ldots, \epsilon_{n-1}) := y^{A,*}_n(x_n, z^B_{n-1})$. Clearly, $\pi$ is a feasible policy for Model B, the model in which returns are forecast from past sales. Hence, $V^B_1(x_1) \le V_{\pi,1}(x_1)$. A natural question is how close the feasible policy is to optimality. One way to answer this question is to compare the system cost under this feasible policy with the optimal system cost, which is what we do next, using what we already know: the structure of the optimal policy $\pi^{A,*}$ for Model A. We use it to analyze the difference $V_{\pi,1}(x_1) - V^B_1(x_1)$. In what follows, we write $z_{n-1}$ without a superscript to indicate that we do not interpret this variable as past demand or past sales, but treat it merely as a generic nonnegative variable.
We impose the following conditions on the cost parameters. These conditions are reasonable for the model. The first three conditions encourage remanufacturing, which is the only way to obtain enough serviceable products to satisfy demand and avoid backlogs, while the last condition discourages remanufacturing of normal cores, in favor of disposal, when there is no demand for serviceable products, so as to avoid stocking excess serviceable products obtained from remanufacturing. These conditions are needed to prove the following proposition:

Proposition 6
We have, for 2 ≤ n ≤ N , wherever the partial derivatives are defined.
Without Condition 1, a result similar to Proposition 6 can still be obtained, but the analysis is more complicated.
The above proposition is proved by induction using the structure of the optimal policy in Theorem 1 (see also Theorem 4) and the definitions of $\xi_0$, $\xi_{1,n}(z_{n-1})$ and $\eta_{2,n}(z_{n-1})$ in (13) and in Proposition 3. In addition, the following holds.

Proposition 7
We have, for $2 \le n \le N$, wherever the partial derivatives are defined.
Using Propositions 6 and 7, we can now find an upper bound for the difference between the optimal cost $V^B_1(x_1)$ and the cost $V_{\pi,1}(x_1)$ under our feasible policy. Before doing so, we need the following two propositions, which follow from Propositions 6 and 7.

Proposition 9
We have, for $2 \le n \le N$, where $\mu_n = \max\{\mu_{D_n}, \ldots, \mu_{D_{N-1}}\}$ for $2 \le n \le N-1$, and $\mu_N = 0$.

We have the following lemma, which follows from the above two propositions. The required results then follow by applying Propositions 8 and 9.
We now have the main result of this section.

Theorem 5 For $0 < \alpha < 1$,

Proof When $p \le (1 - \alpha)(r_0 + (s_0 + b)/\alpha)$, Proposition 10 gives $V_{\pi,1}(x_1) \le V^A_1(x_1)$, and the result follows from Lemma 1. When $p > (1 - \alpha)(r_0 + (s_0 + b)/\alpha)$, the result also follows from Lemma 1.
Observe from the above theorem that when the discount factor $\alpha$ is small, the feasible policy $\pi$ approximates the optimal policy $\pi^{B,*}$ well, since the upper bounds tend to zero as $\alpha$ approaches zero. The theorem also shows that the difference between $V_{\pi,1}(x_1)$ and $V^B_1(x_1)$ is bounded above by constants that are independent of the planning horizon and the initial inventories, and depend only on the cost parameters $r_0$, $s_0$, $s_1$, $b$, $p$, $u$, the discount factor $\alpha$ and the mean demands $\mu_{D_i}$, $1 \le i \le N-1$. It is not surprising that the constants depend on $s_0$, $b$ and $r_0$: we obtain the bounds by comparing with the feasible policy $\pi$, which is related to Model A; Model A and Model B differ only in how returns of buyback cores are forecast; and $s_0$, $b$ and $r_0$ are the cost parameters related to buyback cores. Normal cores, on the other hand, affect the bounds only through $s_1$ and $u$. The unit penalty cost $p$ appears in a bound only when it is large enough, namely larger than $(1 - \alpha)(r_0 + (s_0 + b)/\alpha)$. Intuitively, when the initial inventories of serviceable products, buyback cores and normal cores are low, the difference between $Z^A_n$ and $Z^B_n$ is likely to grow as $n$ increases, so that Model B incurs more penalty cost than Model A, because fewer buyback cores are returned in Model B than in Model A. Since the upper bound for $V_{\pi,1}(x_1) - V^B_1(x_1)$ is obtained by comparing Model A with Model B, $p$ appears in the bound when it is large, reflecting the difference in penalty costs between the two models when inventories are low. The unit holding cost $h$ plays no role in these constants: when there are more serviceable products than demand, $Z^A_n = Z^B_n$, and there is no difference between Model A and Model B in the $n$th period.
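Which case of Theorem 5 applies depends only on whether $p$ exceeds $(1 - \alpha)(r_0 + (s_0 + b)/\alpha)$. The short sketch below evaluates this threshold; the function names are hypothetical, and the bound expressions themselves are not reproduced here.

```python
def penalty_threshold(alpha: float, r0: float, s0: float, b: float) -> float:
    """(1 - alpha) * (r0 + (s0 + b) / alpha): the cut-off on p in Theorem 5."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("Theorem 5 is stated for 0 < alpha < 1")
    return (1.0 - alpha) * (r0 + (s0 + b) / alpha)


def penalty_enters_bound(p: float, alpha: float, r0: float, s0: float, b: float) -> bool:
    """True in the case p > threshold, where p appears in the upper bound."""
    return p > penalty_threshold(alpha, r0, s0, b)


# Cost parameters used in the numerical study: r0 = s0 = b = 1.0, with p = 2.0.
for alpha in (0.2, 0.5, 0.8):
    print(alpha, round(penalty_threshold(alpha, 1.0, 1.0, 1.0), 4),
          penalty_enters_bound(2.0, alpha, 1.0, 1.0, 1.0))
```

With these parameters the threshold is 8.8 for $\alpha = 0.2$, 2.5 for $\alpha = 0.5$ and 0.7 for $\alpha = 0.8$, so with $p = 2.0$ the penalty cost enters the bound only for $\alpha = 0.8$. The threshold also grows without bound as $\alpha \to 0$, consistent with the small-$\alpha$ remark above.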

Numerical study
In our numerical experiments, we investigate the cost difference $V_{\pi,1}(x_1) - V^B_1(x_1)$ by varying the initial inventories (at the start of the planning horizon) and the model parameters. The upper bounds on $V_{\pi,1}(x_1) - V^B_1(x_1)$ given in Theorem 5 are worst-case bounds, and we expect the actual differences to be smaller. The numerical results below confirm this.
To be realistic, we set the length of the planning horizon, $N$, to 6 in our numerical experiments, except in one set of experiments in which $N$ ranges from 3 to 15 in increments of 3. Because of the curse of dimensionality, the dynamic programs for Model A and Model B (both presented in Sect. 3) are solved over only 3 periods at a time, instead of over the whole planning horizon of length $N$, which can be greater than 3.
To implement our models with dynamic programs solved over only 3 periods when the planning horizon can be longer, we proceed as follows. In the 1st period, we solve the 3-period dynamic programs for Model A and Model B to obtain the numbers of serviceable products to remanufacture from buyback and from normal cores and the number of normal cores to dispose of. For the same realized demand, we then obtain the initial inventories of serviceable products, buyback cores and normal cores at the beginning of the 2nd period under the feasible policy $\pi$ (which is related to Model A) and under the optimal policy $\pi^{B,*}$ (which is related to Model B). The expected 1st-period costs under the two policies are also computed for the same realized 1st-period demand. We then solve the dynamic programs for Model A and Model B again, now for $n = 2$, to obtain the remanufacturing and disposal quantities. The initial inventories for the 3rd period are then updated for the same realized demand but different realized returns of buyback cores, which depend on the serviceable products available in the 1st period under the feasible policy $\pi$ and under the optimal policy $\pi^{B,*}$. The expected 2nd-period costs under $\pi$ and $\pi^{B,*}$ are also computed for the same realized 2nd-period demand. $N$ is always a multiple of 3 in our numerical experiments.

Table 1 Effect of different initial inventories on average system costs under the feasible and optimal policies ($h = 1.0$, $p = 2.0$, $\alpha = 0.5$, $N = 6$)
If $N$ is larger than 3, then at the 4th period we solve the dynamic programs for Model A and Model B again over a horizon of length 3, treating the (different) inventories obtained at the end of the 3rd period under each policy, after accounting for realized demand and returns of cores, as the initial inventories of the corresponding dynamic program. We continue in this way if $N$ is larger than 6. For each policy, the total expected cost over the whole planning horizon is the sum of the appropriately discounted expected single-period costs, given by (3). We consider integral inventories and demands in our numerical study. For each policy, the programs are run $R = 100$ times, and the average system costs under the two policies, $\hat{V}_{\pi,1}$ and $\hat{V}^B_1$, are obtained by summing the total system costs of the runs, computed as described in the previous paragraph, and dividing by $R$. The demand $D_n$ follows the rounded uniform distribution $U_R(0, 15)$, a discrete probability distribution. We let the maximum return period $K$ be 1, and in our dynamic programs $\sigma_{n,1}$ is such that $\sigma_{n,1} z^j_{n-1}$ is binomially distributed with success probability $p_0$ and $z^j_{n-1}$ trials, $j = A, B$. We let $\epsilon_n \equiv 0$. Hence, we consider the situation in which a product is either returned in the next period, with probability $p_0$, or not at all.
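The random ingredients of this set-up can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the reported results: demand is drawn from $U_R(0, 15)$ by rounding a continuous uniform draw, and buyback returns are a Binomial$(z, p_0)$ count with $K = 1$ and no additional random return term. The sales definition `min(demand, max(serviceable, 0))`, the weighting of period $n$ by $\alpha^{n-1}$ in `discounted_total`, the use of $\hat{V}^B_1$ as the base in `percent_difference`, and all function names are assumptions made for illustration.

```python
import random
from typing import Sequence, Tuple

P0 = 0.8          # probability that a demanded/sold unit is returned next period
NORMAL_CORES = 5  # B_n = 5 normal cores arrive in every period


def sample_demand(rng: random.Random, high: float = 15.0) -> int:
    """Rounded uniform U_R(0, high): draw a continuous U(0, high), round to an integer."""
    return round(rng.uniform(0.0, high))


def sample_buyback_returns(rng: random.Random, z_prev: int, p0: float = P0) -> int:
    """K = 1: each of the z_prev units returns next period w.p. p0, so the count
    is Binomial(z_prev, p0), sampled here as a sum of Bernoulli trials."""
    return sum(1 for _ in range(z_prev) if rng.random() < p0)


def returns_model_a_and_b(rng: random.Random, demand: int, serviceable: int,
                          p0: float = P0) -> Tuple[int, int, int]:
    """(sales, returns driven by past demand [Model A], returns driven by past sales [Model B]).

    ASSUMPTION for illustration: sales = min(demand, max(serviceable, 0)).
    """
    sales = min(demand, max(serviceable, 0))
    return sales, sample_buyback_returns(rng, demand, p0), sample_buyback_returns(rng, sales, p0)


def discounted_total(period_costs: Sequence[float], alpha: float) -> float:
    """ASSUMPTION: the expected cost of period n is weighted by alpha**(n - 1)."""
    return sum(alpha ** n * c for n, c in enumerate(period_costs))


def average_cost(run_totals: Sequence[float]) -> float:
    """Average system cost over the R runs (R = 100 in the study)."""
    return sum(run_totals) / len(run_totals)


def percent_difference(v_pi: float, v_b: float) -> float:
    """100 * (V_pi - V_B) / V_B: percentage gap between the two average costs."""
    return 100.0 * (v_pi - v_b) / v_b
```

Because Model B's returns are driven by sales, a period with few serviceable products (for example `serviceable=4` against a demand of 12) caps Model B's next-period returns at 4 units, whereas Model A's are drawn from all 12 demanded units; this is the mechanism discussed after Theorem 5.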
In all our numerical experiments, we set $b = 1.0$, $r_0 = 1.0$, $r_1 = 1.0$, $s_0 = 1.0$, $s_1 = 1.0$, $u = 1.0$, $p_0 = 0.8$ and $B_n \equiv 5$. Our numerical results show that the percentage difference $(\hat{V}_{\pi,1} - \hat{V}^B_1)/\hat{V}^B_1 \times 100$ is not greater than 3.50%, with only two instances above 3.00%. Table 1 shows how $\hat{V}_{\pi,1} - \hat{V}^B_1$ varies with the initial inventories of serviceable products, buyback cores and normal cores. There is no clear pattern in how the difference varies with $x_{0,1}$, $x_{1,1}$ and $x_{2,1}$. The maximum value of $\hat{V}_{\pi,1} - \hat{V}^B_1$ in the table is 0.74, which is much smaller than the upper bound predicted by Theorem 5, which holds for all values of $(x_{0,1}, x_{1,1}, x_{2,1})$ in the table.

Table 2 Effect of different length of planning horizon on average system costs under the feasible and optimal policies ($x_{0,1} = 5$, $x_{1,1} = 10$, $x_{2,1} = 15$, $h = 1.0$, $p = 2.0$, $\alpha = 0.5$)

Table 3 Effect of different discount factor on average system costs under the feasible and optimal policies ($x_{0,1} = 5$, $x_{1,1} = 10$, …)
As shown in Table 2, the difference $\hat{V}_{\pi,1} - \hat{V}^B_1$ does not vary much as the length of the planning horizon $N$ increases. Correspondingly, the upper bounds in Theorem 5 are independent of $N$.
In line with Theorem 5, Table 3 shows that the difference between $\hat{V}_{\pi,1}$ and $\hat{V}^B_1$ increases with the discount factor $\alpha$, although the actual difference is smaller than the upper bounds provided in the theorem, for example when $\alpha = 0.2$ and when $\alpha = 0.8$.

In Table 4, as the unit penalty cost $p$ increases from 1.0 to 4.0, the difference $\hat{V}_{\pi,1} - \hat{V}^B_1$ decreases from 0.65 to 0.13, except for an increase when $p$ increases from 1.5 to 2.0. From Theorem 5, however, we would expect the difference to increase with $p$, since the upper bound in the theorem increases with $p$. The upper bound increases with $p$ because it is obtained by comparing Models A and B: returns of buyback cores in Model A depend on past demands, while those in Model B depend on past sales, which can be low when serviceable products are insufficient. Remanufacturing is therefore unaffected in Model A, while in Model B, because of low past sales, there may be fewer buyback cores available for remanufacturing into serviceable products to satisfy current demand. In the worst case, this difference in penalty costs due to unsatisfied demand becomes apparent as $p$ becomes large, leading to an upper bound for $V_{\pi,1}(x_1) - V^B_1(x_1)$ that depends on, and increases with, $p$. This effect is absent when computing $\hat{V}_{\pi,1}$ and $\hat{V}^B_1$, because returns of buyback cores under both policies $\pi$ and $\pi^{B,*}$ depend on past sales; moreover, as $p$ increases, both policies become more similar and remanufacture any available cores into serviceable products to satisfy demand.
Table 5 shows no clear dependence of $\hat{V}_{\pi,1} - \hat{V}^B_1$ on the unit holding cost $h$, which is in line with the upper bounds in Theorem 5 being independent of $h$.
We end this subsection by investigating whether the "myopic" policy is a good enough approximation to the feasible policy $\pi$ in deciding, in each period, the numbers of buyback and normal cores to remanufacture and the number of normal cores to dispose of. It has been shown in the literature, for example by Ignall and Veinott (1969), that under certain conditions a "myopic" policy can be optimal. An advantage of the "myopic" policy over the feasible policy $\pi$ for Model B is that the former only requires optimizing the single-period cost function in each period, which is easy to implement, while the latter requires solving (13) to find the optimal control parameters for the remanufacturing and disposal decisions, which can be challenging. Denote the average system cost under the "myopic" policy by $\hat{V}_1$; it is obtained by summing the total system costs of the runs under this policy and dividing by the number of runs, $R = 100$. As shown in Tables 6 and 7, the "myopic" policy approximates the feasible policy $\pi$ reasonably well: the percentage difference in average system costs ranges from 2.00% to 17.46%, and 11 of the 17 scenarios tested have a percentage difference below 10%.

Table 5 Effect of different unit holding cost on average system costs under the feasible and optimal policies ($x_{0,1} = 5$, $x_{1,1} = 10$, $x_{2,1} = 15$, $p = 2.0$, $\alpha = 0.5$, $N = 6$)

Table 6 Effect of different initial inventories on average system costs under the "myopic" and feasible policies ($h = 1.0$, $p = 2.0$, $\alpha = 0.5$, $N = 6$)

Table 7 Effect of different length of planning horizon on average system costs under the "myopic" and feasible policies ($x_{0,1} = 5$, $x_{1,1} = 10$, $x_{2,1} = 15$, $h = 1.0$, $p = 2.0$, $\alpha = 0.5$)
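A "myopic" decision minimizes only the current period's expected cost. The sketch below illustrates this with a hypothetical one-period cost, not the paper's $U_n$: buyback cores are remanufactured at unit cost $r_0$ to raise serviceable stock from $x_0$ to $y_0 \in [x_0, x_1]$, after which a holding cost $h$ is charged on $(y_0 - D)^+$ and a penalty cost $p$ on $(D - y_0)^+$. Normal cores and disposal are ignored, the demand pmf is that of $U_R(0, 15)$, and the function names are illustrative.

```python
from typing import Callable, Dict, Iterable


def rounded_uniform_pmf(high: int = 15) -> Dict[int, float]:
    """pmf of U_R(0, high): a continuous U(0, high) rounded to the nearest integer.
    0 and high receive probability 1 / (2 * high); 1..high-1 receive 1 / high each."""
    pmf = {k: 1.0 / high for k in range(1, high)}
    pmf[0] = pmf[high] = 0.5 / high
    return pmf


def expected_holding_penalty(y0: int, pmf: Dict[int, float], h: float, p: float) -> float:
    """E[h * (y0 - D)^+ + p * (D - y0)^+] for a discrete demand D with the given pmf."""
    return sum(prob * (h * max(y0 - d, 0) + p * max(d - y0, 0)) for d, prob in pmf.items())


def myopic_choice(candidates: Iterable[int], one_period_cost: Callable[[int], float]) -> int:
    """Myopic rule: minimize the current period's expected cost only, ignoring the future."""
    return min(candidates, key=one_period_cost)


# Hypothetical one-period cost: remanufacture at unit cost r0 from x0 up to y0 <= x1.
x0, x1, r0, h, p = 0, 10, 1.0, 1.0, 2.0
pmf = rounded_uniform_pmf(15)
y0 = myopic_choice(range(x0, x1 + 1),
                   lambda y: r0 * (y - x0) + expected_holding_penalty(y, pmf, h, p))
print(y0)
```

With $r_0 = h = 1.0$ and $p = 2.0$ this prints 5: raising $y_0$ by one unit changes the expected cost by $r_0 + (h + p)P(D \le y_0) - p = 3P(D \le y_0) - 1$, which is negative up to $y_0 = 4$ (where $P(D \le 4) = 0.3$) and positive from $y_0 = 5$ on (where $P(D \le 5) = 11/30$).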

Concluding remarks
In this paper, we describe two models for a remanufacturing inventory system with two types of cores, buyback cores and normal cores, in which returns of buyback cores are forecast from past demands or from past sales. In Theorem 1, by analyzing a dynamic program, we obtain optimal control parameters that describe the optimal policy for the backlog model in which returns are forecast from past demands, and we provide properties of the optimal cost and of the optimal policy. In Sect. 5, we study a feasible inventory policy for the model in which returns are forecast from past sales, and we show how close this feasible policy is to the optimal policy by bounding the difference between the expected costs under the two policies. Numerical results are given in Sect. 5.1. Whether the theoretical upper bounds in Theorem 5 are tight is left for future work.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Let $V^{j,*}_1(x_1)$ be the optimal cost from the first period to the Nth period, given $x_1$ at the beginning of the first period. For $2 \le n \le N$, let $V^{j,*}_n(x_n, z^j_{n-1}, b_{n-1}, \sigma_{2,1}, \ldots, \sigma_{n-1,1}, \epsilon_2, \ldots, \epsilon_{n-1})$ be the optimal cost from the nth period to the Nth period.
On the other hand, $y^{j,*}_n(x_n, z^j_{n-1})$, $n_0 - 1 \le n \le N$, form a "partial" policy from the $(n_0 - 1)$th period to the Nth period, with expected cost $V^j_{n_0-1}(x_{n_0-1}, z^j_{n_0-2})$. Since $V^{j,*}_{n_0-1}(x_{n_0-1}, z^j_{n_0-2}, b_{n_0-2}, \sigma_{2,1}, \ldots, \sigma_{n_0-2,1}, \epsilon_2, \ldots, \epsilon_{n_0-2})$ is the optimal expected cost over all "partial" policies from the $(n_0 - 1)$th period to the Nth period, we must have $V^{j,*}_{n_0-1}(\cdot) \le V^j_{n_0-1}(x_{n_0-1}, z^j_{n_0-2})$. Hence, the statement holds for $n = n_0 - 1$, and by induction it holds for $2 \le n \le N$. Similar arguments apply to the first period, using the above statement for $n = 2$.
Proof of Theorem 1 For $2 \le n \le N$, the theorem follows from Theorem 4 by applying the results of Sect. 4.1 with $x = x_n$, $\mathbf{y} = \mathbf{y}_n$, $z = z^A_{n-1}$ and $\beta = (s_0 + b)E(\sigma_{n,1})$. Here $K_n(\mathbf{y}_n, z^A_{n-1})$ is defined by
$$K_n(\mathbf{y}_n, z^A_{n-1}) = E_{D_n, B_n, \sigma_{n,1}, \epsilon_n}\Big(V^A_{n+1}\big(y_{0,n} - D_n,\; y_{1,n} - D_n + \sigma_{n,1} z^A_{n-1} + \epsilon_n,\; y_{2,n} - D_n + \sigma_{n,1} z^A_{n-1} + \epsilon_n + B_n,\; D_n\big)\Big). \qquad (19)$$
To apply Theorem 4, we need to show that the following statement holds for $2 \le n \le N$.

Statement: $K_n(\mathbf{y}_n, z^A_{n-1})$ defined by (19) satisfies Properties 1 to 4.

We show this by induction on $n$, $2 \le n \le N$. For $n = N$, $V^A_{N+1}(x_{N+1}, z^A_N) \equiv 0$, so Properties 1 to 4 hold trivially for $K_N(\mathbf{y}_N, z^A_{N-1})$. In particular, Property 4 holds in this case since $r_1 > r_0$.
By similar arguments as above, we can apply Theorem 4 to the case n = 1 to conclude the proof of the theorem.

Proof of Theorem 4 As
if no such m exists, can be checked easily.
Next, we show that $(y^*_0(x, z), y^*_1(x, z), y^*_2(x, z))$ given by the expressions in the statement of the theorem is an optimal solution to (11) subject to constraints (12).
Writing $\Phi(\mathbf{y}, z)$ as $\Phi_0(y_0) + \Phi_1(y_1, z) + \Phi_2(y_2, z)$, we write down the KKT conditions for (11) subject to constraints (12). Since (11) subject to constraints (12) is a convex program and Slater's condition holds trivially, $(y^*_0(x, z), y^*_1(x, z), y^*_2(x, z))$ is an optimal solution to (11) subject to constraints (12) if and only if it satisfies the KKT conditions. Hence, to show that $y^*_i(x, z)$, $i = 0, 1, 2$, given by the expressions in the theorem are indeed optimal, we only need to show that they satisfy the KKT conditions.
In this subcase, the expressions for $y^*_i(x, z)$, $i = 0, 1, 2$, given in the statement of the theorem satisfy the above KKT conditions. Hence, the theorem holds for this subcase.
Other cases and their subcases can be handled in a similar manner. We conclude the proof with the following illustrative case. Illustrative case: the expressions for $y^*_i(x, z)$, $i = 0, 1, 2$, given in the statement of the theorem are $y^*_0(x, z) = y^*_1(x, z) = y^*_2(x, z) = x_1$. Let $\lambda_3 = \lambda_4 = 0$ and choose the remaining multipliers $\lambda_i \ge 0$ appropriately; then $y^*_i(x, z)$, $i = 0, 1, 2$, satisfy the KKT conditions, and the theorem holds for this illustrative case.

Proof of Proposition 5
We consider the case $\xi_0 \le \eta_2(z)$; the case $\eta_2(z) \le \xi_0$ can be handled in a similar manner.
From the above expression for $K(x, z)$, it is clear that $K(x, z)$ is additively separable in $x$ and is also additively separable in $x_0$ and $z$.
Hence, we obtain the required result.
We show by induction on $n$ that, if the $D_n$ are identically distributed, then $\xi_{0,N} \le \xi_{0,n}$ for $1 \le n \le N$.

Proof of Proposition 6
To prove the proposition, we first write down explicit expressions for the partial derivatives of $V^A_N$, wherever they are defined, using Theorem 1 (or Theorem 4), as follows.
Case 1: $x_{0,N} \le x_{1,N} \le x_{2,N} \le \xi_{1,N} \le \xi_{0,N}$.
The proposition then follows by induction on $n$ using the dynamic programming formulation of $V^A_n(x_n, z_{n-1})$ in Sect. 3, Theorem 1 (or Theorem 4), Condition 1 and the definitions of $\xi_0$, $\xi_{1,n}(z_{n-1})$ and $\eta_{2,n}(z_{n-1})$ in (13) and in Proposition 3. In particular, the induction step requires the following: