Censored Newsvendor Model Revisited with Unnormalized Probabilities

This paper revisits the model of the censored newsvendor presented by Ding, Puterman and Bisi (2002). We analyze that model in an infinite-horizon context by using the concept of unnormalized probabilities. The unnormalized probabilities considerably simplify the dynamic programming equation and facilitate the proof of the existence of an optimal policy. They can also be used to give a simple, alternative proof of Ding et al.'s claim that the myopic order quantity is always less than or equal to the optimal order quantity. Importantly, the concept of unnormalized probabilities can be used to treat other important operations research problems with partial observations.

1. Introduction. The newsvendor problem concerns the optimization of the inventory level at the beginning of a sales season to meet the demand during the season. When the inventory level is more than the demand during the season, costs are incurred on account of the leftover inventory at the end of the season. Otherwise, some penalty is assessed on the unmet demand. Since the leftover inventory is salvaged at the end of a season in the newsvendor problem, the inventory level at the beginning of the next season is always zero. Thus, the order quantity and the inventory level are always the same. They are used interchangeably in this paper.
Although there is an extensive literature on the newsvendor problem, the unobservability of the unmet demand has only recently been emphasized by Lariviere and Porteus [9], Ding, Puterman and Bisi [8] and Bensoussan, Çakanyıldırım and Sethi [3]. In this paper, we consider the multiperiod newsvendor problem of Ding et al., in which the demand in each period is observed fully when it is satisfied from the available inventory. Otherwise, only the event that the demand is larger than or equal to the inventory is observed. This problem requires inventory level optimization with censored demand data. It is an example of problems with partial observations (Bensoussan [1] and Monahan [12]). In these problems, the state of the dynamic program (DP) is generally the conditional distribution of the system state. For example, both in this paper and in Ding et al., the system state is a demand parameter but the DP state is this parameter's conditional distribution given the censored demand observations.
We replace the conditional probability of the demand parameter by the so-called unnormalized probability that evolves linearly over time. This replacement provides a simpler but equivalent DP equation. This simplicity facilitates the proofs of results such as the existence of an optimal policy and the comparison of the optimal and myopic order quantities. It also simplifies the computation of an optimal order quantity. It is fair to say that the efficient computation of the optimal order quantities is critical for an implementation of the model in practice. Moreover, the concept of unnormalized probabilities can be used to analyze many other partially observed problems. Bringing the unnormalized probabilities to the attention of the operations research community is an important contribution of this paper.
Another contribution of our paper relates to Ding et al.'s claim that the myopic order quantity is always smaller than or equal to the optimal order quantity. Although the claim is true, its proof was found to be erroneous by Lu, Song and Zhu [10] when there are more than two periods in the planning horizon. Lu, Song and Zhu [11] provide another proof of the claim. In this paper, we provide a short, simple proof of Ding et al.'s claim. Moreover, our proof generalizes to other related inventory problems with partial observations.
The plan for this paper is as follows. In the next section we set up the model and obtain a DP equation, whose argument is the conditional probability of the demand parameter. In Section 3, we introduce the important concept of unnormalized probabilities and obtain an equivalent DP equation. We prove Ding et al.'s claim in Section 4. Some concluding remarks are provided in Section 5.

2. Preliminaries. Let $(\Omega, \mathcal{F}, P)$ be the probability space and let $n \ge 1$ be the period index. Let $x_n \ge 0$ denote the demand in period $n$. The demands are assumed to be independently and identically distributed, and each demand depends on an unknown parameter $\theta$. Given $\theta$, the demand has the corresponding density and cumulative distribution functions $f(\cdot|\theta)$ and $F(\cdot|\theta)$. Let $\bar F(\cdot|\theta) = 1 - F(\cdot|\theta)$.
In this paper, we prove the existence of a feedback ordering policy under the assumption that the mean demand is finite. Strictly speaking, we assume that there exists a sufficiently large constant $M$ such that
$$\int x f(x|\theta)\,dx \le M\theta \tag{1}$$
for each $\theta$. The assumption of a finite demand mean is customary in the inventory literature, and it is used to keep the inventory related costs finite. In applications where the unknown parameter $\theta$ is the mean of the demand distribution, this assumption holds trivially with $M = 1$. For brevity, we omit the limits of an integral if the integral is taken over $[0, \infty)$ as in (1).
The inventory available to satisfy the demand $x_n$, or a part thereof, is called $y_n$. We can think of $y_n$ as the inventory after the $n$th period order arrives but before the $n$th period demand $x_n$ materializes. We let the sales $z_n$ be given by
$$z_n := \min\{x_n, y_n\}. \tag{2}$$
When $x_n > y_n$, the inventory is not sufficient to meet the demand in period $n$. In that case, the amount of sales is $y_n$ and $x_n - y_n$ is the unmet demand. When the demand is not met, the magnitude of the unmet demand is not observed by the inventory manager (IM). Indeed, the IM observes only the sales. Let $\mathcal{Z}_n$ be the sigma algebra generated by the sales, i.e., $\mathcal{Z}_n := \sigma(z_1, \dots, z_n)$. Thus, $\mathcal{Z}_n$ is the information available to the IM at the end of period $n$. Since the IM decides on $y_n$ at the beginning of period $n$, $y_n$ is $\mathcal{Z}_{n-1}$-measurable. In our partially observed demand model, when $x_n \ge y_n$, $x_n$ is not $\mathcal{Z}_n$-measurable. The IM decides on $y_n$ at the beginning of period $n$ after observing the previous sales $\{z_j : j \le n - 1\}$.
Let the function $L(x, y)$, which depends on the demand $x$ and the available inventory level $y$ ordered to meet the demand, denote the one-period cost function given in (3). The costs $h$ and $c$ can be interpreted as the salvage value per unit and the ordering cost per unit. Note that leftover inventory is salvaged at the end of every period. Since the demand $x \ge 0$, $L(x, y)$ decreases as $y$ increases for $y \le 0$. Thus, we can restrict $y$ to $[0, \infty)$ in our minimization problem. The cost $b$ is the backorder (resp. lost sales) cost per unit in a backorder (resp. lost sales) case.
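The censoring of demand by sales described above can be sketched in a few lines of Python. This is only an illustration: the function name `observe_sales` and the numbers are hypothetical, and the flag follows the convention in the text that demand is unobserved whenever $x_n \ge y_n$.

```python
def observe_sales(demand, inventory):
    """Return (sales, censored) for one period.

    sales is z = min(x, y); censored is True when x >= y, in which case
    the manager only learns that demand was at least the inventory level.
    """
    sales = min(demand, inventory)
    censored = demand >= inventory
    return sales, censored


# One sample path with a fixed inventory level of 5 (illustrative numbers).
demands = [3.0, 7.5, 5.0]
history = [observe_sales(x, 5.0) for x in demands]
print(history)  # [(3.0, False), (5.0, True), (5.0, True)]
```

Only the first period reveals the demand exactly; in the other two, the manager learns no more than that the demand was at least 5.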
With the discount factor $0 < \alpha < 1$ and with $y$ defining the sequence of inventory levels $y = \{y_1, y_2, \dots\}$, our objective is to minimize the expected discounted cost
$$E \sum_{n=1}^{\infty} \alpha^{n-1} L(x_n, y_n). \tag{4}$$
Since the leftover inventory is always salvaged and we require $y_n \ge 0$, the inventory level at the beginning of each period can be taken to be zero whether or not $y_n - x_n > 0$. This implies that $y_{n+1}$ is independent of $y_n$. In other words, periods are separable in terms of inventory, so there is no inventory state equation in our formulation. On the other hand, periods are coupled by the update of the demand distribution, whose evolution over time is studied next.
Let $\pi_n$ be the distribution of the parameter $\theta$ in period $n$ before observing $z_n$. That is, the probability density function of $x_n$ is $\int f(\cdot|\theta)\pi_n(\theta)\,d\theta$ when $\pi_n$ is given. We start with a known $\pi_1$ and obtain the evolution of the process $\{\pi_n : n \ge 1\}$ from the state evolution equation (5), which is equivalent to equation (5) in Ding et al. [8]. From (4), with initial $\pi_1 = \pi$, we define the value function $V(\pi)$ as the infimum of (4) over the sequences $y$. By the optimality principle, we obtain (6), where (7) follows from (5). Combining (6) and (7), we obtain the DP equation (8). Our DP equation corresponds to equation (8) in Ding et al. [8], but we keep ours more explicit by writing it in terms of only two functions $f$ and $L$.
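A minimal sketch of one step of the normalized update may help fix ideas. It is not a transcription of the evolution equation of the paper: it assumes a finite grid of parameter values in place of a density over $\theta$, an exponential demand with mean $\theta$ as an illustrative family, and the hypothetical names `exp_pdf`, `exp_sf` and `update_posterior`. An uncensored sale is weighted by $f(z|\theta)$ and a stock-out by $\bar F(y|\theta)$, followed by a division that restores total mass 1.

```python
import math


def exp_pdf(x, theta):
    """Density f(x|theta) of an exponential demand with mean theta (assumed family)."""
    return math.exp(-x / theta) / theta


def exp_sf(y, theta):
    """Survival function 1 - F(y|theta) of the same exponential demand."""
    return math.exp(-y / theta)


def update_posterior(pi, thetas, z, y):
    """One Bayes step for the weights pi on the grid thetas.

    If z < y the demand was observed exactly, so the likelihood is f(z|theta).
    If z == y the demand was censored, so the likelihood is P(x >= y | theta).
    """
    if z < y:
        like = [exp_pdf(z, t) for t in thetas]
    else:
        like = [exp_sf(y, t) for t in thetas]
    weighted = [p * l for p, l in zip(pi, like)]
    total = sum(weighted)  # this denominator depends on y when the sale is censored
    return [w / total for w in weighted]


thetas = [1.0, 2.0, 4.0]      # hypothetical parameter grid
prior = [1 / 3, 1 / 3, 1 / 3]  # hypothetical uniform prior
posterior = update_posterior(prior, thetas, z=5.0, y=5.0)  # stock-out at y = 5
```

In this sketch a stock-out shifts weight toward larger values of $\theta$, and the normalizing denominator depends on the order quantity $y$, which is the source of the nonlinearity.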
3. Linearization of the State Transition with Unnormalized Probabilities. In this section, we simplify the highly nonlinear evolution equation (5) of $\pi_n$ by constructing a Zakai-type evolution equation (Zakai [13]) for the unnormalized probabilities. We define the unnormalized probabilities $\rho_n$ recursively: starting with an initial $\rho_1$, they evolve according to the state equation (9). This state equation is linear in $\rho$. We also define a normalization factor $\lambda_n$ in each period:
$$\lambda_n := \int \rho_n(\theta)\,d\theta. \tag{10}$$
Since $\pi_n$ integrates to 1 while $\rho_n$ integrates to $\lambda_n$, $\pi_n$ (resp. $\rho_n$) is called the normalized (resp. unnormalized) probability. The normalization factor $\lambda_n$ allows us to obtain $\pi_n$ from $\rho_n$ and vice versa. Indeed, $\pi_n(\theta) = \rho_n(\theta)/\lambda_n$. Furthermore, we define a new value function $W$ in (11). Note that $W$ has the unnormalized probability $\rho$ as an argument.
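The relation $\pi_n(\theta) = \rho_n(\theta)/\lambda_n$ can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not the paper's recursion verbatim: it uses a finite grid for $\theta$ (sums replace integrals), an exponential demand with mean $\theta$, and the hypothetical helpers `likelihood`, `step_unnormalized` and `step_normalized`. The unnormalized step only multiplies by the likelihood of the observation; the normalized step also divides.

```python
import math


def likelihood(theta, z, y):
    """Likelihood of sales z under inventory y for exponential demand with mean theta."""
    if z < y:
        return math.exp(-z / theta) / theta  # exact observation: f(z|theta)
    return math.exp(-y / theta)              # censored observation: 1 - F(y|theta)


def step_unnormalized(rho, thetas, z, y):
    """Linear update: multiply by the likelihood, never divide."""
    return [r * likelihood(t, z, y) for r, t in zip(rho, thetas)]


def step_normalized(pi, thetas, z, y):
    """Nonlinear Bayes update, kept only for comparison."""
    w = step_unnormalized(pi, thetas, z, y)
    s = sum(w)
    return [v / s for v in w]


thetas = [1.0, 2.0, 4.0]                            # hypothetical parameter grid
prior = [0.5, 0.3, 0.2]                             # hypothetical prior
observations = [(2.0, 3.0), (3.0, 3.0), (1.5, 4.0)]  # pairs (sales z, inventory y)

rho, pi = list(prior), list(prior)
for z, y in observations:
    rho = step_unnormalized(rho, thetas, z, y)
    pi = step_normalized(pi, thetas, z, y)

lam = sum(rho)                       # normalization factor lambda_n
recovered = [r / lam for r in rho]   # pi_n = rho_n / lambda_n
max_gap = max(abs(a - b) for a, b in zip(recovered, pi))
print(max_gap)
```

Normalizing once at the end reproduces the step-by-step posterior up to rounding, so the division can be postponed until a normalized probability is actually needed.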
We have defined $\rho_n$ in (9) without any reference to $\pi_n$ in (5), although the equations (9) and (5) look similar. Based on this similarity, the next theorem relates $\rho_n$ and $\pi_n$. It also provides a DP equation for $W$, which does not involve $V$. From this equation, $W$ can be found directly. The proof of the theorem is presented in the appendix.
Theorem 3.1. i) Each unnormalized probability $\rho_n$ in (9) is related to the normalized probability $\pi_n$ in (5) uniquely by (12).
ii) $W$ given by (11) solves the DP equation (13).
iii) The solution $y$ of (8) and the solution $y$ of (13) are the same provided that $\pi$ in (8) and $\rho$ in (13) are related by $\pi = \rho/\lambda$, where $\lambda$ is the integral of $\rho$.
It is worth comparing the DP equations (8) and (13). A glance at these equations reveals that (13) is simpler and shorter than (8). Moreover, the update of $\rho$ in (9) is linear while the update of $\pi$ in (5) is nonlinear. As a result, (13) does not have a denominator which involves the control variable $y$. Thus, the derivative of (13) with respect to $y$ is easier to obtain than the derivative of (8). Hence, finding the minimizer $y$ is easier with (13) than it is with (8).
Theorem 3.1.iii) states that (13) and (8) are equivalent as far as their solutions are concerned. Thus, the optimal inventory levels which minimize (4) can be found by solving (13). This is a very important result as it enables us to work with the easy DP equation (13) by ignoring the difficult DP equation (8) in the remainder of this paper. A major contribution of this paper is in developing a simpler but equivalent DP framework with the unnormalized probabilities. This significantly simplifies the forthcoming proofs as well as the computation of optimal order quantities.
The simplicity of (13) is due to working with unnormalized probabilities as opposed to normalized probabilities. We expect that the concept of unnormalized probabilities would simplify the analysis of other partially observed inventory models. Consequently, unnormalized probabilities could have more far-reaching applications than those we present in this paper.

4. Existence and Comparison Results. The simplicity of (13) facilitates the proof of the existence of a solution $W$ and of a feedback solution $y^*(\rho)$ that minimizes the right-hand side of (13). These are the subject of Theorem 4.1 below. The proof of this theorem is in the appendix. Briefly speaking, the proof works with the maps $T_y(W)$ and $T(W)$, where $T_y(W)(\rho)$ equals the terms inside the curly brackets on the right-hand side of (13) and $T(W)(\rho) := \inf_y T_y(W)(\rho)$. We first show that the map $T(W)$ is a contraction, and as a consequence, we obtain Theorem 4.1.i. Then, in addition, we establish lower semicontinuity of $W$, the unique solution of (13), and we construct an upper bound for the optimal $y$. These yield Theorem 4.1.ii.
Theorem 4.1. i) There exists a unique value function $W(\cdot)$ which solves the DP equation (13).
ii) There exists an optimal feedback solution $y^*(\rho)$.
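The contraction argument behind part i) can be illustrated on a toy problem. The sketch below is a finite-state analogue only, not the operator $T$ of the paper: the three states, two actions, costs and transition probabilities are hypothetical, and the supremum norm stands in for the norm used in the appendix. It iterates a discounted Bellman operator from the zero function and records the distance between successive iterates.

```python
ALPHA = 0.9  # discount factor, 0 < ALPHA < 1

# Toy problem (all numbers hypothetical): cost[s][a] >= 0 and
# trans[s][a] is a probability vector over the next state.
cost = [[1.0, 2.0], [0.5, 0.0], [3.0, 1.0]]
trans = [
    [[0.8, 0.2, 0.0], [0.1, 0.6, 0.3]],
    [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5]],
    [[0.3, 0.3, 0.4], [0.0, 0.2, 0.8]],
]


def T(W):
    """Bellman operator: minimize one-period cost plus discounted continuation."""
    return [
        min(
            cost[s][a] + ALPHA * sum(p * w for p, w in zip(trans[s][a], W))
            for a in range(2)
        )
        for s in range(3)
    ]


def sup_dist(U, V):
    """Supremum-norm distance between two value vectors."""
    return max(abs(u - v) for u, v in zip(U, V))


iterates = [[0.0, 0.0, 0.0]]  # start the iteration from the zero function
for _ in range(200):
    iterates.append(T(iterates[-1]))

gaps = [sup_dist(iterates[n + 1], iterates[n]) for n in range(200)]
W = iterates[-1]  # approximate fixed point of T
```

In this toy setting each gap is at most `ALPHA` times the previous one, and the iterates increase because the costs are nonnegative. These are the two properties that the proof in the appendix establishes for the iteration $W_{n+1} = T(W_n)$.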
An important issue is to compare the myopic order quantity to the optimal order quantity that can be found from the DP equation (13). The claim of Ding et al. [8] is that the myopic order quantity is smaller than or equal to the optimal one. To formally compare the optimal and myopic solutions, let $y^*(\rho)$ in (14) be a minimizer of the right-hand side of (13), and let $y^M(\rho)$ be the corresponding minimizer of the expected one-period cost alone, where we have replaced "inf" with "min" in view of Theorem 4.1 and where $y^*(\rho)$ and $y^M(\rho)$ denote the optimal and myopic inventory levels, respectively. The main claim in Ding et al. [8] is $y^M(\rho) \le y^*(\rho)$. Their proof was found to be erroneous by Lu et al. [10] when there are more than two periods in the horizon.
For the same claim, Lu, Song and Zhu [11] provide a proof. Their sample-path proof is based on a key observation that the derivative of the value function can be interpreted as the value function of some policy, which is not optimal. While this observation completes the proof, the identification of the policy which appears in the derivative is itself quite complex. Like Ding et al., Lu, Song and Zhu [11] do not use unnormalized probabilities either. For all these reasons, the proof in Lu et al. [11] is also intricate. Moreover, one does not see the underlying reason for the key observation. Therefore, the generalization of their proof to other models is not obvious.
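For intuition about the myopic quantity, the following sketch computes it on a discrete grid under assumptions that are not taken from the paper. It assumes a one-period cost of the common form $L(x, y) = cy - h(y - x)^+ + b(x - y)^+$ with $h < c < b$, an exponential demand with mean $\theta$, and the hypothetical names `mixture_cdf` and `myopic_quantity`. Under that assumed cost, setting the derivative of the expected one-period cost to zero gives the critical-fractile condition that the mixture demand distribution at $y$ equals $(b - c)/(b - h)$, which the code solves by bisection.

```python
import math


def mixture_cdf(y, rho, thetas):
    """Demand cdf under weights rho; dividing by sum(rho) removes the scale of rho."""
    lam = sum(rho)
    return sum(r * (1.0 - math.exp(-y / t)) for r, t in zip(rho, thetas)) / lam


def myopic_quantity(rho, thetas, c, h, b, tol=1e-10):
    """Solve mixture_cdf(y) = (b - c) / (b - h) by bisection (assumes h < c < b)."""
    target = (b - c) / (b - h)
    lo, hi = 0.0, 1.0
    while mixture_cdf(hi, rho, thetas) < target:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, rho, thetas) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


thetas = [1.0, 2.0, 4.0]  # hypothetical parameter grid
rho = [0.5, 0.3, 0.2]     # hypothetical weights
y_m = myopic_quantity(rho, thetas, c=1.0, h=0.5, b=3.0)
y_m_scaled = myopic_quantity([7.0 * r for r in rho], thetas, c=1.0, h=0.5, b=3.0)
```

Scaling the weights by a positive constant leaves the result unchanged in this sketch, which is consistent with the minimizer being the same whether it is computed from a normalized or an unnormalized probability.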
Instead, one can use unnormalized probabilities to provide a much simpler proof of the claim. The next theorem is devoted to this claim.
Theorem 4.2. The optimal inventory level is greater than or equal to the myopic inventory level, i.e., $y^*(\rho) \ge y^M(\rho)$ for every $\rho$.
In this paper, we choose to work with an infinite horizon problem to keep the notation brief and to emphasize the concept of unnormalized probabilities. However, the proof of Theorem 4.2 uses the iteration $W_{n+1} = T(W_n)$. The value function $W_n$ can be interpreted as the optimal cost-to-go when there are $n$ periods until the end of the horizon. Then our proof immediately implies that the claim of Theorem 4.2 also holds in finite horizon settings.
5. Concluding Remarks. In this paper, we revisit the censored demand model of Ding et al. [8] with unnormalized probabilities. These probabilities result in a DP equation that is simpler than, but equivalent to, the one presented by Ding et al. The simplicity of our DP equation has facilitated the arguments for the existence of an optimal feedback policy. In addition, on account of unnormalized probabilities, it results in a short and simple proof of the claim made by Ding et al. that the optimal order quantity is larger than or equal to the myopic order quantity. This provides closure to the stream of literature beginning with Ding et al. and dealing with the comparison of optimal and myopic policies. Moreover, our simple proof is easily transferable to other situations. For example, with censored Markovian demands in Bensoussan et al. [3], we obtained an analogous result.
Unnormalized probabilities are also shown to be useful in proving the sufficiency of some statistics when demands are exponentially distributed. Namely, the number of stock-out periods and the cumulative sales by period $n$ are shown to be sufficient to describe the distribution of the parameter $\theta$ at the end of period $n$. This sufficiency in turn facilitates the proof of consistency of the Bayesian posteriors $\pi_n$, i.e., the posteriors converge to the true parameter value $\theta_0$ in the limit, regardless of the chosen prior. Convergence results for both Bayesian posteriors and posteriors obtained by an adaptive control scheme are presented in Bensoussan et al. [5]. This consistency result frees the inventory manager from worrying about selecting an appropriate prior, which in most cases is a guess, and lets him focus on inventory management.
The unnormalized probabilities were introduced by Bensoussan, Çakanyıldırım and Sethi [2,3,4] in the context of partially observed inventories. Although they were first developed by Zakai [13] in the context of optimal filtering problems involving Wiener processes, those results do not directly apply to the inventory problems under consideration. The unnormalized probability approach considerably simplifies the DP formulation and in general leads to relatively simpler proofs. The approach is useful for treating a large variety of operations research problems with partial observations.
ii) We start by inserting (8) into (11). In the resulting chain of equalities, the last equality is obtained by using (11) and replacing $V$ with $W$. The last equality is exactly (13), so ii) is proved.
iii) We note that (16) is the same as the right-hand side of (8), while (17) is the same as the right-hand side of (13). Then iii) follows immediately.
Proof of Theorem 4.1: We need to define functional spaces and appropriate norms in these spaces. Let $H$ be the underlying space of functions, defined in terms of $L^1(\mathbb{R}_+)$, the space of integrable functions. We are ultimately interested in unnormalized probabilities, which are always nonnegative. We accommodate these in $H_+ := \{\rho \in H \mid \rho \ge 0\}$, where we note that $H_+$ is a closed subset of $H$ under the norm $\|\cdot\|$ of $H$. We define the space $B$ of functions $\varphi$, equipped with the norm $\|\cdot\|_B$. We prove the existence of a feedback ordering policy under the assumption that the mean demand is finite; see (1).
i) We first establish that $T$ is a contraction mapping. Here $T_y(W)$ equals the terms inside the curly brackets on the right-hand side of (13). Then (20) follows. Examining the terms inside the curly brackets on the right-hand side of (20) and inserting the resulting equality into (20), we have $|T(W)(\rho) - T(\tilde W)(\rho)| \le \alpha \|W - \tilde W\|_B \|\rho\|$. Dividing both sides by $\|\rho\|$, we arrive at the inequality $\|T(W) - T(\tilde W)\|_B \le \alpha \|W - \tilde W\|_B$, which proves the contraction property.
Since $T$ is a contraction mapping, the iteration $W_0(\cdot) = 0$ and $W_{n+1}(\rho) = T(W_n)(\rho)$ converges to the solution of the DP equation (13). Moreover, this solution is unique.
ii) Existence of Optimal Solution: We first require two claims.
Claim 1: The solution $W(\cdot)$ of the DP equation (13) is lower semicontinuous over $H_+$.
Since the iteration $W_0(\cdot) = 0$ and $W_{n+1}(\rho) = T(W_n)(\rho)$ converges to the solution of the DP equation (13), it suffices to argue that the sequence $W_n$ is increasing and each $W_n$ is continuous, because the limit of an increasing sequence of continuous functions is lower semicontinuous. The increasing property follows easily from inspecting the operator $T$. The continuity property can be proved by induction on $n$. We suppose that $W_n(\cdot)$ is continuous and establish the same for $W_{n+1}(\cdot)$. Note that $W_0(\cdot)$ is trivially continuous.
Since $\rho_n(\cdot)f(x|\cdot) \to \rho(\cdot)f(x|\cdot)$, $\rho_n(\cdot)\bar F(y|\cdot) \to \rho(\cdot)\bar F(y|\cdot)$ and $W_n$ is continuous, the right-hand side of (21) can be made arbitrarily small as $\rho_n \to \rho$. That is, $W_{n+1}$ is also continuous. This completes the inductive argument for the continuity of $W_n$.
Claim 2: An order quantity cannot be optimal unless it is less than or equal to $\bar y$.
To prove this claim, we show that the solution $W(\cdot)$ of the DP equation (13) satisfies $\|W\|_B \le bM/(1 - \alpha)$. We first observe (22). Since $y = 0$ is a feasible solution for (13) and $L(x, 0) \le bx$ by (22), we obtain (23), where the second inequality follows from the finite mean demand assumption. Dividing both sides of (23) by $\|\rho\|$ and taking the supremum over $\rho$, we obtain the bound (24) on $\|W\|_B$. On the other hand, a simple lower bound on the cost of ordering $y$ can be deduced by using (22) and ignoring the future costs, as in (25). Combining (24) and (25), we arrive at the bound $\bar y$ on any optimal order quantity, which establishes Claim 2.
As a consequence of the lower semicontinuity of $W$ in Claim 1, we obtain the lower semicontinuity of the map $y \to T_y(W)(\rho)$. Moreover, the optimal order quantity belongs to the interval $[0, \bar y]$ by Claim 2. Thus, $\inf_y T_y(W)(\rho)$ must be attained by an optimal $y^* \in [0, \bar y]$.
After taking the derivative of the integral, we arrive at an equivalent inequality
$$\frac{d}{dy} W\big(\bar F(y|\cdot)\rho(\cdot)\big) \le -W\big(f(y|\cdot)\rho(\cdot)\big).$$
In this proof, we write $y^*(\rho; W)$ when we want to make explicit the value function appearing on the right-hand side of (14). We consider the iteration $W_0(\rho) = 0$ and $W_{n+1}(\rho) = T(W_n)(\rho)$. Since $T$ is a contraction mapping, $W_n$ converges to the solution of (13). Therefore, we can state and inductively prove (26) in terms of $W_{n+1}$, knowing that (27) holds trivially for $W_0(\rho) = 0$.