Optimal Control of Brownian Inventory Models with Convex Holding Cost: Average Cost Case

We consider an inventory system in which the inventory level fluctuates as a Brownian motion in the absence of control. The inventory continuously accumulates cost at a rate that is a general convex function of the inventory level, which can be negative when there is a backlog. At any time, the inventory level can be adjusted by a positive or negative amount, incurring a fixed cost and a proportional cost. The challenge is to find an adjustment policy that balances the holding cost and the adjustment cost so as to minimize the long-run average cost. When both upward and downward fixed costs are positive, our model is an impulse control problem. When both fixed costs are zero, our model is a singular or instantaneous control problem. For the impulse control problem, we prove that a four-parameter control band policy is optimal among all feasible policies. For the singular control problem, we prove that a two-parameter control band policy is optimal. We use a lower-bound approach, widely known as "the verification theorem", to prove the optimality of a control band policy for both the impulse and singular control problems. Our major contribution is to prove the existence of a "smooth" solution to the free boundary problem under some mild assumptions on the holding cost function. The existence proof leads naturally to numerical algorithms to compute the optimal control band parameters. We demonstrate that the lower-bound approach also works for Brownian inventory models in which no inventory backlog is allowed. In a companion paper, we will show how the lower-bound approach can be adapted to study a Brownian inventory model under a discounted cost criterion.


Introduction
This paper is concerned with the optimal control of Brownian inventory models under the long-run average cost criterion. It serves two purposes. First, it provides a tutorial on the powerful lower-bound approach, known as the "verification theorem", to proving the optimality of a control band policy among all feasible policies. The tutorial is rigorous and, except for the standard Itô formula, self-contained. Second, it contributes to the literature by proving the existence of a "smooth" solution to the free boundary problem with a general convex holding cost function. The existence proof leads naturally to algorithms to compute the optimal control band parameters. The companion paper [14] studies the optimal control of Brownian inventory models under a discounted cost criterion.

The Model Description
In this paper and the companion paper [14], the inventory netput process is assumed to follow a Brownian motion with drift µ and variance σ². The netput process captures the difference between regular supplies, possibly through a long-term contract, and customer demands. Controls are exercised on the netput process to keep the inventory at desired positions. The controlled process, denoted by Z = {Z(t), t ≥ 0}, is called the inventory process in this paper. For each time t ≥ 0, Z(t) is interpreted as the inventory level at time t, although Z(t) can be negative, in which case |Z(t)| represents the inventory backlog at time t. We assume that the holding cost function h : R → R_+ is a general convex function. Thus, ∫_0^t h(Z(s)) ds is the cumulative inventory cost by time t. The inventory position is assumed to be adjustable, either upward or downward. All adjustments are realized immediately, without any leadtime delay. Each upward adjustment of amount ξ > 0 incurs a cost K + kξ, where K ≥ 0 and k > 0 are the fixed cost and the variable cost, respectively, for each upward adjustment. Similarly, each downward adjustment of amount ξ > 0 incurs a cost of L + ℓξ with fixed cost L ≥ 0 and variable cost ℓ > 0. The objective is to find a control policy that balances the inventory cost and the adjustment cost so that the long-run average total cost is minimized.
In describing our Brownian control problems, we have used the inventory terminology of supply chain management. One could equally describe such control problems in terms of cash flow management; in this case, Z(t) represents the cash amount at time t ≥ 0. A large number of papers in the economics literature have studied such Brownian control problems (e.g. Dixit [15]). Readers are referred to Stokey [29] and the references therein for a variety of economic applications of Brownian control problems. While the discounted cost criterion is appropriate for cash flow management, the long-run average cost criterion is natural for many production/inventory problems.
When both fixed costs K and L are positive, it is clear that non-trivial feasible control policies should limit the number of adjustments to be finite within any finite time interval. Under such a control policy, inventory is adjusted at a sequence of discrete times, and the resulting control problem is termed the impulse control of a Brownian motion. When both fixed costs are zero, K = L = 0, it can be advantageous for the system to make an "infinitesimal" amount of adjustment at any moment. Indeed, as will be shown in Section 6, an optimal policy makes an uncountable number of adjustments within a finite time interval. The resulting control problem is termed the singular or instantaneous control of a Brownian motion. In this paper, we treat the impulse and singular control of a Brownian motion in a single framework. Conceptually, one may view the singular control problem as a limit of a sequence of impulse control problems as the fixed costs K ↓ 0 and L ↓ 0. This connection between impulse and singular control problems allows us to solve a mixed impulse-singular control problem (for example, K > 0 and L = 0) without much additional effort.

Non-Linear Holding Cost
When the holding cost function h is given by h(x) = p max(−x, 0) + c max(x, 0) for some constants p > 0 and c > 0, we call h in (1.1) a linear holding cost function, even though h(x) in (1.1) is piecewise linear in the inventory level x. With this holding cost function, the inventory backlog cost is linear and the inventory excess cost is also linear, but h(x) is not differentiable at x = 0. Although many papers have focused on the linear holding cost function (e.g. [19]), there are ample applications that motivate non-linear holding cost functions. For example, [10] and [25] studied optimal index tracking of a benchmark index when there are transaction costs. An impulse control problem with quadratic holding cost arises naturally in their studies. Quadratic holding cost and general convex holding cost also arise in economics papers; see, for example, [9,24,32].

Optimal Policy Structure
For an impulse Brownian control problem under the long-run average cost criterion, we prove in Section 5 that a control band policy ϕ = {d, D, U, u} is optimal among all feasible policies. Under the control band policy ϕ, an adjustment is placed so as to bring the inventory up to level D when the inventory level drops to level d, and to bring the inventory down to level U when the inventory level rises to level u. For a singular Brownian control problem, we show in Section 6 that the optimal policy is a degenerate control band policy with two free parameters D = d and U = u. When the inventory level is restricted to be always nonnegative, we show in Section 7 that the optimal policy for an impulse Brownian control problem is again a control band policy. Depending on the holding cost function h, this control band policy sometimes, but not always, has only three free parameters D, U and u that need to be characterized, with the lowest boundary d = 0. Although we will not explicitly study the mixed impulse-singular Brownian control problems, it is clear from our proofs that a degenerate control band policy with three parameters is optimal.
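To make the band dynamics concrete, the following Monte Carlo sketch simulates a control band policy {d, D, U, u} on a discretized Brownian netput process and estimates its long-run average cost. All numerical values, the piecewise-linear holding cost, and the function name are illustrative assumptions, not taken from the paper.

```python
import random

def simulate_control_band(d, D, U, u, mu, sigma, h, K, k, L, ell,
                          x0=0.0, T=100.0, dt=0.001, seed=1):
    """Estimate the long-run average cost of the control band policy
    {d, D, U, u}: jump up to D when the level falls to d, and down to U
    when it rises to u.  Euler discretization of the Brownian netput."""
    rng = random.Random(seed)
    z, cost = x0, 0.0
    sqdt = dt ** 0.5
    for _ in range(int(round(T / dt))):
        z += mu * dt + sigma * sqdt * rng.gauss(0.0, 1.0)
        if z <= d:                       # upward adjustment to D
            cost += K + k * (D - z)
            z = D
        elif z >= u:                     # downward adjustment to U
            cost += L + ell * (z - U)
            z = U
        cost += h(z) * dt                # accumulate holding cost
    return cost / T

# Piecewise-linear holding cost h(x) = p*max(-x, 0) + c*max(x, 0)
avg = simulate_control_band(d=-2.0, D=-0.5, U=0.5, u=2.0, mu=0.1, sigma=1.0,
                            h=lambda x: 2.0 * max(-x, 0.0) + max(x, 0.0),
                            K=1.0, k=0.5, L=1.0, ell=0.5)
```

Because the policy keeps Z essentially inside [d, u], the estimate stabilizes as T grows; a crude grid search over (d, D, U, u) with this estimator already suggests why characterizing the four optimal parameters in Section 5 is the crux of the problem.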
The Lower-Bound Approach and the Free Boundary Problem
This paper promotes a three-step, lower-bound approach to solving Brownian control problems under the long-run average cost criterion. In the first step, we prove Theorem 4.1, showing that if there exist a constant γ and a "smooth" test function f, defined on the entire real line (or the positive half line when inventory is not allowed to be backlogged), such that f and γ jointly satisfy some differential inequalities, then the long-run average cost under any feasible policy is at least γ. This theorem is formulated and proved for all impulse, singular and mixed impulse-singular control problems. In the second step, we show in Theorems 5.1 and 6.1 that for a given control band policy, its long-run average cost can be computed as a solution to a Poisson equation. This equation is a second-order ordinary differential equation (ODE) with given conditions at the boundary points of the band. As part of the solution to the Poisson equation, we also obtain the relative value function. The relative value function can naturally be extended to the entire real line, but the extended function may not be continuously differentiable at the boundary points of the control band. In the third step, we search for a control band policy such that the corresponding relative value function can indeed be extended smoothly as a function f on the entire real line. Furthermore, this smooth function f, together with the long-run average cost under the control band policy, satisfies the differential inequalities in step 1 on the entire real line. Clearly, if the control band policy in step 3 can be found, it must be an optimal policy by Theorem 4.1. The lower-bound theorem, Theorem 4.1, is known as the "verification theorem" in the literature.
Step 3 is the most critical step in the three-step approach. In order to make the relative value function smoothly extendible to the entire real line, the parameters of the control band must be carefully selected. These parameters serve as the boundary points of the ODE, yet they themselves need to be determined. The smoothness requirements impose conditions on the ODE solution at these yet-to-be-found boundary points. Thus, the ODE in step 3 is known as a free boundary problem. Solving the free boundary problem to find the optimal parameters is also known as the "smooth pasting" method [9]. Solving a free boundary problem is often technically difficult; the number of free parameters of an optimal control band policy dictates the level of difficulty. Many papers in the literature leave it unsolved (e.g. [15,27]), assuming that a solution to the free boundary problem with a certain smoothness property exists.

Contributions
The Brownian inventory control problem is now a classical problem, starting from Bather [3] thirty-five years ago. We survey the research area in the next several paragraphs. In addition to providing a self-contained tutorial on the lower-bound approach to studying optimal control problems, our paper contributes significantly in the following areas. (a) Under a general convex holding cost function with some minor assumptions, we rigorously prove the existence of a control band policy that is optimal for both the impulse and singular control problems under the long-run average cost criterion. (b) Under the general convex holding cost function, we prove the existence of a solution to the four-parameter free boundary problem. Our existence proof leads naturally to algorithms for computing the optimal control band parameters. These algorithms reduce to root finding for continuous, monotone functions; thus, their convergence is guaranteed. We are not aware of any paper that has proved the existence of a solution to the four-parameter free boundary problem under the long-run average cost criterion. In the discounted setting, [13] solved the four-parameter free boundary problem when h is linear, and [1] solved the problem when h is quadratic. Recently, Feng and Muthuraman [17] developed an algorithm to numerically solve the four-parameter free boundary problem for the discounted Brownian control problem. They illustrate the convergence of their algorithm through some numerical examples; however, the convergence of their algorithm has not been established. (c) Under the long-run average cost criterion, our lower-bound approach provides a unified treatment for both the impulse and singular control problems, with and without inventory backlog. In particular, we do not need to employ the vanishing discount approach [16,21,28] to study the long-run average cost problems.
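To illustrate point (b): once the free boundary computation has been reduced to root finding for a continuous, monotone function, bisection converges unconditionally whenever the bracketing interval produces a sign change. The sketch below is generic; the cubic test function is a placeholder, not one of the paper's equations.

```python
def bisect_root(F, lo, hi, tol=1e-10, max_iter=200):
    """Locate the root of a continuous monotone function F on [lo, hi].
    Assumes F(lo) and F(hi) have opposite signs; the interval is halved
    at each iteration, so convergence is guaranteed."""
    f_lo = F(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = F(mid)
        if abs(f_mid) < tol or (hi - lo) < tol:
            return mid
        if (f_lo > 0) == (f_mid > 0):   # root lies in the right half
            lo, f_lo = mid, f_mid
        else:                            # root lies in the left half
            hi = mid
    return 0.5 * (lo + hi)

root = bisect_root(lambda x: x**3 - 2.0, 0.0, 2.0)  # converges to 2**(1/3)
```

The halving of the bracket at every step is exactly why the paper's existence proof, which exhibits monotone continuous functions with sign changes, immediately yields convergent algorithms.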
In her book, Stokey [29] summarizes both the impulse and instantaneous controls of Brownian motion with a general convex holding cost function. She focuses on discounted cost problems and employs the vanishing discount approach to deal with long-run average cost problems. In contrast, our paper studies the long-run average cost problem directly, and characterizes the optimal parameters without going through the vanishing discount procedure.

Literature Review
The lower-bound approach was used in [23,31] under a long-run average cost criterion and in [19,20] under a discounted cost criterion. The approach is essentially the same as the quasi-variational inequality (QVI) approach pioneered by Bensoussan and Lions [6]. The QVI approach was systematically developed in a French book that was later translated into English (see [7]). An appealing feature of the QVI approach is that it is sufficient to solve a QVI problem in order to obtain an optimal policy for an inventory control problem; this sufficiency is established in [7]. The QVI problem is a purely analytical problem that is closely related to the free boundary problem. Many authors directly start with the QVI problems, relying on the "verification theorem" developed in [7]; see, for example, [4,5,8,30]. The potential drawback of this approach is that when the formulation of a Brownian control problem differs slightly from the setting in [7], one may have to develop a new verification theorem, presumably mimicking the development in the book. In contrast, our lower-bound approach allows us to provide a self-contained, rigorous proof simultaneously for impulse, singular and mixed control problems. It also allows one to see directly how the smoothness requirement on a solution to the free boundary problem is used. We believe the lower-bound approach is easier to generalize to high-dimensional Brownian control problems.
The impulse control problem with both upward and downward adjustments was studied as early as 1976 and 1978 in two papers by Constantinides [12] and Constantinides et al. [13]. The first paper studies the long-run average cost objective and the second studies the discounted cost objective. Both papers assume the holding cost function is linear, as given in (1.1). Under this holding cost function, the optimal control band parameters can be explicitly characterized. Baccarin [1] studies the discounted impulse Brownian control problem with a quadratic inventory cost function. When the inventory is restricted to be nonnegative, but still under the linear holding cost assumption (1.1), Harrison et al. [19] study the discounted cost, impulse Brownian control problem, whereas Ormeci et al. [23] study the long-run average cost problem. Under the linear holding cost assumption, the optimal policy is a degenerate control band policy {0, D, U, u}, where the three optimal parameters D, U, u can be determined explicitly. However, under our general convex holding cost assumption, the optimal policy for the impulse control problem without inventory backlog is again a control band policy {d, D, U, u}, with d sometimes being strictly positive. Harrison and Taksar [20] and Taksar [31] study the singular Brownian control problem under a general convex inventory cost assumption. The former paper studies the discounted cost problem and the latter studies the long-run average cost problem. Taksar [31] characterizes the optimal control band parameters through the optimal stopping time of a stochastic game, without solving the two-parameter free boundary problem. As in [31], Stokey [29] characterizes her optimal parameters through a stopping time problem without solving the four-parameter free boundary problem. These stopping time characterizations do not easily lead to a numerical algorithm to compute the optimal parameters.
Richard [27] studies the impulse control of a general one-dimensional diffusion process. He assumes, without proof, the existence of a solution to a quasi-variational inequality problem with a certain regularity property in order to characterize an optimal policy. Kumar and Muthuraman [22] develop a numerical algorithm to solve high-dimensional singular control problems. Vickson [33] studies a cycling problem with Brownian motion demand.
In his pioneering paper, Bather [3] studies the impulse Brownian motion control problem without downward adjustment, under the long-run average cost criterion. For most inventory problems, the absence of downward adjustment is a natural setting. Under a general holding cost function, he suggests that an (s, S) policy is optimal and derives equations that characterize the optimal parameters s and S. Many authors have generalized this paper to various settings: to discounted cost problems with linear holding cost in [30], to discounted cost problems with and without inventory backlog in [11], to discounted cost problems under the general convex holding cost assumption in [4], to discounted cost problems with a positive constant leadtime in [2], and to compound Poisson and diffusion demand processes in [5,8]. Because there is no downward adjustment in these problems, the optimal policy has two parameters and the resulting two-parameter free boundary problem can be solved much more easily than the four-parameter one.

Paper Organization
The rest of this paper is organized as follows. In Section 2, we define our Brownian control problem in a unified setting that includes impulse, singular and mixed impulse-singular controls. In Section 3 we present a version of the Itô formula that does not require the test function f to be a C² function. A lower bound for all feasible policies is established in Section 4. Section 5 is devoted to impulse control problems that allow inventory backlog under the long-run average cost criterion. Section 5.1 shows that under a control band policy, a Poisson equation produces a solution that gives both the long-run average cost and the corresponding relative value function. Under the assumption that a free boundary problem has a unique solution with the desired regularity properties, Section 5.2 proves that there is a control band policy whose long-run average cost achieves the lower bound; thus, the control band policy is optimal among all feasible policies. Section 5.3 is a lengthy section devoted to the existence proof for the solution to the free boundary problem; in it, the parameters of the optimal control band policy are characterized. Section 5.3 constitutes the main technical contribution of this paper. Section 6 solves the singular control problem; this section is short, as it essentially becomes the special case of Section 5 with K = 0 and L = 0. Section 7 deals with impulse control problems when inventory backlog is not allowed. Finally, Section 8 summarizes the paper and discusses a few extensions.

Brownian Control Models
Let X = {X(t), t ≥ 0} be a Brownian motion with drift µ and variance σ², starting from x. Then X has the representation X(t) = x + µt + σW(t) for t ≥ 0, where W = {W(t), t ≥ 0} is a standard Brownian motion that has drift 0 and variance 1, starting from 0. We assume W is defined on some filtered probability space (Ω, {F_t}, F, P) and W is an {F_t}-martingale. Thus, W is also known as an {F_t}-standard Brownian motion. We use X to model the netput process of the firm. For each t ≥ 0, X(t) represents the inventory level at time t if no control has been exercised by time t. The netput process will be controlled, and the actual inventory level at time t, after controls have been exercised, is denoted by Z(t). The controlled process is denoted by Z = {Z(t), t ≥ 0}. With a slight abuse of terminology, we call Z(t) the inventory level at time t, although when Z(t) < 0, |Z(t)| is the backorder level at time t.
Controls are dictated by a policy. A policy ϕ is a pair of stochastic processes (Y_1, Y_2) that satisfies the following three properties: (a) for each sample path ω ∈ Ω, Y_i(ω, ·) ∈ D, where D is the set of functions on R_+ = [0, ∞) that are right continuous on [0, ∞) and have left limits in (0, ∞); (b) for each ω, Y_i(ω, ·) is a nondecreasing function; (c) Y_i is adapted to the filtration {F_t}, namely, Y_i(t) is F_t-measurable for each t ≥ 0. We call Y_1(t) and Y_2(t) the cumulative upward and downward adjustments, respectively, of the inventory in [0, t]. Under a given policy (Y_1, Y_2), the inventory level at time t is given by Z(t) = X(t) + Y_1(t) − Y_2(t). Therefore, Z is a semimartingale, namely, a martingale σW plus a process of bounded variation. In general, we allow an upward or downward adjustment at time t = 0. By convention, we set Z(0−) = x and call Z(0−) the initial inventory level. By (2.1), Z(0) = x + Y_1(0) − Y_2(0), which can be different from the initial inventory level Z(0−).
There are two types of costs associated with a control: fixed costs and proportional costs. We assume that each upward adjustment incurs a fixed cost of K ≥ 0 and each downward adjustment incurs a fixed cost of L ≥ 0. In addition, each unit of upward adjustment incurs a proportional cost of k > 0 and each unit of downward adjustment incurs a proportional cost of ℓ > 0. Thus, by time t, the system incurs the cumulative proportional cost kY_1(t) for upward adjustments and the cumulative proportional cost ℓY_2(t) for downward adjustments. When K > 0, we are only interested in policies such that the number of upward adjustments N_1(t) is finite for each t > 0; otherwise, the total cost would be infinite in the time interval [0, t]. Thus, when K > 0, we restrict attention to upward controls that make finitely many upward adjustments in any finite interval. This is equivalent to requiring Y_1 to be a piecewise constant function on each sample path. Under such an upward control, the upward adjustment times can be listed as a discrete sequence {T_1(n) : n ≥ 0}, where the nth upward adjustment time is defined recursively via T_1(n) = inf{t > T_1(n−1) : ∆Y_1(t) > 0}, where, by convention, T_1(0) = 0 and ∆Y_1(t) = Y_1(t) − Y_1(t−). The amount of the nth upward adjustment is denoted by ξ_1(n) = ∆Y_1(T_1(n)). It is clear that specifying such an upward adjustment policy Y_1 = {Y_1(t), t ≥ 0} is equivalent to specifying the sequence {(T_1(n), ξ_1(n)) : n ≥ 0}. In particular, given the sequence, one has (2.2) and N_1(t) = max{n ≥ 0 : T_1(n) ≤ t}. Thus, when K > 0, it is sufficient to specify the sequence {(T_1(n), ξ_1(n)) : n ≥ 0} to describe an upward adjustment policy. Similarly, when L > 0, it is sufficient to specify the sequence {(T_2(n), ξ_2(n)) : n ≥ 0} to describe a downward adjustment policy. Merging these two sequences, we have the sequence {(T_n, ξ_n), n ≥ 0}, where T_n is the nth adjustment time of the inventory and ξ_n is the amount of adjustment at time T_n.
When ξ_n > 0, the nth adjustment is an upward adjustment, and when ξ_n < 0, it is a downward adjustment. In addition to the adjustment cost, the system is assumed to incur holding cost at rate h: when the inventory level is Z(t) = x, the system incurs a cost of h(x) per unit of time. Therefore, the cumulative holding cost in [0, t] is ∫_0^t h(Z(s)) ds, and the long-run average cost is defined through the expectation operator E_x, which conditions on the initial inventory level Z(0−) = x.
As mentioned earlier, when K > 0 and L > 0, it is sufficient to restrict feasible policies to the impulse type given in (2.2) and (2.3). Such a Brownian inventory control model is called the impulse Brownian control model. When K = 0 and L = 0, it turns out that under an optimal policy, N_1(t) = ∞ and N_2(t) = ∞ with positive probability for each t > 0. The corresponding control problem is called the instantaneous or singular Brownian control model. In this paper, we make the following assumption on the holding cost function h : R → R_+. Assumption 1. Assume that the continuous holding cost function h : R → R_+ satisfies the following conditions: (a) it is convex; (b) there exists an a such that h ∈ C²(R) except at a, and h(a We only consider feasible policies that satisfy In some applications, one might require the inventory level to be nonnegative at all times, namely,

The Itô Formula
In this section, we first state a version of Itô's formula; we then provide a lower bound result for the long-run average cost in (2.4). Recall that a function g ∈ D is right continuous on [0, ∞) and has left limits in (0, ∞). We use g^c to denote the continuous part of g, namely, g^c(t) = g(t) − Σ_{0≤s≤t} [g(s) − g(s−)]. Here we assume g(0−) is well defined. Recall that under any feasible policy ϕ = (Y_1, Y_2), the inventory process Z = {Z(t) : t ≥ 0} has the semimartingale representation (2.1). Because Brownian motion has continuous sample paths, we have

f(Z(t)) = f(Z(0−)) + ∫_0^t Γf(Z(s)) ds + σ ∫_0^t f′(Z(s)) dW(s) + ∫_0^t f′(Z(s)) dY_1^c(s) − ∫_0^t f′(Z(s)) dY_2^c(s) + Σ_{0≤s≤t} [f(Z(s)) − f(Z(s−))],

where Γf(x) = (σ²/2) f″(x) + µ f′(x) is the generator of the (µ, σ²)-Brownian motion X and ∫_0^t f′(Z(s)) dW(s) is interpreted as an Itô integral.
Remark. Although f″(u) is defined only for almost every u ∈ R, ∫_0^t f″(Z(s)) ds is uniquely defined almost surely; the exceptional point a is handled through the local time L^a of Z at a. Because Y_1 and Y_2 have at most countably many jump points, Z has at most countably many discontinuity points, so the jump term in the Itô formula is a countable sum.

Lower Bound
In this section, we state and prove a theorem that establishes a lower bound on the optimal long-run average cost. This theorem is closely related to the "verification theorem" in the literature. Its proof is self-contained, using the Itô formula from Section 3.
Then AC(x, ϕ) ≥ γ for each feasible policy ϕ and each initial state x ∈ R.

Remark. (i) When
Under an arbitrary control policy, the inventory level Z can potentially reach any level. Thus, we require the function f to be defined on the entire real line R; it is not enough to have f defined on some interval [d, u].
where the inequality is due to (4.1). In the rest of the proof, we consider different cases depending on whether K and L are positive. We provide a complete proof for the case K > 0 and L > 0; sketches are provided for the other cases.
Case I: Assume that K > 0 and L > 0. In this case, it is sufficient to restrict feasible policies to impulse control policies {(T_n, ξ_n) : n = 0, 1, . . .}, and Y_1^c = 0 and Y_2^c = 0. Conditions (4.2) and (4.3) imply that and ∆f(Z(T(n))) ≥ −φ(ξ_n) for n = 0, 1, . . ., where for each t ≥ 0. Fix an x ∈ R. We assume that Taking E_x on both sides of (4.5), we have Dividing both sides by t and taking the limit as t → ∞, one has lim inf We consider two cases. In the first case, when lim inf it is clear that (4.6) implies the theorem. Now consider the case when lim inf It follows that for sufficiently large t, Because |f′(y)| ≤ M for all y ∈ R, Therefore, which, together with (4.7), implies that for sufficiently large t. This implies that lim inf which implies that AC(x, ϕ) = ∞, thus proving the theorem. To see (4.9), by parts (a) and (c) of Assumption 1, there exist constants h_1 > 0 and c > 0 such that Because of (4.8), one of the following two equations holds: Assume that (4.12) holds. Condition (4.10) implies that It follows that lim inf which proves (4.9). Hence the theorem is proved for K > 0 and L > 0.
Case II: Assume that K = 0 and L = 0. Condition (4.2) leads to f′(u) ≥ −k for all u ∈ R, and condition (4.3) leads to f′(u) ≤ ℓ for all u ∈ R. Thus, the last three terms in (4.4) are at least Therefore, (4.4) leads to for t ≥ 0. The rest of the proof is identical to the case K > 0 and L > 0. Case III: Assume K > 0 and L = 0. Consider a feasible policy (Y_1, Y_2) with a finite cost. The upward controls must be impulse controls and Y_1(t) = Therefore, (4.4) leads to for t ≥ 0. The rest of the proof is identical to the case K > 0 and L > 0.
Case IV: Assume that K = 0 and L > 0. This case is analogous to the case when K > 0 and L = 0. Thus, the proof is omitted.

Impulse Controls
In this section, we assume that K > 0 and L > 0. Therefore, we restrict our feasible policies to impulse controls as in (2.2) and (2.3). An impulse control band policy is defined by four parameters d, D, U , u, where d < D < U < u. Under the policy, when the inventory falls to d, the system instantaneously orders items to bring it to level D; when the inventory rises to u, the system adjusts its inventory to bring it down to U . Given a control band policy ϕ, in Section 5.1 we provide a method for performance evaluation. As a byproduct, we also obtain the relative value function associated with the control band policy. Then in Section 5.2 we show that an optimal policy is a control band policy and present equations that uniquely determine the optimal control band parameters (d * , D * , U * , u * ).

Control Band Policies
We use {d, D, U, u} to denote the control band policy associated with parameters d, D, U, u. Let us fix a control band policy ϕ = {d, D, U, u} and an initial inventory level Z(0−) = x. The adjustment amounts ξ_n of the control band policy are given by and for n = 1, 2, ..., where again Z(t−) denotes the left limit at time t, T_0 = 0, and is the nth adjustment time. (By convention, we assume Z is right continuous having left limits.) Our first task is to find the policy's long-run average cost; we first present the following theorem.
with boundary conditions (5.2) and (5.3). Boundary conditions (5.2) and (5.3) imply that Dividing both sides by t and letting t → ∞, we have AC(x, ϕ) = γ because Let m ∈ R be any fixed number. Define Then (V, γ) is a solution to (5.1)-(5.3). In (5.6) and (5.7), we set Using the coefficients defined in (5.8)-(5.10), we see that the boundary conditions (5.11) and (5.12) become from which we obtain the unique solution for γ and V′(m) given in (5.6) and (5.7).
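The displayed equations (5.1)-(5.3) are not reproduced in this excerpt, but the structure the theorem describes can be illustrated numerically. The sketch below assumes the Poisson equation takes the standard form (σ²/2)V″(x) + µV′(x) + h(x) = γ on [d, u], with boundary conditions tying V at the band edges to the adjustment costs, V(d) = V(D) + K + k(D − d) and V(u) = V(U) + L + ℓ(u − U); these forms are assumptions consistent with the surrounding text, not a verbatim transcription. A finite-difference discretization then yields γ and V from one linear solve:

```python
import numpy as np

def band_average_cost(d, D, U, u, mu, sigma, h, K, k, L, ell, n=800):
    """Solve a finite-difference version of the assumed Poisson equation
        (sigma^2/2) V''(x) + mu V'(x) + h(x) = gamma   on [d, u],
    with V(d) - V(D) = K + k*(D - d), V(u) - V(U) = L + ell*(u - U),
    and the normalization V(D) = 0.  Returns the resulting gamma, the
    long-run average cost of the control band policy {d, D, U, u}."""
    x = np.linspace(d, u, n + 1)
    dx = x[1] - x[0]
    iD = int(round((D - d) / dx))          # grid indices of D and U
    iU = int(round((U - d) / dx))
    A = np.zeros((n + 2, n + 2))           # unknowns: V_0, ..., V_n, gamma
    b = np.zeros(n + 2)
    for i in range(1, n):                  # centered second-order stencil
        A[i - 1, i - 1] = sigma**2 / (2 * dx**2) - mu / (2 * dx)
        A[i - 1, i] = -sigma**2 / dx**2
        A[i - 1, i + 1] = sigma**2 / (2 * dx**2) + mu / (2 * dx)
        A[i - 1, n + 1] = -1.0             # the -gamma term
        b[i - 1] = -h(x[i])
    A[n - 1, 0], A[n - 1, iD] = 1.0, -1.0  # jump condition at d
    b[n - 1] = K + k * (D - d)
    A[n, n], A[n, iU] = 1.0, -1.0          # jump condition at u
    b[n] = L + ell * (u - U)
    A[n + 1, iD] = 1.0                     # normalization V(D) = 0
    return float(np.linalg.solve(A, b)[n + 1])

gamma = band_average_cost(d=-2.0, D=-0.5, U=0.5, u=2.0, mu=0.1, sigma=1.0,
                          h=lambda x: x * x, K=1.0, k=0.5, L=1.0, ell=0.5)
```

Evaluating γ this way for candidate bands gives a brute-force check on the optimal parameters produced by the free boundary analysis of Section 5.3.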

Optimal Policy and Optimal Parameters
Theorem 4.1 suggests the following strategy for obtaining an optimal policy. We hope that a control band policy is optimal; therefore, the first task is to find an optimal policy among all control band policies. We denote this optimal control band policy by ϕ* = {d*, D*, U*, u*} with long-run average cost γ*. We hope that γ* can be used as the constant in (4.1) of Theorem 4.1. To find the corresponding f that, together with γ*, satisfies all the conditions of Theorem 4.1, we start with the relative value function V(x) associated with the policy ϕ*. This relative value function V is defined on the finite interval [d*, u*]. We need to extend V so that it is defined on the entire real line R. Given that V(x) is the relative value function, it is natural to extend it as f(x) = V(d*) + k(d* − x) for x < d*, f(x) = V(x) for d* ≤ x ≤ u*, and f(x) = V(u*) + ℓ(x − u*) for x > u*. We have yet to determine the optimal parameters (d*, D*, U*, u*). We now provide an intuitive argument for the conditions that should be imposed on the optimal parameters. Since we wish f ∈ C¹, we should have V′(d*) = −k and V′(u*) = ℓ. (5.14) Also, starting from d*, the system should jump to a level D that minimizes K + k(D − d*) + V(D). Therefore, at D = D*, k + V′(D) = 0, namely, V′(D*) = −k. (5.15) Similarly, one should have V′(U*) = ℓ. (5.16) In this section, we first prove in Theorem 5.2 the existence of parameters d*, D*, U* and u* such that the relative value function V corresponding to the control band policy ϕ = {d*, D*, U*, u*} satisfies (5.1)-(5.3) and (5.14)-(5.16). As part of the solution, we find the boundary points d*, D*, U* and u* from equations (5.1)-(5.3) and (5.14)-(5.16). These equations define a free boundary problem, whose solution is much more difficult to find than that of a boundary value problem. We then prove in Theorem 5.3 that the extension f in (5.13) and γ* = AC(ϕ*, x) jointly satisfy all the conditions in Theorem 4.1; therefore, the control band policy ϕ* is optimal among all feasible policies.
To ease the presentation, in the rest of this section, we assume that µ > 0. The statement and analysis for the cases µ < 0 and µ = 0 are analogous and are omitted.
To facilitate the presentation of Theorem 5.2, we first find a general solution V to (5.1) without worrying about the boundary conditions (5.2) and (5.3). Proposition 1 shows that such a V is given in the form below, where g is given by (5.5) and m is some constant. Since the optimal boundary points d*, D*, U*, u* are yet to be determined, the constant γ on the right side of (5.1) is also yet to be determined. Differentiating both sides of (5.1) with respect to x, one sees that g is a solution to a first-order differential equation. In (5.5), we fix m = a and set A = 2γ/(λσ 2 ), with B chosen accordingly. To summarize, we have the following lemma. The following theorem characterizes the optimal parameters (d*, D*, U*, u*) via the solution g = g A,B . Figure 1 depicts the function g used in the theorem.
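The first-order reduction just described can be checked numerically. The snippet below is an illustrative sketch only: it assumes the generator Γf = (σ²/2)f″ + µf′, the choice λ = 2µ/σ² for the constant in (5.4), the sample quadratic holding cost h(x) = x² (so a = 0), and a candidate closed form for g derived under those assumptions; none of these are taken from the paper's displays.

```python
import math

# Illustrative assumptions (not the paper's displays):
# generator Gamma f = (sigma^2/2) f'' + mu f', lam = 2*mu/sigma^2,
# quadratic holding cost h(x) = x^2 with minimum point a = 0.
mu, sigma = 1.0, 1.0
lam = 2.0 * mu / sigma ** 2

def g(x, A, B):
    """Candidate closed form of g_{A,B} for this special case, obtained by
    solving (sigma^2/2) g' + mu g + h = gamma with gamma = mu * A
    (consistent with A = 2*gamma/(lam*sigma^2)):
        g_{A,B}(x) = A - (B - 1/2) e^{-2x} - x^2 + x - 1/2."""
    return A - (B - 0.5) * math.exp(-lam * x) - x * x + x - 0.5

# Verify the ODE residual at an arbitrary point by finite differences.
A, B, x0, eps = 1.0, 0.3, 0.7, 1e-6
gp = (g(x0 + eps, A, B) - g(x0 - eps, A, B)) / (2.0 * eps)
residual = 0.5 * sigma ** 2 * gp + mu * g(x0, A, B) + x0 ** 2 - mu * A
print(abs(residual) < 1e-6)
```

The additive constant A plays the role of the unknown average cost (γ = µA in this normalization), while B is the remaining free constant of integration.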
Theorem 5.2. Assume that the holding cost function h satisfies Assumption 1. There exist unique A*, B*, d*, D*, U* and u* satisfying the conditions above. Furthermore, g has a local minimum at x 1 < a and a local maximum at x 2 > a. The function g is decreasing on (−∞, x 1 ), increasing on (x 1 , x 2 ), and decreasing again on (x 2 , ∞).

Conditions (5.22) and (5.23) ensure that ḡ is in C(R).
Define V as follows, and let γ* be the long-run average cost under policy ϕ*. We now show that V and γ* satisfy all the conditions in Theorem 4.1. Thus, Theorem 4.1 shows that the long-run average cost under any feasible policy is at least γ*. Since γ* is the long-run average cost under the control band policy ϕ*, γ* is the optimal cost and the control band policy ϕ* is optimal among all feasible policies.
By Theorem 5.1, the constant must be the long-run average cost γ* under the control band policy ϕ*. Now we show that V(x) satisfies the rest of the conditions in Theorem 4.1. Conditions (5.22) and (5.23) imply that the truncated function ḡ is continuous on R. Therefore, V ∈ C 1 (R).

Optimal Control Band Parameters
This section is devoted to the proof of Theorem 5.2. We separate the proof into a series of lemmas. Throughout this section, we assume that µ > 0 and that the holding cost function h satisfies Assumption 1. Recall the λ defined in (5.4). Define the constant B̄ as follows. Because h′(x) < 0 for x < a, B̄ > 0. For A, B ∈ R, recall the function g A,B defined in (5.19). We will sometimes use the following fact. When the context is clear, we simply write g for g A,B . For the following lemma, readers are referred to Figure 1.
Proof. Differentiating g(x) = g A,B (x) in (5.19) and noting h(a) = 0, we obtain the expression below, where F 1 is defined for x ∈ R. Since h′(x) < 0 for x < a and h′(x) > 0 for x > a, F 1 (B, x) increases in x on (−∞, a) and decreases in x on (a, ∞). For B > 0, we have the limits below. Therefore, for any B > 0, there exists a unique x 2 = x 2 (B) ∈ (a, ∞) such that F 1 (B, x 2 ) = 0, or equivalently g′(x 2 ) = 0. For B ∈ (0, B̄), the corresponding claim for x 1 follows similarly. Thus the lemma is proved.
Remark. The local maximizer x 2 (B) is well defined for all B ∈ (0, ∞), whereas the local minimizer x 1 (B) is defined only for B ∈ (0, B̄). Proof. (a) Recall the function F 1 defined in (5.29). Clearly F 1 , ∂F 1 /∂B and ∂F 1 /∂x are continuous, and ∂F 1 /∂x is given in (5.30). One has the following, where we use the fact that h′(x) < 0 for x ∈ (−∞, a). By the Implicit Function Theorem, x 1 (B) is continuously differentiable in B ∈ (0, B̄); its derivative is negative, so x 1 (B) is strictly decreasing in B ∈ (0, B̄). Similarly, x 2 (B) is continuously differentiable and strictly increasing in B ∈ (0, ∞). The limits in (5.31) and (5.32) follow easily from the definitions of x 1 (B) and x 2 (B).
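The monotonicity of x 1 (B) and x 2 (B) asserted above can be observed numerically. The sketch below reuses the assumed special case h(x) = x², µ = σ = 1 (so λ = 2, a = 0, and the threshold B̄ equals 1/2 in this example); it is an illustration under those assumptions, not the paper's exact g.

```python
import math

# Illustrative g' for the assumed case h(x) = x^2, mu = sigma = 1, lam = 2,
# a = 0; the additive constant A does not affect the critical points.
def gprime(x, B):
    # derivative of g_{0,B}(x) = -(B - 1/2) e^{-2x} - x^2 + x - 1/2
    return 2.0 * (B - 0.5) * math.exp(-2.0 * x) - 2.0 * x + 1.0

def critical_points(B, lo=-8.0, hi=4.0, steps=2400):
    """Locate sign changes of g' by a grid scan refined with bisection.
    For 0 < B < 1/2 there are exactly two: the local minimizer x1(B) < a
    and the local maximizer x2(B) > a."""
    roots = []
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    for left, right in zip(xs, xs[1:]):
        if gprime(left, B) * gprime(right, B) < 0:
            a_, b_ = left, right
            for _ in range(60):
                m = 0.5 * (a_ + b_)
                if gprime(a_, B) * gprime(m, B) <= 0:
                    b_ = m
                else:
                    a_ = m
            roots.append(0.5 * (a_ + b_))
    return roots

x1a, x2a = critical_points(0.2)
x1b, x2b = critical_points(0.4)
# x1(B) strictly decreases and x2(B) strictly increases in B, as in the lemma.
print(x1b < x1a < 0.0 < x2a < x2b)
```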

Thus, lim
It follows that the limit below holds, where we have used the fact stated above. Lemma 5.6 and the inequality in (5.54) immediately imply the following lemma. Finally, we prove the following lemma, which in turn proves Theorem 5.2.
Lemma 5.9. There exist unique B * ∈ (B 2 , B̄), d * and D * that satisfy This in turn implies that It follows from (5.54) and Lemma 5.8 that This, together with the definition of A(B) in (5.38), shows that Therefore, we have lim It follows that lim It follows from (5.68) that M 1 > 0. Then for each B ∈ (B 3 , B̄), Therefore, for each B ∈ (B 3 , B̄) there exist unique d 1 (B) and D 1 (B) such that The properties of g in Lemma 5.2 (see also Figure 1) imply that for each B ∈ (B 3 , B̄), This, together with (5.32), implies that Note that for x ∈ (d(B), D(B)), g A * (B),B (x) < −k. Therefore, for B ∈ (B 3 , B̄), It follows from (5.69) and (5.63) that for each B ∈ (B 3 , B̄), Thus, for any B ∈ (B 3 , B̄), Thus, for any B ∈ (B 3 , B̄),

Singular Controls
In this section, we assume that K = 0 and L = 0, and we therefore restrict the feasible policies to singular controls, also known as instantaneous controls, as in (2.2) and (2.3). A two-parameter control band policy is defined by two parameters d and u with d < u. No control is exercised until the inventory level Z(t) reaches the lower boundary d or the upper boundary u. When Z(t) reaches a boundary, there is no advantage in using impulse control because there is no fixed cost.

Control Band Policies
Let us fix a two-parameter control band policy ϕ = {d, u}. To describe the control process (Y 1 , Y 2 ) mathematically, we use the two-sided regulator: for each x ∈ D with x(0) ∈ [d, u], find a triple (y 1 , y 2 , z) ∈ D 3 satisfying (6.4). The precise mathematical meaning of (6.4) is given in (6.5). One can verify that (6.5) is equivalent to the following: whenever z(t) > d for t ∈ [t 1 , t 2 ], y 1 (t 2 ) − y 1 (t 1 ) = 0, and whenever z(t) < u for t ∈ [t 1 , t 2 ], y 2 (t 2 ) − y 2 (t 1 ) = 0. Lemma 6.1 below follows from Proposition 6 in Section 2.4 of [18]. That proposition is stated for each continuous path x ∈ D; one can verify that it continues to hold when the continuity of x is dropped.
The nondecreasing functions (y 1 , y 2 ) are said to be the two-sided regulator of x, and z is the regulated path of x. When either u = ∞ or d = −∞, the corresponding one-sided regulator is defined in Section 2.2 of [18].
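The two-sided regulator has a simple recursive construction on discrete paths, which may help intuition. The sketch below is an illustrative discrete analogue of (6.4)-(6.5), not the construction in [18]: each increment is added and the path is then minimally pushed back into [d, u], the pushes being recorded in y1 and y2.

```python
import random

# Discrete-path illustration of the two-sided regulator on [d, u]:
# given an input path x (here, its increments), produce (y1, y2, z) with
# z = x + y1 - y2 in [d, u], where y1 increases only when z = d and
# y2 increases only when z = u. All parameters are illustrative.
def two_sided_regulator(increments, z0, d, u):
    z, y1, y2 = z0, 0.0, 0.0
    zs, y1s, y2s = [z], [y1], [y2]
    for dx in increments:
        w = z + dx
        if w < d:            # push up at the lower barrier
            y1 += d - w
            w = d
        elif w > u:          # push down at the upper barrier
            y2 += w - u
            w = u
        z = w
        zs.append(z); y1s.append(y1); y2s.append(y2)
    return y1s, y2s, zs

random.seed(1)
d, u = -1.0, 1.0
incs = [random.gauss(0.05, 0.3) for _ in range(1000)]
y1s, y2s, zs = two_sided_regulator(incs, 0.0, d, u)

# z stays in [d, u]; y1 and y2 are nondecreasing and increase only
# when z sits on the corresponding barrier (the complementarity in (6.5)).
ok = all(d <= z <= u for z in zs)
ok &= all(a <= b for a, b in zip(y1s, y1s[1:]))
ok &= all(a <= b for a, b in zip(y2s, y2s[1:]))
ok &= all(y1s[i + 1] == y1s[i] or zs[i + 1] == d for i in range(1000))
ok &= all(y2s[i + 1] == y2s[i] or zs[i + 1] == u for i in range(1000))
print(ok)
```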
To find the long-run average cost under the policy ϕ = {d, u}, we use the following theorem.
with the boundary conditions (6.7) and (6.8). Then the average cost AC(x, ϕ) is independent of the initial inventory level x ∈ R and is given by γ in (6.6).
Proof. First we assume x ∈ [d, u]. In this case, Z(0) = x. By Itô's formula, we obtain the identity below. Dividing both sides by t and taking the limit as t → ∞, we have AC(x, ϕ) = γ. Let m ∈ R be any fixed number. Define V as below. Then (V, γ) is a solution to (6.6)-(6.8). In (6.9) and (6.10), we set the coefficients as follows. Proof. As in the proof of Proposition 1, equation (6.6) implies the representation below. The boundary conditions (6.7) and (6.8) become (6.14) and (6.15). Using the coefficients defined in (6.11)-(6.13), we see that (6.14) and (6.15) reduce to a linear system, from which we obtain the unique solution for γ and V′(m) given in (6.9) and (6.10).
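The conclusion that AC(x, ϕ) does not depend on the initial level x can also be checked by simulation. The sketch below is illustrative only: it assumes an Euler discretization of the Brownian inventory, the cost structure ∫h(Z)dt + kY1(t) + ℓY2(t) with proportional costs k and ℓ for upward and downward adjustments, and sample parameter values; none of these choices come from the paper's displays.

```python
import math, random

# Monte Carlo estimate of the long-run average cost under a two-parameter
# control band policy {d, u}. Parameters and the cost attribution are
# illustrative assumptions for this sketch.
mu, sigma, k, ell = 1.0, 1.0, 1.0, 1.5
d, u = -1.0, 1.0

def h(x):
    return x * x   # sample convex holding cost

def average_cost(z0, T=400.0, dt=0.01, seed=7):
    random.seed(seed)
    z, y1, y2, holding = z0, 0.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        z += mu * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
        if z < d:              # increment of Y1: push up to d
            y1 += d - z
            z = d
        elif z > u:            # increment of Y2: push down to u
            y2 += z - u
            z = u
        holding += h(z) * dt
    return (holding + k * y1 + ell * y2) / T

ac1, ac2 = average_cost(-0.5), average_cost(0.5)
# The estimate should not depend on the initial level, as the theorem asserts.
print(abs(ac1 - ac2) < 0.2)
```

With a common random seed the two regulated paths couple quickly at a boundary, so the two estimates differ only through the initial transient, which vanishes as T grows.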

Optimal Policy and Optimal Parameters
Theorem 4.1 suggests the following strategy for obtaining an optimal policy. We hope the optimal policy is a control band policy. Therefore, the first task is to find an optimal policy among all control band policies. Denote this optimal control band policy by ϕ* = {d*, u*}, d* < u*, with long-run average cost γ*. We hope that γ* can be used as the constant in (4.1) of Theorem 4.1. To find the corresponding f that satisfies all the conditions of Theorem 4.1, we start with the relative value function V(x) associated with the policy ϕ*, which is defined on the interval [d*, u*]. We need to extend V(x) so that it is defined on all of R. Given that V(x) is the relative value function, it is natural to define the extension as follows. Since we wish f ∈ C 1 (R), we should have the first condition below; since we also hope f ∈ C 2 (R), we should have the following additional conditions. In this section, we first prove the existence of parameters d* and u* such that the relative value function V corresponding to the control band policy ϕ* = {d*, u*} satisfies (6.6)-(6.8) and (6.17)-(6.18). Since part of the solution is to find the boundary points d* and u*, equations (6.6)-(6.8) and (6.17)-(6.18) define a free boundary problem. We then prove that the extension f in (6.16) and γ* = AC(ϕ*, x) jointly satisfy all the conditions in Theorem 4.1.
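Displayed explicitly, and writing g = V′ as in Section 5, the smoothness requirements on the extension amount to the following smooth-pasting conditions. This summary is stated for orientation; the identification of the C¹ conditions with the boundary values −k and ℓ is consistent with the proof of Theorem 6.3 below:

```latex
V'(d^*) = -k, \quad V'(u^*) = \ell \qquad \bigl(f \in C^1(\mathbb{R})\bigr),
\qquad
V''(d^*) = 0, \quad V''(u^*) = 0 \qquad \bigl(f \in C^2(\mathbb{R})\bigr).
```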
In the rest of this section, we assume that µ > 0. The statement and analysis for the cases µ < 0 and µ = 0 are analogous and are omitted. Recall the function g(x) = g A,B (x) defined in (5.19).
Then g(x) = g A * ,B 1 (x), d * = x 1 (B 1 ) and u * = x 2 (B 1 ) satisfy (6.19)-(6.22); see Figure 6.2. Now we show that the control band policy ϕ * = {d * , u * } is optimal among all feasible policies. Theorem 6.3. Assume that h satisfies Assumption 1. Let d * and u * , along with the constants A * and B * , be the unique solution in Theorem 6.2. Then the control band policy ϕ * = {d * , u * } is optimal among all feasible policies.
Proof. Let g(x), x ∈ R, be the function in (5.19) with A = A * and B = B * , and let V be the corresponding extension. Let γ * be the long-run average cost under policy ϕ * . We now show that V and γ * satisfy all the conditions in Theorem 4.1. Thus, Theorem 4.1 shows that the long-run average cost under any policy is at least γ * . Therefore, γ * is the optimal cost and the control band policy ϕ * is an optimal policy. It remains to check that V(x) is in C 2 (R) and satisfies (4.1)-(4.3).
To check (4.1), we first compute ΓV(x) + h(x) for x < d * ; the resulting inequality holds because x < d * = x 1 < a, where a again is the minimum point of h. Similarly, for x > u * , ΓV(x) + h(x) ≥ γ * . Finally, (4.2) and (4.3) hold because g(x) is strictly increasing in x on [d * , u * ], with g(d * ) = −k and g(u * ) = ℓ (see Figure 6.2). Thus, the optimality of the control band policy ϕ * follows from Theorem 4.1.
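The characterization above suggests a simple numerical procedure for the optimal band: since adding a constant A shifts g vertically without moving its critical points, one can first search for the B at which the gap between the local maximum and the local minimum of g equals k + ℓ, and then fix A so that g(d*) = −k. The sketch below implements this for the assumed special case h(x) = x², µ = σ = 1 (so λ = 2, a = 0, B̄ = 1/2) with an assumed closed form for g; the specific formulas and the values of k and ℓ are illustrative assumptions, not the paper's displays.

```python
import math

# Band computation in the spirit of Theorem 6.2, specialized to the assumed
# case h(x) = x^2, mu = sigma = 1, with the assumed closed form
#     g_{A,B}(x) = A - (B - 1/2) e^{-2x} - x^2 + x - 1/2.
k, ell = 1.0, 1.5

def g0(x, B):       # g_{0,B}
    return -(B - 0.5) * math.exp(-2.0 * x) - x * x + x - 0.5

def gp(x, B):       # derivative of g_{0,B}
    return 2.0 * (B - 0.5) * math.exp(-2.0 * x) - 2.0 * x + 1.0

def bisect(f, a, b, iters=80):
    for _ in range(iters):
        m = 0.5 * (a + b)
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return 0.5 * (a + b)

def crit_points(B):
    # local minimizer x1(B) < 0 and local maximizer x2(B) in (0, 1/2)
    x1 = bisect(lambda x: gp(x, B), -8.0, 0.0)
    x2 = bisect(lambda x: gp(x, B), 0.0, 0.5)
    return x1, x2

def span(B):
    x1, x2 = crit_points(B)
    return g0(x2, B) - g0(x1, B)

# Step 1: choose B so that g(x2(B)) - g(x1(B)) = k + ell.
B1 = bisect(lambda B: span(B) - (k + ell), 0.02, 0.48)
dstar, ustar = crit_points(B1)
# Step 2: fix the additive constant by g(d*) = -k.
Astar = -k - g0(dstar, B1)

ok = dstar < 0.0 < ustar
ok &= abs(Astar + g0(dstar, B1) + k) < 1e-6      # g(d*) = -k
ok &= abs(Astar + g0(ustar, B1) - ell) < 1e-6    # g(u*) = ell
print(ok)
```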

No Inventory Backlog
In this section, inventory backlog is not allowed, and thus we add the constraint Z(t) ≥ 0 for all t ≥ 0. The holding cost function h(·) is defined on [0, ∞), and a ∈ [0, ∞) is its minimum point. We focus on the impulse control case, in which K > 0 and L > 0. Thus, this section parallels Section 5. In particular, the results and proofs in this section are analogous to those in Section 5; in our presentation, we highlight the differences.
For a control band policy {d, D, U, u} with 0 ≤ d < D < U < u, one can continue to use Theorem 5.1 to evaluate its performance and to obtain its relative value function. The lower bound theorem, Theorem 4.1, however, needs to be slightly modified, as in the following theorem.

Optimal Policy Parameters
Recall that for a given set of parameters {d, D, U, u} with 0 ≤ d < D < U < u, the corresponding relative value function satisfies (5.1)-(5.3). To search for the optimal parameters (d * , D * , U * , u * ), we impose the following conditions on {d, D, U, u} and V. In some cases, it is optimal to have d * = 0; in such a case, one only needs to solve for the three parameters D * , U * and u * . This section is analogous to Section 5.2. We highlight the differences between the two sections and omit some details to avoid repetition. Recall that a is the minimum point of the holding cost function h(x) on [0, ∞); it is possible that a = 0 or that a > 0. In the following, whenever Assumption 1 is invoked for h, any condition on h(x) with x < 0 is ignored. Similarly to Lemma 5.1, we have the following lemma.
The following theorem solves the free boundary problem when inventory backlog is not allowed; it asserts the existence of parameters such that the corresponding g satisfies (7.14). Furthermore, g has a local minimum at x 1 ≤ a and a local maximum at x 2 > a. The function g is decreasing on (0, x 1 ), increasing on (x 1 , x 2 ), and decreasing again on (x 2 , ∞).
We leave the proof of Theorem 7.2 to the end of this section.
Theorem 7.3. Assume that the holding cost function h satisfies conditions (a)-(d) of Assumption 1. Let 0 ≤ d * < D * < U * < u * , along with constants A * and B * , be the unique solution in Theorem 7.2. Then the control band policy ϕ * = {d * , D * , U * , u * } is optimal among all feasible policies to minimize the long-run average cost when inventory backlog is not allowed.
Proof. The proof is identical to that of Theorem 5.3.
The rest of this section is devoted to the proof of Theorem 7.2, which is similar to the proof of Theorem 5.2. We provide an outline, highlighting the differences between the two proofs. We only consider the case µ > 0; the other cases are analogous and are omitted. Define The following lemma is analogous to Lemma 5.2; the only difference is that the expression for x 1 = x 1 (B) has two forms in Lemma 7.2. (c) For each B ∈ (0, ∞), g ′ A,B (x) < 0 for x ∈ (0, x 1 (B)), g ′ A,B (x) > 0 for x ∈ (x 1 (B), x 2 (B)), and g ′ A,B (x) < 0 for x ∈ (x 2 (B), ∞).
The following lemma is analogous to Lemma 5.3. Therefore, there exists a unique B 2 ∈ (B 1 , ∞) such that (5.54) holds.
Proof. The proof of this lemma is identical to the proof of Lemma 5.7 except that we need to prove (7.25).
To prove (7.25), we argue as follows. Next, it is easy to see that the limit (5.66) continues to hold as well. It remains to prove the following limit.

Concluding Remarks
In this paper, we have given a tutorial on the lower-bound approach to studying the optimal control of Brownian inventory models with a general convex holding cost function. The control can be either impulse or singular, and inventory backlog can be either allowed or prohibited. For future research, it would be interesting to study multi-stage inventory systems with Brownian motion demand; Yao [34] has carried out a preliminary study of such systems.