ASYMPTOTICALLY OPTIMAL DYNAMIC PRICING FOR NETWORK REVENUE MANAGEMENT

A dynamic pricing problem that arises in a revenue management context is considered, involving several resources and several demand classes, each of which uses a particular subset of the resources. The arrival rates of demand are determined by prices, which can be dynamically controlled. When a demand arrives, it pays the posted price for its class and consumes a quantity of each resource commensurate with its class. The time horizon is finite: at time T the demands cease, and a terminal reward (possibly negative) is received that depends on the unsold capacity of each resource. The problem is to choose a dynamic pricing policy to maximize the expected total reward. When viewed in diffusion scale, the problem gives rise to a diffusion control problem whose solution is a Brownian bridge on the time interval [0, T]. We prove diffusion-scale asymptotic optimality of a dynamic pricing policy that mimics the behavior of the Brownian bridge. The 'target point' of the Brownian bridge is obtained as the solution of a finite dimensional optimization problem whose structure depends on the terminal reward. We show that, in an airline revenue management problem with no-shows and overbooking, under a realistic assumption on the resource usage of the classes, this finite dimensional optimization problem reduces to a set of newsvendor problems, one for each resource.

1. Introduction. In this paper we consider a dynamic pricing problem that arises in a revenue management context. The revenue management problem involves several resources, each with finite capacity. There are several demand classes, each of which uses a particular subset of the resources. (The interpretation of this subset of resources as a route motivates our description of this multiple resource setting as a network.) The demands arrive in independent Poisson processes whose rates are determined by prices, one for each class, that can be dynamically controlled. When a demand arrives, it pays the posted price for its class and consumes a quantity of each resource commensurate with its class. The time horizon is finite: at time T the demands cease, and a terminal reward (possibly negative) is received that depends on the unsold capacity of each resource. The problem is to choose a dynamic pricing policy to maximize the expected total reward (= total revenue from demand arrivals + terminal reward). This is a canonical problem in revenue management, and falls into the category of price-based revenue management as delineated by Talluri and van Ryzin [17].
This problem can be formulated as a Markov decision process, but even with a single resource the continuous time version has an uncountable state space because the remaining (or elapsed) time must be included in the state. There are some structural results available for the single resource case. For example, Zhao and Zheng [18] show that for a fixed time the optimal price decreases with remaining capacity. They also provide a sufficient condition under which the optimal price decreases over time for a given remaining capacity level. To the best of our knowledge there are no structural results for price-based revenue management in the network (multiple resource) setting.
Gallego and van Ryzin [6,7] consider dynamic pricing, with [6] restricted to a single resource and a single demand class, while [7] treats the network setting. Rather than provide exact results, they investigate the asymptotic behavior of such a system as the resource capacities and demand rates grow large. They show that using a fixed price, which can be determined by solving a particular nonlinear program, is asymptotically optimal in the sense that the ratio of the expected revenue produced by their fixed price scheme to the optimal expected revenue converges to unity. Seen another way, if we let n be the scaling parameter, their result shows that the expected revenue loss from using their policy is o(n). Their result is a consequence of the strong law of large numbers, and the type of asymptotic optimality that they prove has come to be called 'fluid scale asymptotic optimality'. Although the fixed price rule obtained in [6] and [7] is simple, it is perhaps too simple: it is 'open-loop' and does not respond to fluctuations of the demand from its expected value. In this paper we examine the asymptotic behavior of dynamic pricing in network revenue management on the more sensitive 'diffusion' scale, based on a (functional) central limit theorem. In particular, we define a simple feedback based dynamic pricing policy that we call the 'bridge policy', for reasons explained below, and show that it is asymptotically optimal on diffusion scale: the expected revenue loss from our policy is o(√n).
When the processes involved in the dynamic pricing problem are viewed at diffusion scale, one may take formal limits, and by doing so one obtains a simple control problem involving diffusion processes that can be solved in a relatively easy way. The benefit of solving the diffusion control problem is not that it automatically produces a solution to the prelimit problem in any sense, but that, as often occurs, understanding the former helps propose good control policies for the latter. In fact, a solution to the diffusion control problem turns out to be of the form of a Brownian bridge. In particular, the scaling limit of the process representing demand is a diffusion that hits a certain target at time T with probability one. Our proposed policy for the dynamic pricing problem thus mimics the dynamics of the Brownian bridge. As in the case of the Brownian bridge, the policy steers the demand process so that at time T it hits a level close to the target. One of the main technical issues dealt with in this paper is obtaining estimates showing that the level hit by the demand process is indeed sufficiently close to the target.
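To make the Brownian bridge dynamics concrete, here is a small illustrative simulation (ours, not from the paper): an Euler scheme for the bridge SDE dX(t) = ((target − X(t))/(T − t)) dt + dW(t), whose drift pulls the state to the target as t → T. The target value and step count are arbitrary choices.

```python
import math
import random

def simulate_bridge(target, T, n_steps, rng):
    """Euler scheme for dX = ((target - X)/(T - t)) dt + dW on [0, T).

    The drift pulls X toward `target` ever more strongly as t -> T,
    so X(T-) ends near the target regardless of the noise on the way."""
    dt = T / n_steps
    x, t = 0.0, 0.0
    for _ in range(n_steps - 1):  # stop one step short of T to avoid 1/(T - t) = 1/0
        x += (target - x) / (T - t) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    return x

rng = random.Random(3)
finals = [simulate_bridge(2.0, 1.0, 1000, rng) for _ in range(50)]
# Endpoints cluster near the target 2.0, up to discretization error.
spread = max(abs(x - 2.0) for x in finals)
```

Despite the Gaussian noise along each path, every endpoint lands within a small neighborhood of the target; this is the steering behavior the bridge policy imitates.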
In a recent paper, independent of ours, Jasin [8] considers a discrete time version of the dynamic pricing problem of [6] and [7] (with no terminal reward) and introduces a simple improvement to the static price control given in those papers. Translated into the notation of this paper, it is shown in [8] that the revenue loss is O(log(n)). When our result is specialized to the situation with no terminal reward, which we present in Section 3.1, our proposed policy is essentially the same as that of [8], modulo the difference between discrete and continuous time.
Other recent work related to the pricing problem of Gallego and van Ryzin [6,7] has focused on the situation where the demand function is unknown and must be estimated. Besbes and Zeevi [1] consider a single product setting, while Besbes and Zeevi [2] consider a multiple product case. They provide combined demand estimation and pricing algorithms that achieve fluid scale asymptotic optimality.
There is another category of network revenue management problems, termed quantity-based revenue management in [17]. (The problem considered in this paper is in the category termed price-based revenue management.) In that setting each customer class has an associated pre-determined price, and the control consists of accept/reject decisions at arrival epochs. A simple (open-loop) fluid scale asymptotically optimal policy for this problem, which entails solving a linear program, was identified by Cooper [3]. A modification of that policy, which involves also solving a second linear program (at a judiciously chosen time), was shown to be asymptotically optimal on the diffusion scale by Reiman and Wang [14]. Jasin and Kumar [9] further showed that sufficient repeated re-solving can reduce the expected revenue loss (relative to an upper bound on expected revenue) to a constant that is independent of the problem size.
The notion of re-solving has also been considered in the context of price-based revenue management. In particular, Maglaras and Meissner [11] investigate re-solving in a single resource multi-product context, while Gallego and Hu [5] introduce a stochastic (non-zero sum noncooperative) game as a model for competing suppliers selling over a finite horizon, and examine the use of re-solving in this game. Maglaras and Meissner [11] show that continuously re-solving the fixed price problem yields a control that is asymptotically optimal on fluid scale. Note that re-solving in this manner yields a feedback based policy. Gallego and Hu [5] introduce the notion of an 'asymptotic Nash equilibrium' (which is a fluid scale notion) and show that re-solving yields an asymptotic Nash equilibrium for their stochastic game.
Given the above work on re-solving it seems natural to ask what the relationship is between the bridge policy and re-solving. A key point is that the bridge policy does not involve re-solving. The bridge policy actually has a very simple form. Based on the solution of two finite dimensional optimization problems (one for fluid scale and the other for diffusion scale), target values are set for the total number of arrivals in each class. The prices are then adjusted to yield arrival rates such that the expected number of arrivals will enable the target to be hit at T. Note that, although this involves continually readjusting prices/arrival rates, it does not involve re-solving the two optimization problems. The re-solving heuristic of [11], on the other hand, requires repeatedly re-solving the fluid scale problem to obtain an updated 'fixed' price. This re-solving is likely to yield arrival rates that do not match those arising in the bridge policy (especially considering that the bridge policy utilizes the solution to a second optimization problem that is not part of the fluid scale analysis), so it seems that typically the policies will not match.
Readers desiring more background on either theoretical or practical aspects of revenue management should consult [17].
The rest of this paper is organized as follows. In Section 2 we provide a more detailed description of the model and describe our main results. In Section 3 we discuss two applications of our results to models introduced in [7]. Section 4 contains proofs of the main results as well as some supporting lemmas.
We conclude this section by defining some notation. For a positive integer d and x ∈ R^d we let ‖x‖ denote the Euclidean norm. If A is a d × d matrix, ‖A‖ denotes the corresponding operator norm of A (‖A‖ := max{‖Ax‖ : x ∈ R^d with ‖x‖ ≤ 1}). We denote by D(R^d) the space of functions from R_+ to R^d that are right continuous on R_+ and have finite left limits on (0, ∞) (RCLL), endowed with the usual Skorohod topology. For X ∈ D(R^d) and t > 0 we write ‖X‖*_t := sup_{0≤s≤t} ‖X(s)‖ and ∆X(t) = X(t) − X(t−). Finally, [·] is the floor function: [x] is the largest integer that is not larger than x.
2. Setting and main results.
2.1. The model. The network we consider consists of L resources, where L is a positive integer. The capacity of resource l is C_l, 1 ≤ l ≤ L. There are J customer classes, where class j customers need an integer amount A_lj ≥ 0 of resource l. Naturally, it is assumed that every class uses at least one resource and every resource is used by at least one class:

(2.1)  Σ_l A_lj ≥ 1 for every j, and Σ_j A_lj ≥ 1 for every l.

Let (Ω, F, P) be a complete probability space, supporting all processes defined in this paper. Following [7], the demand of each class j is modeled as a point process with intensity λ_j, and the vector-valued process of demand intensities λ = (λ_1, ..., λ_J) is regarded as a control process. To this end we let π_j, j = 1, ..., J, be independent standard Poisson processes with right-continuous sample paths, and let

(2.2)  D_j(t) := π_j(∫_0^t λ_j(s) ds)

represent the number of class j customers arriving up to time t. Set D = (D_1, ..., D_J). Let also X = (X_1, ..., X_L), where

X_l(t) := C_l − Σ_j A_lj D_j(t)

denotes the capacity of resource l remaining at time t. We define admissible controls via the martingale formulation. This approach has proved to be very useful in control theoretic frameworks; see e.g. [10]. To this end, denote by {e_j, j = 1, ..., J} the standard basis in R^J, and for v ∈ R^J_+ and f : Z^J_+ → R let (A_v f)(x) := Σ_j v_j [f(x + e_j) − f(x)]. We say that an R^J_+-valued process λ on [0, T] with sample paths in D(R^J), satisfying

(2.4)  ess sup ‖λ‖*_T < ∞,

is an admissible control if, for every bounded function f : Z^J_+ → R, the process f(D(t)) − ∫_0^t (A_{λ(s)} f)(D(s)) ds is an {F_t}-martingale, where F_t is the P-completion of σ{D(s) : s ≤ t}, and D is the corresponding process from (2.2). Note that by this definition we are allowing the controller to observe the demand process D.
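The time-changed Poisson demand (2.2) can be simulated by Bernoulli thinning on a fine grid; the following sketch is ours, and the constant intensity is an arbitrary illustration.

```python
import random

def simulate_demand(lam, T, rng, dt=1e-3):
    """Simulate D(T) = pi(int_0^T lam(s) ds) by Bernoulli thinning on a grid.

    lam: bounded intensity function of time (the control).
    Over a short interval [t, t + dt) an arrival occurs with
    probability approximately lam(t) * dt."""
    count, t = 0, 0.0
    while t < T:
        if rng.random() < lam(t) * dt:
            count += 1
        t += dt
    return count

rng = random.Random(0)
# Constant intensity 5 on [0, 2]: the arrival count is approximately Poisson(10).
samples = [simulate_demand(lambda t: 5.0, 2.0, rng) for _ in range(200)]
mean = sum(samples) / len(samples)
```

A dynamically controlled intensity is obtained simply by passing a state-dependent function `lam`, which is how the feedback policies below act on the demand process.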
We denote by A the class of all admissible control processes.We also consider a setting in which the process X is required to satisfy a positivity constraint.We say that λ is an admissible control for the problem with positivity constraints, and write λ ∈ A + , if λ ∈ A and the corresponding process X satisfies X(T ) ≥ 0 a.s., where, throughout, T ∈ (0, ∞) is a fixed terminal time.
We are given a function p : R^J_+ → R^J_+, where for λ ∈ R^J_+, e_j • p(λ) represents the price (per usage) for using route j when the demand vector is λ. We also let r : R^J_+ → R_+, r(λ) := λ • p(λ), represent the revenue rate associated with intensity of demand λ. The function r is assumed to be concave and twice continuously differentiable on R^J_+, with r(0) = 0. (These are similar to the assumptions on r made in [7]. We require more smoothness than in [7] but do not impose the requirement that any unconstrained maximizer of r be bounded. Perhaps most importantly, [7] allows the demand-price relationship to vary over time, while we do not. Assumption 3 below imposes additional, more technical conditions on r, as does Assumption 6 for the second example that we consider.) A function g : R^J_+ → R represents the terminal reward associated with D(T). The total reward is given as

E[∫_[0,T] p(λ(s)) • dD(s) + g(D(T))] = E[∫_0^T r(λ(s)) ds] + E[g(D(T))],

where the dependence of D on λ is via (2.2). The above identity follows from the fact that ∫_[0,T] p(λ(s)) • (dD(s) − λ(s) ds) has mean zero as a stochastic integral with respect to the process D − ∫_0^· λ(s) ds, which is a martingale by the definition of admissible controls and the boundedness of λ assumed in (2.4). The assumptions on r and λ make the first expectation well defined and finite. We assume that the second expectation is also well defined. This must be checked for each application. A sufficient condition for this is that g is bounded from either above or below. The dynamic pricing problem consists of maximizing the reward over all admissible controls. We have two notions of value, corresponding to the two versions of the problem:

V := sup_{λ∈A} { E[∫_0^T r(λ(s)) ds] + E[g(D(T))] }  and  V := sup_{λ∈A_+} { E[∫_0^T r(λ(s)) ds] + E[g(D(T))] },

where we use the same letter for both; the distinction between the two versions of the problem will be made by referring to them as the problem with or without positivity constraints.
Remark 1. An important element of our model is the function p that represents the price given demand. Perhaps a more natural viewpoint in price-based revenue management is to start with a function λ that represents the demand rate for each given price and obtain p as its inverse. To be more precise, assume a function λ : D → R^J_+ is given, where D ⊂ R^J_+. A standard assumption, made for example in [7], is the existence of a so-called null price. This null price assumption takes one of two forms. In the first form, for each J ⊂ {1, ..., J} there exists a price p^{*,J} ∈ R^J_+ such that λ_j(p^{*,J}) = 0 for all j ∈ J. In the second form, for every J ⊂ {1, ..., J} there exists a sequence of prices {p^{k,J}, k ≥ 1}, with p^{k,J} ∈ R^J_+ for k ≥ 1, such that lim_{k→∞} λ_j(p^{k,J}) = 0 for all j ∈ J. In either case this implies the existence of a price at which any subset of the J arrival processes can be turned off. In the first case this price is finite. In the second case it is infinite. (In both cases the null price assumption can be viewed as an assumption that the service provider can simply block the various customer classes.) If λ, which represents demand given price, has an inverse p that maps R^J_+ into R^J_+, then the first form of the null price assumption holds. In this case p may serve in our model. However, situations where the second (but not the first) form of the null price assumption holds are also natural. Thus one would like to allow for a function λ having a well-defined inverse p as a function from (0, ∞)^J to R^J_+. We would like to make the point that such a scenario can also be treated by our model, as follows. Consider the function p alluded to above, defined on (0, ∞)^J, and define r(λ) on all of R^J_+ as the continuous extension of λ • p(λ). Note that in (0, ∞)^J this gives λ • p(λ), and that on the boundary this definition is consistent with the convention 0 × ∞ = 0.
Of course, r must satisfy our assumptions. (In particular, its continuity on the boundary corresponds to an assumption on p.)

2.2. Fluid scaling and a fluid optimization problem. We next describe how an appropriately scaled version of the model leads to what we refer to as a Fluid Optimization Problem (FOP). We consider a scaling of the model, indexed by a parameter n ∈ N. A superscript n will be used to denote the dependence on n in the notation of all stochastic processes, as well as the filtration {F^n_t} and the classes A^n, A^n_+. The capacity of resource l is scaled as C^n_l = [n C̄_l], with 0 < C̄_l < ∞ fixed constants. The scaled version r^n of r is defined by the relation r^n(λ) = n r(λ/n). (This arises under the common scaling λ^n(p) = n λ(p). In particular, the equality λ^n(p) = n λ(p) implies that p^n(nλ) = p(λ), so that r^n(nλ) = n r(λ).) In the problem associated with n we let

(2.5)  X^n_l(t) = C^n_l − Σ_j A_lj D^n_j(t),   (2.6)  D^n_j(t) = π_j(∫_0^t λ^n_j(s) ds),

where λ^n is an admissible control, and, with an appropriate terminal reward g^n, let

(2.7)  V^n := sup E[∫_0^T r^n(λ^n(s)) ds + g^n(D^n(T))],

the supremum being over A^n (respectively, A^n_+). The following assumption guarantees that the scaling of the functions g^n is consistent with that of the other quantities in our model.
Assumption 1. One has n^{-1} g^n(ny) → ḡ(y) as n → ∞, uniformly on compact sets (of the respective domains for the problem with, and, respectively, without positivity constraints), where ḡ is a continuous function.
Next, write X̄^n(t) = n^{-1} X^n(t) and λ̄^n(t) = n^{-1} λ^n(t). The fluid model is obtained as limits are taken formally, assuming that X̄^n and λ̄^n converge, and denoting by X̄ and, respectively, λ̄ their limits. The fluid model analogue of (2.5) takes the form

(2.9)  X̄_l(t) = C̄_l − Σ_j A_lj ∫_0^t λ̄_j(s) ds.

We thus obtain the following fluid scale functional (deterministic) optimization problem:

(2.10)  maximize ∫_0^T r(λ̄(s)) ds + ḡ(∫_0^T λ̄(s) ds) over λ̄.

In the problem with positivity constraints, one adds to the above the constraint X̄(T) ≥ 0. Observe that the argument of ḡ in (2.10) depends on λ̄ only through ∫_0^T λ̄_j(s) ds, 1 ≤ j ≤ J. Moreover, due to the concavity of r, given that ∫_0^T λ̄_j(s) ds = y_j, 1 ≤ j ≤ J, the first term is maximized by choosing λ̄_j(s) = y_j/T for 0 ≤ s ≤ T and 1 ≤ j ≤ J. As a result, the unconstrained and, respectively, constrained version of the above functional optimization problem can be transformed into the following finite dimensional fluid optimization problem (FOP):

(2.11)  maximize f(y) := T r(y/T) + ḡ(y) over y ∈ R^J_+ (respectively, over y ∈ D_+),

where D_+ := {y ∈ R^J_+ : Σ_j A_lj y_j ≤ C̄_l, 1 ≤ l ≤ L}. We restrict our attention to cases where the maximum in (2.11) is attained at a finite value of y. For the problem with positivity constraints the maximum is over D_+, which is compact by (2.1), so the maximum in (2.11) is attained in this case. For the problem without positivity constraints attainment of the maximum in (2.11) is not guaranteed, so we impose the following assumption.
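For intuition, the FOP can be solved in closed form in the simplest special case. The sketch below is ours and assumes a single class, a linear price function p(λ) = a − bλ (so r(λ) = λ(a − bλ) is concave), no terminal reward, and a single resource with A = 1; then f(y) = y(a − by/T) and the maximizer is the unconstrained stationary point truncated at capacity.

```python
def solve_fop_single_class(a, b, T, C):
    """Single-class FOP with linear price p(lam) = a - b*lam and no terminal
    reward: maximize f(y) = T*r(y/T) = y*(a - b*y/T) over 0 <= y <= C.

    Since r(lam) = lam*(a - b*lam) is concave, the unconstrained maximizer
    is y = a*T/(2*b); the capacity constraint then truncates it."""
    return min(a * T / (2.0 * b), C)

# Ample capacity: the stationary point a*T/(2b) = 5 is feasible.
assert solve_fop_single_class(a=10.0, b=1.0, T=1.0, C=100.0) == 5.0
# Tight capacity: the constraint binds, so y = C.
assert solve_fop_single_class(a=10.0, b=1.0, T=1.0, C=3.0) == 3.0
```

In the network case the same structure persists, except that the truncation is replaced by the polyhedral constraint set D_+ and the optimization is genuinely multidimensional.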
Assumption 2. For the problem without positivity constraints, the maximum of f over R^J_+ is attained.

Let ȳ denote a point where the maximum of f is attained (we use the same notation for both versions of the problem).

2.3. Diffusion scaling and a diffusion control problem.
We begin by defining some 'second order' quantities. First, let X̂^n be defined by centering (using the solution of (2.10)) and normalizing X^n:

X̂^n(t) := n^{-1/2} (X^n(t) − n X̄(t)),

and define second order versions of the reward and value by centering about the fluid optimum and normalizing by n^{-1/2}, with the supremum taken over A^n, or over A^n_+ in the case of the problem with positivity constraints. For y ∈ R^J_+, let

J_0(y) := {j : y_j = 0}.

Let also

J_+(y) := {j : y_j > 0}.
In the case without positivity constraints we define D̂ = Û, and in the case with positivity constraints we define D̂ correspondingly, taking the constraint into account. Second order corrections for the running and terminal rewards are defined accordingly. We make the following assumption regarding r.

Assumption 3. There exist finite positive constants b_r, c_r and δ_r such that the Hessian D²r of r satisfies

−c_r I ≤ D²r(λ)  and  D²r(λ) ≤ −b_r I,

where the first inequality holds for all λ satisfying |λ − λ̄| < δ_r, and the second inequality holds for all λ.
Note that it follows from the above assumption that λ → r(λ) + (b_r/2)‖λ − λ̄‖² is concave, and, as a result, the function ̺^n satisfies (2.20). The inequality (2.20) is used in the proof of Theorem 1(i) (in Section 4.2) to show that policies that are not well behaved in a certain sense do not perform well. The left hand inequality of Assumption 3 is used in the proof of Theorem 1(ii) (in Section 4.1) to obtain a uniform bound on ̺^n.

Assumption 4. For some continuous function ĝ,

(2.21)  ĝ^n → ĝ as n → ∞, uniformly on compact subsets of D.

Denote (2.22)
With an abuse of notation, we sometimes use the same symbols for the corresponding rescaled quantities. Note that an immediate consequence of the definition of admissible controls via the martingale problem is that Ŵ^n is a martingale. Thus, by (2.19), (2.29), (2.26), and (2.28), we obtain (2.30). We restrict our attention to cases where h attains a maximum over D. This is imposed in the first part of the next assumption. The second part of the assumption imposes a uniform growth rate on h^n that is used to prove uniform integrability of h^n(D̂^n(T)).
Assumption 5. i. The function h attains a global maximum at some point d* ∈ D.
ii. There exists a constant c, independent of x and n, such that |h^n(x)| ≤ c(1 + ‖x‖).

Because ̺^n satisfies (2.20), the last term of (2.30) serves as a penalty for using large values of u^n, and this will later allow us to argue that large values of D̂^n(T) are also penalized (in Section 4.2). Toward obtaining a diffusion control problem, we take limits formally. By differentiability of r, the function ̺^n converges pointwise to zero as n → ∞. We will thus drop the last term in (2.30) in the diffusion control problem formulation.
Note that Ŵ^n converges to a standard Brownian motion. To obtain a formal limit for W^n we substitute the quantity λ̄_j (from the FOP) for λ̄^n_j(·) in the definition (2.25) of this process. On some complete filtered probability space with filtration (F_t), let W be a J-dimensional (F_t)-standard Brownian motion and denote W̃ = (W̃_1, ..., W̃_J), where W̃_j = λ̄_j^{1/2} W_j. Note that for j ∈ J_0(ȳ), W̃_j vanishes. The diffusion control problem is to maximize

E[h(D̂(T))],  where  D̂(t) = ∫_0^t u(s) ds + W̃(t),

over all processes u that are (F_t)-progressively measurable and such that D̂(T) ∈ D a.s. This problem has a simple solution, as one can find u for which D̂(T) = d* a.s. Indeed, for j ∈ J_+(ȳ), let D̂_j be the Brownian bridge from 0 to d*_j, given as the unique strong solution to the SDE

(2.34)  dD̂_j(t) = ((d*_j − D̂_j(t))/(T − t)) dt + dW̃_j(t),  D̂_j(0) = 0,  0 ≤ t < T,

and let

(2.35)  u_j(t) = (d*_j − D̂_j(t))/(T − t).

Then (2.33) holds and D̂ has a continuous extension to [0, T] satisfying D̂(T) = d* a.s. [13, pp. 243-245]. For j ∈ J_0(ȳ), set

(2.36)  D̂_j(t) = d*_j t/T,  u_j(t) = d*_j/T.

Observe that equations (2.34)-(2.35) can be used to describe the solution (D̂_j, u_j) even for j ∈ J_0(ȳ) (in which case W̃_j vanishes), because in this case the solution to these equations is precisely (2.36). This point of view will be useful in the next subsection. One checks that equation (2.34) is solved as

(2.37)  D̂_j(t) = d*_j (t/T) + (T − t) ∫_0^t (T − s)^{-1} dW̃_j(s),  0 ≤ t < T.

2.4. Asymptotically optimal controls. Analogy to the diffusion control problem suggests defining u^n in such a way that the following set of equations is satisfied:

(2.38)  u^n_j(t) = (d*,n_j − D̂^n_j(t))/(T − t),

for a sequence d*,n → d*. However, care must be taken to assure λ^n(s) ≥ 0, namely u^n(s) ≥ −n^{1/2} λ̄, and in the problem with positivity constraints, that X^n(T) ≥ 0. To achieve these goals, we define (λ^n, X^n) in two steps. We first define a triplet (Λ^n, ∆^n, Ξ^n) in place of (λ^n, D^n, X^n), for which (2.38) holds, but the constraints alluded to above are not necessarily met. Consider the set of equations

(2.39)  Λ^n_j(t) = (a^n_j − ∆^n_j(t))/(T − t),  ∆^n_j(t) = π_j(∫_0^t Λ^n_j(s) ds),  0 ≤ t < T,

where a^n_j = n^{1/2} d*,n_j + nT λ̄_j, that, given π_j, j = 1, ..., J, clearly has a unique solution Λ^n = (Λ^n_1, ..., Λ^n_J) on [0, T). Let the constants d*,n_j be chosen in such a way that

(2.40)  d*,n_j → d*_j as n → ∞,

and at the same time a^n_j are nonnegative integer numbers. Note carefully that the fact that a^n_j are nonnegative integer numbers assures that the solution Λ^n_j to (2.39) never becomes negative. That is, we always have Λ^n_j(t) ≥ 0. Let ∆^n be defined via (2.6) with (∆^n, Λ^n) in place of (D^n, λ^n), let Ξ^n be defined via (2.5) with (Ξ^n, ∆^n) in place of (X^n, D^n), and U^n be defined via (2.14), with (U^n, Λ^n) in place of (u^n, λ^n). Fix a constant α such that

(2.41)  0 < α < δ_r.

Let τ^n be defined by (2.42) and, with Ā := max_{j,l} A_jl, by (2.43) and (2.44), for the problem without, and, respectively, with constraints. At the second step, for t < τ^n we let (λ^n, D^n, X^n) coincide with (Λ^n, ∆^n, Ξ^n), and on [τ^n, T] we set λ^n = 0. With u^n defined via (2.28), one checks by direct calculation that (2.38) holds on this interval. The constraint λ^n(t) ≥ 0 is met because, as argued above, Λ^n are nonnegative. Also, in the problem with positivity constraints, by (2.43), X^n(t) ≥ 0 for t < τ^n; and by (2.5) and (2.44), X^n does not vary on the time interval [τ^n, T] and so X^n(T) ≥ 0 holds a.s. Furthermore, the boundedness assumption (2.4) holds by construction. Hence, to show that the constructed processes λ^n are admissible controls, it only remains to prove the martingale property. This result is standard, and for completeness we have included it at the end of Section 4.4.
Since the construction is based on an imitation of the Brownian bridge dynamics, we will refer to the admissible controls λ n and corresponding processes (D n , X n ) just constructed, as the bridge policy.
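A toy single-class simulation may help illustrate the steering mechanism of the bridge policy. The rate form (remaining gap)/(remaining time) is our reading of the construction in (2.39): with an integer target a, the rate vanishes once the target is reached, and the count hits a before T with probability one. Jump epochs are sampled exactly by inverting the survival function; all parameters are illustrative.

```python
import random

def bridge_counts(a, T, rng):
    """Steer a counting process so that it hits the integer target a by time T.

    With N(t) = k arrivals so far, the arrival rate is (a - k)/(T - t); thus,
    heuristically, the expected number of remaining arrivals (rate times
    remaining time) always equals the remaining gap a - k.  Given k < a, the
    next jump epoch has survival function
    P(no jump in [t, s)) = ((T - s)/(T - t))**(a - k),
    which is inverted exactly below."""
    t, k = 0.0, 0
    epochs = []
    while k < a:
        u = rng.random()
        t = T - (T - t) * u ** (1.0 / (a - k))  # exact next-arrival epoch
        epochs.append(t)
        k += 1
    return k, epochs

rng = random.Random(1)
# Every run ends with exactly a = 7 arrivals, all strictly before T = 1.
results = [bridge_counts(7, 1.0, rng) for _ in range(100)]
```

The integer target plays the role of a^n_j above: because the rate is zero once the gap closes, the count can never overshoot, mirroring the nonnegativity of Λ^n_j.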
Theorem 1. i. Suppose that Assumptions 1-5 hold. Assume moreover that there exists a constant c_1, independent of x and n, such that (2.45) holds. Then the asymptotic upper bound on the second order value holds. ii. Suppose that Assumptions 1-5 hold. Then, under the bridge policy, the second order rewards attain this bound asymptotically; in particular, the bridge policy is asymptotically optimal at the diffusion scale.

The following result shows that, under certain regularity conditions, there is a simple open loop control policy that achieves asymptotic optimality, except that in the problem with positivity constraints, service is stopped when the boundary is reached. See Section 4.3 for the proof.
Corollary 1. Suppose that Assumptions 1-5 hold, that ȳ_j > 0 for 1 ≤ j ≤ J, and that for the problem with positivity constraints, (Aȳ)_l < C̄_l for 1 ≤ l ≤ L. Assume moreover that ḡ is differentiable at ȳ. Finally, assume that the function ĝ is given by ĝ(d) = ∇ḡ(ȳ) • d + b, where b is a constant. Then h(d) = b for d ∈ R^J. Moreover, for the problem without positivity constraints, the open loop control using u^n(t) = 0 for all 0 ≤ t ≤ T, n ≥ 1, satisfies (2.48). Furthermore, for the problem with positivity constraints, the control that sets u^n(t) = 0 for t < σ^n and λ^n = 0 for t ≥ σ^n, where σ^n is defined by (2.49), achieves the same asymptotic upper bound, (2.48).

3. Examples.
We present two examples in this section, both of which were treated, at the fluid level, in Gallego and van Ryzin [7].
3.1. The basic model. Furthermore, if L* < L then γ_l = 0 for l ∉ L*. Since g^n(x) = ḡ(x) = 0, x ≥ 0, we also have ĝ^n(d) = 0, d ∈ D, where D is defined in (2.17). Finally, (2.45) holds because, as mentioned above, h^n = h is bounded above by zero. So Theorem 1 can be applied here: the bridge policy with d* = 0 is asymptotically optimal.
A natural 'enhancement' of the above example would be to include a nonnegative salvage value for unsold capacity. This salvage value would constitute a terminal reward, and it is reasonable to take the salvage value to be concave and non-decreasing. Although it seems intuitively clear that the solution to the FOP with this terminal reward would be no larger than that without it, there may still be resources that are fully sold: (Aȳ)_l = C̄_l for some l. If, however, (Aȳ)_l < C̄_l, 1 ≤ l ≤ L, ȳ_j > 0, 1 ≤ j ≤ J, and the terminal reward is differentiable at ȳ, then things may simplify substantially. In particular, if we scale so that g^n(nx) = n g(x), which seems natural in this context, then ḡ(x) = g(x) and we have ĝ(d) = ∇ḡ(ȳ) • d for d ∈ R^J. Corollary 1 shows that a 'nearly open loop' policy in which u^n(t) = 0, 0 ≤ t < σ^n, and λ^n = 0, σ^n ≤ t ≤ T, n ≥ 1, where σ^n is defined in (2.49), is asymptotically optimal on diffusion scale. (Strictly speaking this is not an open loop policy, because it uses information about σ^n; however, as the proof shows, the probability that σ^n < T tends to zero, and so roughly speaking it is open loop.) The next section treats an example with a terminal reward arising in a different context.

3.2. Gallego and van Ryzin's no-shows and overbooking model. We now consider a modification of the above model where some customers are no-shows. In response to this possibility, the service provider may overbook by selling more of certain resources than is actually available. This can lead, in turn, to not being able to satisfy all customers that do show up, which leads to denied boarding charges. As we show below these denied boarding charges represent a penalty to the service provider that we formulate as a (negative) terminal reward. The analysis of this case is more involved than that of the basic model considered in Section 3.1. The rest of this section is organized as follows. In Section 3.2.1 we introduce the model for no-shows (which is the same as in [7]), and derive expressions for the revenue and terminal reward. We introduce and solve the fluid optimization problem (FOP) in Section 3.2.2, and show that Assumptions 1 and 2 hold. We also introduce a 'modified' FOP and obtain relations satisfied by the Kuhn-Tucker vector of this problem that help to simplify the diffusion scale analysis. In Section 3.2.3 we begin the diffusion scale analysis by deriving ĝ and showing that Assumptions 4 and 5(ii) hold. The diffusion scale analysis is completed in Section 3.2.4, where we show (under some additional assumptions) that Assumption 5(i) holds, along with (2.45), a hypothesis of Theorem 1. We show how the maximization of h can be translated into the maximization of a separable function. Maximizing this separable function gives rise to a variant of the classical 'newsvendor' problem, which has an explicit solution.
3.2.1. Model and cost structure. We use the cost structure introduced in Section 6 of [7]. When a class j customer that paid p for a ticket is a no-show, the customer pays a penalty of β_j p + c_j, where 0 ≤ β_j ≤ 1 and c_j ≥ 0. Thus the refund to the customer is p(1 − β_j) − c_j. (The parameters β_j and c_j should be such that p(1 − β_j) − c_j ≥ 0. We address this issue below.) Each class j customer shows up with probability 1 − q_j, where 0 ≤ q_j ≤ 1 (and hence is a no-show with probability q_j), 1 ≤ j ≤ J. We assume that all of these no-show events are independent. We can express the expected revenue rate of class j customers straightforwardly. To do so we need to slightly modify our notation. Let R_j(λ) = λ_j p_j(λ), 1 ≤ j ≤ J, and define r_j(λ) := (1 − q_j(1 − β_j)) R_j(λ) + q_j c_j λ_j. We assume that R(λ) = R_1(λ) + ... + R_J(λ) satisfies the conditions previously imposed on r(λ). Then our newly defined r(λ) satisfies these conditions as well.
The no-show penalties imposed on the customers have been absorbed into the revenue rate r(λ). Thus the denied boarding charges imposed on the service provider constitute the terminal reward. Recall that D_j(T) denotes the number of class j items sold. Let q = (q_1, ..., q_J), m = (m_1, ..., m_J), and Z(q, m) = (Z_1(q_1, m_1), ..., Z_J(q_J, m_J)), where Z_1(q_1, m_1), ..., Z_J(q_J, m_J) are independent random variables, and Z_j(q_j, m_j) has a binomial distribution with parameters q_j and m_j, 1 ≤ j ≤ J. Then Z(q, m) represents the number of no-shows for all J classes with m arrivals. Thus D_j(T) − Z_j(q_j, D_j(T)) is the net number of class j seats sold. Let

η_l := C_l − Σ_j A_lj (D_j(T) − Z_j(q_j, D_j(T)))

denote the remaining amount of resource l. Note that we can (and often will) have η_l < 0, indicating that there is not enough of resource l to satisfy all demand.
As in [7] we assume that the total denied boarding charges (paid by the service provider) consist of the cost of acquiring the additional resources needed to satisfy all demand, and that additional capacity on resource l can be obtained at a unit cost of ν_l, 1 ≤ l ≤ L. (Thus the denied boarding charge for a route is simply the sum of the denied boarding charges for all resources in a route, with multiplicity if applicable.) The total of the denied boarding charges paid by the service provider is thus −Σ_{l=1}^L ν_l (η_l ∧ 0), and the terminal reward is

−Σ_{l=1}^L ν_l (−η_l)^+,

where x^+ := max(x, 0).
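The terminal penalty can be estimated by direct Monte Carlo from its definition; a minimal sketch (ours, with made-up parameters) follows.

```python
import random

def denied_boarding_cost(D, A, C, q, nu, rng):
    """One realization of the terminal penalty sum_l nu[l] * max(-eta_l, 0).

    D[j]: seats sold to class j; A[l][j]: units of resource l per class-j seat;
    C[l]: capacity of resource l; q[j]: no-show probability of class j;
    nu[l]: unit cost of acquiring extra capacity on resource l."""
    J, L = len(D), len(C)
    # Class-j no-shows are Binomial(D[j], q[j]); shows are the complement.
    shows = [D[j] - sum(rng.random() < q[j] for _ in range(D[j])) for j in range(J)]
    cost = 0.0
    for l in range(L):
        eta = C[l] - sum(A[l][j] * shows[j] for j in range(J))
        cost += nu[l] * max(-eta, 0.0)
    return cost

rng = random.Random(2)
# One resource, one class: 110 seats sold against capacity 100, 10% no-shows,
# unit denied-boarding cost 2.  Average over 2000 realizations.
est = sum(denied_boarding_cost([110], [[1]], [100], [0.1], [2.0], rng)
          for _ in range(2000)) / 2000
```

With these illustrative numbers the expected show count is 99 against a capacity of 100, so the penalty is driven entirely by the binomial fluctuation above capacity, which is exactly the newsvendor-type trade-off analyzed below.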

3.2.2. The fluid optimization problem. Consider the terminal reward in the n-th system. Since n^{-1} Z_j(q_j, n y_j) → q_j y_j a.s. as n → ∞ by the strong law of large numbers, the dominated convergence theorem identifies the fluid limit of the scaled terminal reward, so Assumption 1 holds.
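The strong-law scaling used here is easy to visualize numerically. A minimal sketch (our own, with an arbitrary seed) of the convergence n^{-1} Z_j(q_j, n y_j) → q_j y_j:

```python
import random

def scaled_no_shows(n, y, q, rng):
    """Sample n^{-1} Z(q, ny), with Z(q, m) ~ Binomial(m, q)."""
    m = int(n * y)
    return sum(rng.random() < q for _ in range(m)) / n

rng = random.Random(1)
# For y = 1 and q = 0.1 the scaled count concentrates near q*y = 0.1
# as n grows, illustrating the a.s. limit invoked above.
samples = [scaled_no_shows(n, 1.0, 0.1, rng) for n in (100, 10_000, 100_000)]
```

The fluctuation of the last sample around 0.1 is of order n^{-1/2}, which is exactly the diffusion scale studied in the remainder of the section.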
The FOP in this case is to maximize the function f of (3.5). The function f is continuous and concave, and the maximization is over R^J_+: there are no further constraints. On the other hand, f is not differentiable on all of R^J_+. Nonetheless, for ȳ ∈ R^J_+ to be an optimal solution it is necessary and sufficient that ȳ be a local maximum of f. Let L := {1, . . ., L}, and Ã_lj := A_lj(1 − q_j), 1 ≤ j ≤ J, 1 ≤ l ≤ L. Fix a fluid optimal solution ȳ ∈ R^J_+, and define the set L_0 of resources as in (3.8). Let L_0 := #{l : l ∈ L_0}. If L_0 = 0 then f is differentiable at ȳ and the standard stationary point conditions for a local maximum hold:

∂f/∂y_j (ȳ) = 0, j ∈ J_+(ȳ), (3.9)

lim_{ε→0+} ∂f/∂y_j (ȳ + e_j ε) ≤ 0, j ∈ J_0(ȳ). (3.10)

However, if L_0 > 0, which will be the case in situations of practical interest, then f is not differentiable at ȳ. (Roughly speaking, the resources l ∈ L_0 are neither over- nor under-provisioned. We take the sizing of the resources as given; a proper sizing should place most, if not all, l in L_0.) Although f is not differentiable at ȳ, it is simple to write down the first order expansion (3.11) of f at ȳ, valid for z ∈ D and ε small. Thus, a necessary and sufficient condition for ȳ to be a local maximum of f is that the directional derivative condition (3.12) hold for every z ∈ D with |z| = 1. By considering z = ±e_j, 1 ≤ j ≤ J, (3.12) gives rise to easier-to-check necessary conditions.

We now verify that f, as defined in (3.5), satisfies Assumption 2. We need to impose a condition on the penalty and denied boarding charges; in particular, we assume that (3.15) holds. The left hand side of (3.15) is the expected value of the fixed part of the penalty paid by a class j customer, while the right hand side is the expected cost for the service provider to buy resources to accommodate a class j customer. Thus (3.15) says that the service provider cannot profit, purely through the fixed penalty cost, by overbooking. We also impose the following assumption on R.
Assumption 6. (i) For any ε > 0 there exists an M < ∞ such that, if … The existence of a finite maximizer for f is an immediate consequence of the following lemma, whose proof is given in Section 4.4.
Lemma 1. Suppose that R satisfies Assumption 6(i), and that (3.15) holds. If y^(k) ∈ R^J_+, k ≥ 1, and Σ_{j=1}^{J} y^(k)_j …

Finding a global maximum of h (and hence solving the diffusion control problem) is greatly simplified by using properties of the solution of the FOP. The key relationships arise through the solution of the following modified FOP: maximize f_M(y) over y ∈ R^J_+ satisfying (3.16). We now verify that the ȳ chosen above as a solution of the FOP is also optimal for the modified FOP. By the definition of L_0 in (3.8), ȳ satisfies (3.16) and is thus feasible. Let D_0 denote the set of y ∈ R^J_+ satisfying (3.16). Note that, if y ∈ D_0, then f(y) = f_M(y). Thus, since ȳ ∈ D_0 and ȳ maximizes f over D_0, it is optimal for the modified FOP.
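The one-sided directional-derivative test in (3.12) can be mimicked numerically. The sketch below uses a toy concave, nondifferentiable function (our own example, not the paper's f) to show how the condition that the directional derivative be nonpositive in every direction certifies a local, and by concavity global, maximum:

```python
def directional_derivative(f, y, z, eps=1e-6):
    """One-sided directional derivative f'(y; z) via a forward difference."""
    fy = f(y)
    fz = f([y[j] + eps * z[j] for j in range(len(y))])
    return (fz - fy) / eps

# Toy example: f(y) = 2 y_0 - 3 max(y_0 - 1, 0) is concave with a kink at
# y_0 = 1. Its one-sided derivatives at the kink are -1 (direction +e_0)
# and -2 (direction -e_0), both nonpositive, so y = (1) is a maximizer.
def f_toy(y):
    return 2.0 * y[0] - 3.0 * max(y[0] - 1.0, 0.0)
```

The kink here plays the role of the resources l ∈ L_0, at which the fluid objective is exactly balanced and ordinary gradients are unavailable.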
We now deal with an issue left open above and show that the refund to every customer is positive. Recall that the refund to a no-show customer of class j who paid p is p(1 − β_j) − c_j. Let p̄ = p(λ̄), where λ̄ = ȳ/T and ȳ is a solution of the FOP (3.5). We assume that p̄_j > c_j/(1 − β_j), 1 ≤ j ≤ J.
By Assumption 6(ii), p is locally Lipschitz at λ̄. Thus, writing p_j(t) for p_j(λ̄_n(t)), there exists a K < ∞ such that the corresponding Lipschitz bound holds. Recall from the construction of {U^n(t), 0 ≤ t ≤ T} the supremum bound (3.20). In addition, since λ^n(t) = 0 for τ_n ≤ t ≤ T, the price on this interval is irrelevant: with probability 1 no customers arrive during [τ_n, T]. Let κ_l be defined accordingly, and note that l ∈ L_0, L_+ or L_− according to whether κ_l = 0, κ_l > 0 or κ_l < 0. Let the random variables Ẑ^n_j(d_j), 1 ≤ j ≤ J, be mutually independent, and write Ẑ^n(d) for the (column) vector (Ẑ^n_j(d_j), j = 1, 2, . . ., J).
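The vectors Ẑ^n(d) just defined are compared below by coupling. A minimal illustration (our own, not the paper's exact construction) of the standard shared-trials coupling of two binomial random variables with the same success probability and different sizes, shown here for a nonnegative size difference d:

```python
import random

def coupled_binomials(m, d, q, rng):
    """Couple Z_big ~ Binomial(m + d, q) and Z_small ~ Binomial(m, q) on one
    probability space: both count successes among the same first m Bernoulli
    trials, and Z_big adds d further independent trials. Pathwise,
    0 <= Z_big - Z_small <= d, and Z_big - Z_small ~ Binomial(d, q)."""
    shared = sum(rng.random() < q for _ in range(m))
    extra = sum(rng.random() < q for _ in range(d))
    return shared + extra, shared
```

The pathwise bound |Z_big − Z_small| ≤ d is what makes couplings of this kind useful for estimating differences of expectations, as in the argument that follows.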
Lemma 2. Let U_i, i = 1, 2, . . ., r, be mutually independent r.v.s, and let V_i, i = 1, 2, . . ., r, be mutually independent r.v.s. Assume that the first moment of each U_i and V_i is finite. In addition, assume that, for each i, …

We can couple Ẑ^n(d) and Ẑ^n(0) so that (3.27) holds, with the convention that, if d_j < 0, the corresponding quantities are interpreted accordingly. With this notation, using (3.27), we obtain a decomposition into terms r^n_1 and r^n_2. To estimate r^n_1, we use Lemma 2 with r = J + 1, where the first collection of r.v.s is (Ã′_l d, −A_lj Ẑ^n_j(0), j = 1, 2, . . ., J), and the second collection is (Ã′_l d, −A_lj Ẑ_j, j = 1, 2, . . ., J).
To verify the assumptions of the lemma we invoke Theorem 5.16 of [12] on the rate of convergence in the CLT, by which (3.28) holds, where N ∼ N(0, q(1 − q)) and c_1 is a constant not depending on n, x or q. Since Ẑ_j is equal in distribution to √(ȳ_j) N, we obtain from (3.28) a bound in which c does not depend on n or x. We conclude by Lemma 2 that r^n_1 ≤ cn^{−1/2}. Next, if ‖d‖ < n^α, then the Cauchy-Schwarz inequality yields a bound on r^n_2. Since α < 1/2, we have r^n_1 + r^n_2 → 0. This proves (3.26), and Claim (3.25) follows. We next show (3.29), with c independent of x and n; this is Assumption 5(ii). To show this, note that it suffices to prove that |ĝ^n(d)| ≤ c(1 + ‖d‖) for a constant c independent of d and n. For l ∈ L_0 this is clear; an elementary inequality (valid for a > 0) shows that (3.30) is also valid for l ∈ L_+. Now, using Jensen's inequality and a calculation of the second moment of Ẑ^n(d), we have E‖Ẑ^n(d)‖ ≤ c(‖ȳ‖ + n^{−1/2}‖d‖)^{1/2}. Claim (3.29) thus follows.
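The n^{−1/2} rate delivered by a Berry-Esseen-type bound such as Theorem 5.16 of [12] can be seen numerically. The sketch below (our own illustration, not the theorem's content) computes the discrepancy between a Binomial(n, q) distribution function and its normal approximation at the lattice points:

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_normal_gap(n, q):
    """max_k |P(S_n <= k) - Phi((k - nq)/sqrt(nq(1-q)))| for S_n ~ Bin(n, q).

    This is the CLT error evaluated at the integer points; a Berry-Esseen
    bound controls it by c / sqrt(n), up to the dependence on q."""
    mu, sigma = n * q, sqrt(n * q * (1.0 - q))
    Phi = NormalDist().cdf
    cdf, gap = 0.0, 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * q**k * (1.0 - q)**(n - k)
        gap = max(gap, abs(cdf - Phi((k - mu) / sigma)))
    return gap
```

For instance, the gap at n = 400 comes out at roughly half the gap at n = 100, consistent with the n^{−1/2} rate.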

3.2.4. The diffusion control problem. The function h to be maximized in the diffusion control problem is given in (3.31), and the domain of maximization is D = Û. Let J_+ := J − J_*. For convenience (and without loss of generality) we assume that L_0 = {1, . . ., L_0} and J_+(ȳ) = {1, . . ., J_+}. Substituting (3.18) into (3.31) yields an explicit expression for h(d), d ∈ Û. If L_0 = 0 then h(d) ≤ 0 for all d ∈ Û; since h(0) = 0, d* = 0 is an optimal solution in this case, so Assumption 5(i) holds. We verify (2.45) below, allowing us to conclude that in this case the bridge policy with d = 0 is asymptotically optimal on diffusion scale. If, in addition to L_0 = 0, we also have J_* = 0, then by (3.4) and (3.24) we have ĝ(d) = ∇ḡ(ȳ)·d for d ∈ R^J. Thus, by Corollary 1, the open loop control u^n(t) = 0 for all 0 ≤ t ≤ T, n ≥ 1, is asymptotically optimal on diffusion scale.

Henceforth we assume that L_0 > 0. We also assume that the columns of {Ã_lj, 1 ≤ l ≤ L_0, 1 ≤ j ≤ J_+} span R^{L_0}; under this assumption the determination of d* simplifies dramatically. A simple sufficient condition for this is that, for every resource l, there is an associated class j(l) with ȳ_j(l) > 0, such that A_lj(l) > 0 and A_kj(l) = 0, k ≠ l. In an airline context this corresponds to each link having a route (with nonzero usage in the fluid limit) that uses only that link, and this should typically hold in practice. Let Û_0 = {u ∈ R^J : u_j = 0, j > J_+}. Under the spanning assumption, for any w ∈ R^{L_0} there exists a d ∈ Û_0 such that w_l = Σ_{j=1}^{J} Ã_lj d_j, 1 ≤ l ≤ L_0. Since ȳ solves the FOP, we have, for ε such that ȳ + εd^(k) ∈ R^J_+ (recall that ȳ_j > 0, 1 ≤ j ≤ J_+, by assumption), that f(ȳ + εd^(k)) ≤ f(ȳ); by (3.11), this yields a corresponding inequality for the expansion terms. This gives rise to a function Ψ on R^{L_0} whose value Ψ(w) equals the restricted objective at any d with w_l = Σ_{j=1}^{J} Ã_lj d_j, 1 ≤ l ≤ L_0. We can maximize Ψ over R^{L_0} and, having found a maximizer w* ∈ R^{L_0} with Ψ(w*) = sup_{w ∈ R^{L_0}} Ψ(w), choose d* ∈ Û_0 such that w*_l = Σ_{j=1}^{J} Ã_lj d*_j, 1 ≤ l ≤ L_0. This d* then maximizes the restricted objective; and since the restricted objective agrees with h on Û_0 and d* ∈ Û_0, d* also maximizes h.
The function Ψ is separable: we can write Ψ(w) = Σ_{l=1}^{L_0} ψ_l(w_l), and we can thus maximize each ψ_l separately. This is a slight variant of the classical 'newsvendor' problem from inventory theory. Under the condition γ_l < ν_l the optimizing w_l is obtained by straightforward differentiation, finding the unique stationary point of the concave function ψ_l. This yields w*_l = σ̄_l Φ^{−1}(γ_l/ν_l), where Φ^{−1}(·) is the inverse of the standard normal distribution function. A straightforward calculation then yields the optimal value of ψ_l, where φ is the density of the standard normal distribution. If γ_l = ν_l then there is no maximizing w_l: sup_{w_l ∈ R} ψ_l(w_l) = 0, and ψ_l(w_l) → 0 as w_l → ∞.

Finally, we show that (2.45) holds. Using (3.18) and (3.23), we obtain the decomposition (3.33). For l ∈ L_0, the l-th term in (3.33) can be bounded using the inequality γa − ν[a − b]^+ ≤ ν|b| (which holds for a, b ∈ R, provided 0 ≤ γ ≤ ν): this term is bounded by c(E‖Ẑ^n(d)‖ + 1). The l-th term for l ∈ L_− is bounded similarly. Calculating the second moment of Ẑ^n(d) directly from (3.22), and combining with (3.34), establishes (2.45), and concludes the verification of all assumptions of Theorem 1.
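The newsvendor computation above is easy to make concrete. The sketch below assumes the separable objective has the form ψ_l(w) = γ_l w − ν_l E[(w − σ̄_l Z)^+] with Z standard normal; this functional form is our assumption (the paper's display is not reproduced here), chosen because it is concave and its stationary point is exactly w*_l = σ̄_l Φ^{−1}(γ_l/ν_l):

```python
from math import sqrt, pi
from statistics import NormalDist

N = NormalDist()  # standard normal: cdf = Phi, pdf = phi, inv_cdf = Phi^{-1}

def psi(w, gamma, nu, sigma):
    """psi(w) = gamma*w - nu*E[(w - sigma*Z)^+] for Z ~ N(0, 1), using the
    closed form E[(w - sigma*Z)^+] = w*Phi(w/sigma) + sigma*phi(w/sigma)."""
    s = w / sigma
    return gamma * w - nu * (w * N.cdf(s) + sigma * N.pdf(s))

def w_star(gamma, nu, sigma):
    """Stationary point of the concave psi (requires 0 < gamma < nu)."""
    return sigma * N.inv_cdf(gamma / nu)
```

Under this assumed form a direct computation gives ψ_l(w*_l) = −ν_l σ̄_l φ(Φ^{−1}(γ_l/ν_l)); for example, with γ = 1, ν = 2, σ̄ = 3 the maximizer is w* = 0 with value −3·2·φ(0) = −6/√(2π).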

4. Proofs.
We first present the proof of part (ii) of Theorem 1, in Section 4.1. The proof of part (i) relies on part (ii), and is therefore presented afterwards, in Section 4.2. Section 4.3 contains the proof of Corollary 1. Finally, Section 4.4 contains the proofs of some of the lemmas.

4.1. Proof of Theorem 1(ii). We use c to denote a constant, independent of n and t, that can vary from appearance to appearance. We prove the result by showing, in Steps 1-4 below, that the first term in (2.30) converges to h(d*), and, in Step 5, the corresponding claim for the second term in (2.30). Also note that, as in the case of the diffusion control problem, whose solution is given by (2.37), we can solve (2.38) in closed form. In the first four steps below we prove (4.1) and the uniform integrability alluded to above.
Step 1: We show that (4.4) holds, where c < ∞ does not depend on n. We remind the reader of the Burkholder-Davis-Gundy (BDG) inequality, which states that for any local martingale M and p ≥ 1,

E[ sup_{s≤t} |M(s)|^p ] ≤ c_p E[ [M, M]_t^{p/2} ],

where the constant c_p depends only on p, and [M, M] is the quadratic variation process defined by [X, X] = X² − 2 ∫ X_− dX (see [13], p. 58 and p. 175); if X has piecewise smooth sample paths, null at zero, then [X, X]_t is given by Σ_{s≤t} (ΔX(s))² (see, for example, [13], Theorem 22(ii), p. 59).
Since W^n is a martingale, so is Q^n. The martingale Q^n is piecewise smooth with Q^n(0) = 0, and so [Q^n, Q^n](t) = Σ_{s≤t} (ΔQ^n(s))². Let N^n denote the counting process that counts the jumps of W^n(· ∧ τ_n). An application of the BDG inequality with p = 2 yields a bound in terms of N^n. By construction, ‖u^n(s)‖ ≤ αn^{1/2} for s ≤ τ_n. As a result, EN^n((t_1, t_2]) ≤ cn(t_2 − t_1), and the corresponding bound follows. In Step 3 we will also use a further estimate, (4.5), based on the BDG inequality with p = 4 and proved in a similar manner; its derivation uses the fact that, since λ̄_n is bounded above, the increments of N^n over disjoint intervals I = (t_1, t_2] and J = (t_3, t_4], t_2 ≤ t_3, can be controlled. Let S^n_i denote the event that i is the smallest nonnegative integer k for which the relevant supremum exceeds its threshold (note that, for each n, there are finitely many such integers k, because τ_n ≤ T − n^{−1}). Using (4.5) we therefore obtain a bound in which c does not depend on n, and (4.4) follows.
Step 2a: Consider the case of the problem with positivity constraints. We will show that there exists a constant c such that (4.7) holds. Let E^n_1 denote the event that τ_n is incurred by having Ξ^n_l 'hit' Ā for some l (cf. (2.43)). On E^n_1 one necessarily has X^n_l(τ_n−) ≤ 2Ā for some l, hence min_l min_{t<τ_n} X^n_l(t) ≤ 2Ā. If X^n_l(t) ≤ 2Ā for some l, t, then by (2.13), the identity AλT = C − x, and the nonnegativity of x_l and (Aλ)_l, we have, for a suitable constant c > 0, the required bound, where in the second-to-last inequality we assumed n sufficiently large and used A D̄^n(t) ≥ X̄^n(t) − 1, which follows from (2.31) and (2.32) for large n; in the last inequality we used Step 1. Next, recalling that for t < τ_n, u^n(t) has the form (2.38), and using (2.43), a further bound follows. On the event in the indicator function above, one has, for a suitable constant, a corresponding estimate, where we again used Step 1. Combining (4.9) and (4.10) yields (4.7).
Step 2b: Consider now the case of a problem without positivity constraints. In this case τ_n is defined via (2.42), so the estimate involving E^n_1 is not needed, while the estimate involving (E^n_1)^c remains valid. Consequently, (4.7) holds.
Step 3: We show (4.13) below. Note first that, by (4.7), a preliminary bound holds for a suitable constant c > 0.

Step 4a: Consider first the problem with positivity constraints. In view of (4.13) and (4.2), to show (4.1) it suffices to show (4.14). Clearly, the estimate (4.11) is not good enough. However, we can redo Step 2a more carefully, using the improved estimate (4.13) of Step 3 in place of (4.4) from Step 1. Let ε > 0 be given and consider the corresponding event, whose probability p_1(n, ε) satisfies, for n sufficiently large, a bound with a constant c > 0 not depending on n and ε. By (4.13), p_1(n, ε) → 0 as n → ∞. Next, on the complementary event, a bound holds in which c > 0 does not depend on n and ε, and where we used the notation (2.16) for L_*. Combining this with (2.32), we obtain a limit statement. By (4.13) and the fact that ‖ΔD̂^n‖ ≤ cn^{−1/2}, we have D̂^n(τ_n−) ⇒ d*. Using (2.31), the expression in curly brackets in the above display converges weakly to +∞ for l > L_*, and to (−Ad*)_l + cε for l ≤ L_*. But since d* ∈ D, we have (−Ad*)_l ≥ 0 for l ≤ L_* (see (2.17)). We conclude that p_2(n, ε) → 0 as n → ∞. Since ε is arbitrary, (4.14) follows.

Step 4b: In the problem without positivity constraints, the estimate (4.15) is valid without the term p_2, and therefore (4.14), and in turn (4.1) and (4.16), follow as above.
We have shown (4.19). Since ε > 0 is arbitrary, this establishes (4.17), and concludes the proof of part (ii) of the theorem.

4.2. Proof of Theorem 1(i). Part (i) is proved in three steps. The first step uses the result of part (ii) of the theorem, along with (2.45), to show that policies under which ‖D̂^n(T)‖ is large do not perform well. More precisely, it shows that there exists a constant c for which the bound (4.22) holds. Recall the process N^n introduced in Step 1 of the proof of Theorem 1(ii).
Recalling that W^n is a martingale and applying the BDG inequality as in that proof yields, for a universal constant c, a bound on the relevant martingale term. Since n^{−1}λ^n(s) = λ̄ + n^{−1/2}u^n(s), a corresponding bound follows. Write Δ_n for ‖D̂^n(T)‖. By Step 1, EΔ_n² ≤ cn^{1/2}. Thus, using assumption (2.45) together with (3.26) and (4.28), we obtain (4.27), and (4.22) holds.