On the Risk-sensitive Cost for a Markovian Multiclass Queue with Priority

A multiclass M/M/1 system, with service rate $\mu_i n$ for class-$i$ customers, is considered with the risk-sensitive cost criterion $n^{-1}\log E\exp\sum_i c_iX^n_i(T)$, where $c_i>0$, $T>0$ are constants, and $X^n_i(t)$ denotes the class-$i$ queue length at time $t$, assuming the system starts empty. An asymptotic upper bound (as $n\to\infty$) on the performance under a fixed priority policy is obtained, implying that the policy is asymptotically optimal when the $c_i$ are sufficiently large. The analysis is based on the study of an underlying differential game.


Introduction
A Markovian queueing model consisting of a single server capable of serving jobs of $k$ classes is considered. Job arrival rates are proportional to a (large) parameter $n$, and so are the processing rates for each of the class-$i$ jobs, which, specifically, are given by $\mu_i n$. Let $X^n_i(t)$ denote the number of jobs in the $i$th class at time $t$, assuming the system starts empty at time 0, and consider the scaled version $\bar X^n = n^{-1}X^n$. Under specific service policies, for example, serve-the-longest-queue and certain priority policies, it is well known that $\{\bar X^n, n\in\mathbb N\}$ satisfies a sample-path large deviation principle [7]. In this note we are interested in the dynamic control problem where a service policy is sought to minimize a cost at the large deviation scale. In particular, we consider the cost
$$C^n=\frac1n\log E\exp\{n\,G(\bar X^n)\},\qquad(1.1)$$
where $T>0$ and $c\in(0,\infty)^k$ are fixed, and we denote $G(\xi)=c\cdot\xi(T)$ for $\xi:[0,T]\to\mathbb R^k$.
The motivation for considering such a cost, referred to in the literature as risk-sensitive, for a queueing model is that it strongly emphasizes large values of the terminal queue length, and is thus natural when one seeks to prevent buffer overflow. Avoiding large waiting times so as to assure quality of service is a closely related motivation (though not directly addressed in this paper).
In an earlier paper [1] we considered a broader setting, of a model with multiple, heterogeneous servers, of which the above is a special case, and a risk-sensitive cost defined similarly to (1.1), with a more general functional $G$ of the whole path $\{\bar X^n(t), 0\le t\le T\}$. The limit of the optimal cost, as $n\to\infty$, was characterized as the value of a certain two-player zero-sum differential game (DG). In this paper a particular priority-type strategy for the DG is studied. It is shown that for sufficiently large $c_i$ this strategy is optimal for the DG, and that an analogous policy for the queueing control problem is asymptotically optimal. We further show that in a more general setup, the worst-case performance of that priority-type strategy has a specific upper bound which is also obeyed by the asymptotic performance of the induced policy.
The strategy alluded to above is one that prioritizes the classes in the order of the index $(1-e^{-c_i})\mu_i$, with highest priority given to the class with highest index. This is reminiscent of the $c\mu$ rule, where priority is given according to the index $c_i\mu_i$, known to be optimal under linear queue-length cost with weights $c_i$: note in particular that if we scale all $c_i$'s by the same small parameter $\varepsilon$, then the exponential priority rule agrees with the linear one for all sufficiently small $\varepsilon$. This result is useful in practical implementation because the priority-based resource allocation policy is simple as well as robust (note, in particular, that it is independent of the arrival rates). The proof builds on results in our earlier paper [1] and on the general large deviation upper bound of Dupuis, Ellis and Weiss [5]. In particular, the main argument consists of comparing the priority policy's performance, estimated using the results of [5], with the DG value, using the connection established in [1]. The paper is organized as follows. In Section 2 we present the queueing model and the main result. Section 3 describes the connection between the control problem and the DG, obtained in [1]; an estimate of the performance in the DG is also obtained. In Section 4 that estimate is used to analyze the priority policy. Section 5 gives a lower bound on the DG's value, by which optimality of the priority rule for large $c_i$ follows. The appendix establishes the existence of a strategy for the DG that acts according to the priority discipline.
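As a quick numerical illustration of the relation between the two indices (all parameters below are made up for the sketch, not taken from the paper), the following snippet exhibits a two-class instance where the exponential rule $(1-e^{-c_i})\mu_i$ and the $c\mu$ rule order the classes differently, while scaling the $c_i$ by a small $\varepsilon$ restores agreement:

```python
import math

# Hypothetical two-class parameters (illustrative assumptions only).
mu = [1.0, 4.0]
c  = [3.0, 0.5]

def exp_index(mu_i, c_i):      # index mu_i * (1 - e^{-c_i}) of the priority rule
    return mu_i * (1.0 - math.exp(-c_i))

def linear_index(mu_i, c_i):   # classical c*mu index
    return c_i * mu_i

# Priority orders (highest index first) under the two rules.
order_exp = sorted(range(2), key=lambda i: -exp_index(mu[i], c[i]))
order_lin = sorted(range(2), key=lambda i: -linear_index(mu[i], c[i]))

# Scaling all c_i by a small eps: 1 - e^{-eps*c} ~ eps*c, so the orders agree.
eps = 1e-3
order_exp_eps = sorted(range(2), key=lambda i: -exp_index(mu[i], eps * c[i]))
```

Here the exponential rule prefers class 2 (index about 1.57 versus 0.95) while the $c\mu$ rule prefers class 1 (index 3.0 versus 2.0); after scaling by $\varepsilon$ both prefer class 1.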

Model and main result
The model is parameterized by $n\in\mathbb N$. It consists of $k$ customer classes and one server. Arrivals into the system occur according to independent Poisson processes, with respective parameters $n\lambda_i$, where $\lambda_i>0$ are fixed. Arriving jobs are queued in buffers, one dedicated to each class. The server is available to serve the customers at the head of the $k$ lines, and is capable of splitting its effort among them. The service times are exponential, where a class-$i$ customer is served at rate $n\mu_i$ if the server dedicates all its effort to it. An allocation vector, representing the fraction of effort dedicated to each of the classes, is any member of $U=\{u\in\mathbb R^k_+:\sum_{i\in K}u_i\le1\}$, where $K=\{1,2,\ldots,k\}$. Denote by $e_i$ the $k$-tuple with 1 in the $i$th place and 0's elsewhere. For $n\in\mathbb N$ denote $S^n=n^{-1}\mathbb Z^k_+$. Given $n\in\mathbb N$ and $u\in U$ consider the operator (a generalization of a $Q$-matrix)
$$L^n_uf(x)=\sum_{i\in K}n\lambda_i[f(x+n^{-1}e_i)-f(x)]+\sum_{i\in K}n\mu_iu_i[f(x-n^{-1}e_i)-f(x)]$$
for $f:S^n\to\mathbb R$. A control system consists of a triplet $\mathcal U^n=(U^n,\bar X^n,(\mathcal F_t)_{t\in[0,T]})$, defined on a given complete probability space $(\Omega,\mathcal F,P)$, where $U^n$ and $\bar X^n$ are processes taking values in $U$ and $S^n$, having RCLL sample paths, $\mathcal F_t\subset\mathcal F$ forms a filtration to which these processes are adapted, with probability one, $\bar X^n(0)=0$ and $U^n_i(t)=0$ whenever $\bar X^n_i(t)=0$, and finally, for every bounded $f:S^n\to\mathbb R$, the process $f(\bar X^n(t))-\int_0^tL^n_{U^n(s)}f(\bar X^n(s))\,ds$ is an $(\mathcal F_t)$-martingale. We refer to $U^n$ and $\bar X^n$ as the control and controlled process, respectively. For $n\in\mathbb N$, the cost functional associated with a control system $\mathcal U^n$ is given by
$$C^{n,U}=\frac1n\log E\exp\{n\,g(\bar X^n(T))\},$$
where $g(x)=c\cdot x$, $c\in(0,\infty)^k$ and $T>0$. The value of the control problem is given by $V^n=\inf C^{n,U}$, where the infimum ranges over all control systems. It is known from [1] that the limit $V_{\lim}=\lim_{n\to\infty}V^n$ exists (see Theorem 3.1 below for more details).
We also consider a special class of control systems. Given $n$, a stationary feedback control is any mapping $U:S^n\to U$ such that
$$U_i(x)=0\quad\text{whenever }x_i=0,\ i\in K.\qquad(2.3)$$
The corresponding controlled process is the Markov process $\bar X^{n,U}$ on $S^n$, starting from zero, with infinitesimal generator $L^n_U$ given by
$$L^n_Uf(x)=L^n_{U(x)}f(x).\qquad(2.4)$$
In the queueing model, $n\bar X^{n,U}(t)$ represents the vector of queue lengths at time $t$ when allocation is performed according to the feedback control $U$. With an abuse of notation, $U$ is both a generic symbol for a control system and for a stationary feedback control; this will cause no confusion.
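The controlled dynamics just described can be sketched by simulation. The following snippet (parameters and helper names are our own illustrative assumptions, not the paper's) simulates the feedback-controlled chain via uniformization, using for concreteness a feedback map that gives the server's full effort to the lowest-labeled nonempty class, i.e., a strict priority rule; it is a sanity sketch, not part of the formal development:

```python
import random

# Illustrative parameters for a two-class system (made up for this sketch).
lam = [1.0, 0.8]          # class arrival rates lambda_i
mu  = [2.0, 2.5]          # class service rates mu_i
n   = 50                  # scaling parameter
T   = 1.0

def priority_control(x):
    """Full effort to the lowest-labeled nonempty class (strict priority)."""
    u = [0.0, 0.0]
    for i in range(len(x)):
        if x[i] > 0:
            u[i] = 1.0
            break
    return u

def simulate(seed=0):
    """One sample of the scaled state at time T, via uniformization."""
    rng = random.Random(seed)
    t, x = 0.0, [0, 0]                   # unscaled job counts; system starts empty
    R = n * (sum(lam) + sum(mu))         # uniformization rate bounds all jump rates
    while True:
        t += rng.expovariate(R)
        if t > T:
            return [xi / n for xi in x]  # scaled state at time T
        u, v = priority_control(x), rng.uniform(0.0, R)
        for i in range(2):               # arrival of class i with rate n*lam_i
            v -= n * lam[i]
            if v < 0:
                x[i] += 1
                break
        else:
            for i in range(2):           # departure of class i with rate n*mu_i*u_i
                v -= n * mu[i] * u[i]
                if v < 0:
                    x[i] -= 1
                    break
            # leftover probability mass is a self-loop (uniformization step)
```

Note that the control satisfies (2.3) by construction, since effort is only ever assigned to a nonempty class.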
We will be interested in the stationary feedback control that prioritizes classes according to the index $\tilde\mu_i=\mu_i(1-e^{-c_i})$. Denote $\tilde\lambda_i=\lambda_i(e^{c_i}-1)$ and $W=\min_{u\in U}\sum_i(\tilde\lambda_i-u_i\tilde\mu_i)^+$. Assume throughout that the class labels are ordered so that
$$\tilde\mu_1\ge\tilde\mu_2\ge\cdots\ge\tilde\mu_k.\qquad(2.5)$$
For $n\in\mathbb N$, this control, denoted by $U^*=U^{*,n}$, is given by
$$U^*_i(x)=1_{\{x_i>0\}}\prod_{j=1}^{i-1}1_{\{x_j=0\}},\qquad(2.6)$$
where the product is defined as 1 when $i=1$. Our main result is as follows.
Theorem 2.1. The cost under the feedback controls of priority type, given by (2.6), obeys the following bounds.
i. $\limsup_{n\to\infty}C^{n,U^*}\le WT.\qquad(2.7)$
ii. If $e^{c_i}\ge\mu_i/\lambda_i$ for all $i$ then $V_{\lim}=WT$. Consequently, $U^*$ is asymptotically optimal in the sense that $\lim_{n\to\infty}C^{n,U^*}=V_{\lim}$.
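The constant $W=\min_{u\in U}\sum_i(\tilde\lambda_i-u_i\tilde\mu_i)^+$ appearing in the theorem can be checked numerically on a small instance. The sketch below (all numbers are illustrative assumptions, not from the paper) compares a brute-force minimization over a grid of the simplex with the greedy allocation suggested by the priority ordering:

```python
import math

# Made-up data with labels ordered so that tilde-mu is nonincreasing, as in (2.5).
lam = [1.0, 0.7, 0.9]
mu  = [3.0, 2.0, 1.5]
c   = [1.5, 2.0, 1.0]
tl  = [l * (math.exp(ci) - 1.0) for l, ci in zip(lam, c)]   # tilde-lambda_i
tm  = [m * (1.0 - math.exp(-ci)) for m, ci in zip(mu, c)]   # tilde-mu_i
assert tm[0] >= tm[1] >= tm[2]

def cost(u):
    """Objective sum_i (tilde-lambda_i - u_i * tilde-mu_i)^+ for an allocation u."""
    return sum(max(t_l - ui * t_m, 0.0) for t_l, ui, t_m in zip(tl, u, tm))

def greedy(r=1.0):
    """Spend effort on classes in priority order, driving each term toward zero."""
    u = []
    for t_l, t_m in zip(tl, tm):
        ui = min(r, t_l / t_m)   # effort needed to drive the i-th term to zero
        u.append(ui)
        r -= ui
    return u

# Brute force over a grid of the simplex {u >= 0, sum_i u_i <= 1}.
N = 40
best = min(cost((a / N, b / N, d / N))
           for a in range(N + 1)
           for b in range(N + 1 - a)
           for d in range(N + 1 - a - b))
W = cost(greedy())
```

On this instance the greedy value agrees with the brute-force minimum, consistent with priority to the class of largest $\tilde\mu_i$ being the right allocation.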

Differential game setup
The limit on the l.h.s. of (2.7) can be characterized as the value of a DG, formulated as follows. Let $M=\mathbb R^k_+\times\mathbb R^k_+$ and write generic members of $M$ as $m=((\bar\lambda_i)_{i\in K},(\bar\mu_i)_{i\in K})$. While $\lambda$ and $\mu$ denote the actual arrival and service parameters for the system, a possibly different member $m=(\bar\lambda,\bar\mu)$ of $M$ will be interpreted as a perturbed set of parameters. Due to the exponential nature of the cost functional it is natural to expect this additional control, which rests on Laplace's principle [3]. Let
$$\rho(u,m)=\sum_{i\in K}\lambda_i\,\ell(\bar\lambda_i/\lambda_i)+\sum_{i\in K}u_i\mu_i\,\ell(\bar\mu_i/\mu_i),$$
where $\ell(x)=x\log x-x+1$, with the convention $0\log0=0$. Let $\bar U=\{\bar u:[0,T]\to U\mid\bar u\text{ is measurable}\}$ be the set of admissible dynamic allocations, and let $\bar M$, the set of admissible dynamic perturbations, consist of the measurable maps $\bar m:[0,T]\to M$, both equipped with the corresponding Borel $\sigma$-fields. A mapping $\alpha:\bar M\to\bar U$ is called a strategy if it is measurable and if for every $\bar m,\bar m'\in\bar M$ and $t\in[0,T]$, $\bar m=\bar m'$ a.e. on $[0,t]$ implies $\alpha[\bar m]=\alpha[\bar m']$ a.e. on $[0,t]$. The set of all strategies is denoted by $A$.

ECP 19 (2014), paper 11.
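The change-of-measure cost in games of this kind is built from the function $\ell(x)=x\log x-x+1$ (with $0\log0=0$). As a numerical sanity check of the maximization used later in the proof of Lemma 3.3, namely that $\bar\lambda\mapsto\bar\lambda c-\lambda\ell(\bar\lambda/\lambda)$ is maximized at $\bar\lambda=\lambda e^c$ with value $\lambda(e^c-1)$, one can run the following sketch (the parameter values are arbitrary):

```python
import math

def ell(x):
    """ell(x) = x log x - x + 1, with the convention 0 log 0 = 0."""
    if x == 0.0:
        return 1.0
    return x * math.log(x) - x + 1.0

lam, c = 1.3, 0.8

def objective(bar):
    """bar*c - lam*ell(bar/lam): payoff minus change-of-measure cost."""
    return bar * c - lam * ell(bar / lam)

# Grid search over perturbed rates bar in (0, 10).
grid = [i * 0.001 for i in range(1, 10000)]
num_max = max(objective(b) for b in grid)
arg_max = max(grid, key=objective)

# Calculus: maximizer bar = lam * e^c, maximum value lam * (e^c - 1).
closed_form = lam * (math.exp(c) - 1.0)
```

The analogous computation for the service-rate term ($-\bar\mu c-\mu\ell(\bar\mu/\mu)$, maximized at $\bar\mu=\mu e^{-c}$ with value $\mu(e^{-c}-1)$) can be checked the same way.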
Let $\Gamma_1$, the one-dimensional Skorohod map from $C([0,T]:\mathbb R)$ to itself, be defined as
$$\Gamma_1[\psi](t)=\psi(t)-\min\Big\{0,\inf_{s\in[0,t]}\psi(s)\Big\},$$
and let $\Gamma$, mapping $C([0,T]:\mathbb R^k)$ to itself, be given by $\Gamma[\psi]=(\Gamma_1[\psi_1],\ldots,\Gamma_1[\psi_k])$. Thus $\rho$, heuristically, constitutes the cost of changing the measure, and is incurred by the maximizing player. Now we consider a strategy $\alpha^*$ that prioritizes according to the indices $\tilde\mu_i$, as in (2.5). More precisely, let $\alpha^*$ be the strategy that sends $\bar m=((\bar\lambda_i(t))_{i\in K},(\bar\mu_i(t))_{i\in K})$ to an allocation that gives the server's full effort to the lowest-labeled nonempty class of the game dynamics, via relations (3.4)-(3.5). These relations give rise to a unique, well-defined strategy, as proved in the appendix.
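The explicit formula $\Gamma_1[\psi](t)=\psi(t)-\min\{0,\inf_{s\le t}\psi(s)\}$ for the one-dimensional Skorohod map is easy to evaluate on a discrete time grid; the following minimal sketch (our own illustration, not the paper's code) applies it to a sample path:

```python
def skorohod_1d(psi):
    """Discrete-grid one-dimensional Skorohod map:
    out[t] = psi[t] - min(0, min_{s <= t} psi[s]), keeping the path nonnegative."""
    out, run_min = [], 0.0
    for v in psi:
        run_min = min(run_min, v)        # running minimum of the path
        out.append(v - min(0.0, run_min))
    return out

# A path that dips below zero twice; its reflection stays nonnegative.
path = [0.0, -1.0, -0.5, 0.5, -2.0, 1.0]
reflected = skorohod_1d(path)
```

The map $\Gamma$ above simply applies this reflection coordinatewise to $\mathbb R^k$-valued paths.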
We denote the performance of $\alpha^*$ by $V^*=\sup_{\bar m\in\bar M}C(\alpha^*[\bar m],\bar m)$.

Proposition 3.2. One has $V^*\le WT$.

Since for every $n$, $C^{n,U^*}\ge V^n$, it follows from Theorem 3.1 that $\liminf_{n\to\infty}C^{n,U^*}\ge V$. Thus, in view of Proposition 3.2, to prove Theorem 2.1(i) it suffices to show, as we do in the next section, that
$$\limsup_{n\to\infty}C^{n,U^*}\le V^*.\qquad(3.7)$$
Proof of Proposition 3.2. The fact that a strategy $\alpha^*$ exists, as well as that under this strategy one has $\psi_i(s)\ge0$ for all $s$, is proved in Proposition A.1 in the appendix.

Proof of Lemma 3.3. The claim is proved by induction on $k$. The precise statement proved by induction involves an arbitrary set of parameters $\lambda_i,\mu_i,c_i$. Namely, given $k$ and $r$, and any $3k$-tuple of positive numbers $\lambda_i,\mu_i,c_i$ for which the parameters $\tilde\mu_i=\mu_i(1-e^{-c_i})$ are ordered as in (2.5), the statement of the lemma is valid. Consider first $k=1$. We will show that the claim holds in this case. First, the inequalities
$$\bar\lambda_1c_1-\lambda_1\ell(\bar\lambda_1/\lambda_1)\le\tilde\lambda_1,\qquad -\bar\mu_1c_1-\mu_1\ell(\bar\mu_1/\mu_1)\le-\tilde\mu_1$$
hold for every $\bar\lambda_1,\bar\mu_1$, as can be verified in the following way. By direct calculation, the concave functions on the left hand sides have maxima at $\bar\lambda_1=\lambda_1e^{c_1}$ and $\bar\mu_1=\mu_1e^{-c_1}$, respectively. Thus their maximum values can be computed, and those are $\lambda_1(e^{c_1}-1)$ and $\mu_1(e^{-c_1}-1)$, which are the same as $\tilde\lambda_1$ and $-\tilde\mu_1$, respectively. By (3.8), this gives the claim for $k=1$.
Next, assuming that the claim holds for a given $k$, we show that it holds for $k+1$. Let then $r$ and $m$ be given, and let $u$ be as in (3.10)-(3.11). Denote $C_{a,b}=\sum_{i=a}^bC_i(u,m)$. Also, let $W_{a,b}(r)$ be defined as in (3.9), where the sums range from $a$ to $b$. The induction assumption implies $C_{2,k+1}\le W_{2,k+1}(r-u_1)$.
By the definition of $W$, it is not hard to see that $|W(r_1)-W(r_2)|\le|r_1-r_2|\,\tilde\mu_{\max}$, where $\tilde\mu_{\max}$ is the largest parameter $\tilde\mu_i$ involved. Thus, recalling $\tilde\mu_2\ge\tilde\mu_i$ for $i\ge2$, we obtain $W_{2,k+1}(r-u_1)\le W_{2,k+1}(r)+u_1\tilde\mu_2$. As a result, combining this with the bound on $C_1$, we have shown that $C_{1,k+1}\le W_{1,k+1}(r)$, which completes the argument.

Priority-based feedback controls
In this section we prove (3.7), based on the general large deviation upper bound of [5]. We begin by analyzing a wider class of stationary feedback controls (which, in this section, we call controls for short), and then specialize to $U^*$. Recall that, given $n$, a control is defined as a map from $S^n$ to $U$. In this section we will consider sequences $U^n$ of controls that are all obtained from a single map $U:\mathbb R^k_+\to U$ by way of restricting $U$ to $S^n$, for each $n$. Given $n$, there will be no confusion in referring to $U$ itself as the control, and we shall do so.
For $x\in\mathbb R^k_+$, let $I(x)=\{i\in K:x_i=0\}$. Note that $I(x)$ is the empty set in the interior of $\mathbb R^k_+$ and equals $K$ at the origin. $I$ partitions $\mathbb R^k_+$ into sets that we will call facets. Let also $\bar I(x)=2^{I(x)}$ be the collection of all subsets of $I(x)$. The class of controls $U:\mathbb R^k_+\to U$ that we analyze consists of those that satisfy (2.3) and, in addition, take integer values and are constant on facets. That is,
$$U(x)=U(y)\quad\text{for every }x,y\in\mathbb R^k_+\text{ whenever }I(x)=I(y).\qquad(4.1)$$
Under (4.1), $U$ induces a map from $2^K$ to $U$, given by $J\subset K\mapsto U(x)$ for some $x$ such that $J=I(x)$. For ease of notation, we identify facets (i.e., subsets of $\mathbb R^k_+$ on which $I$ is constant) with collections of indices (the corresponding value of $I$); moreover, we
refer to this map ($U\circ I^{-1}$) by the same symbol $U$ throughout this section. We follow this convention for other functions whose dependence on $x$ is via $U$ only. For a given $U$ as in (4.1), we define quantities $H(x,p)$, $h(x,p)$, $L(x,q)$ and $l(x,q)$ for each $x\in\mathbb R^k_+$ and $p,q\in\mathbb R^k$. Here $h$ is the upper semicontinuous regularization of $H$, whereas $L$ and $l$ are the Legendre-Fenchel transforms of $H$ and $h$, respectively. Exclusively for this section we consistently use $\varphi$ to denote a generic element of $A$ (this notation will be convenient when used in relation (4.10) below). Since the maps $H$, $h$, $L$ and $l$ depend on $x$ via $U$ only, they are constant on each facet provided the other variable is fixed. Thus the naturally induced maps $H(J,p)$, $h(J,p)$, $L(J,q)$ and $l(J,q)$ are well defined.

Proof. By Theorem 1.1 of [5], the sequence $\bar X^n$ of controlled processes associated with the controls $U^n$, which are merely Markov processes with infinitesimal generators $L^n_{U^n}$ (2.4), satisfies a large deviation upper bound in $D([0,T]:\mathbb R^k)$ with the good rate function $I$ (see [5], [3] for this terminology; in particular, $D$ is the space of RCLL functions with the Skorohod topology). The upper bound in Varadhan's lemma (Lemma 4.3.6 of [3]) can therefore be used. It is easy to verify the moment condition $\limsup n^{-1}\log Ee^{\gamma ng(\bar X^n(T))}<\infty$ (for some $\gamma>1$) required for that lemma, by noting that $ng(\bar X^n(T))$ is stochastically bounded by a r.v. $\alpha\,\mathrm{Poisson}(\beta n)$ for some constants $\alpha,\beta$. As a result, the desired bound follows.

Proof of Theorem 2.1(i). By Proposition 3.2 and the discussion following it, the result follows from Propositions 4.1 and 4.2.
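For intuition on the Legendre-Fenchel transforms appearing here: for a single Poisson stream with rate $\lambda$, the Hamiltonian is $H(p)=\lambda(e^p-1)$, and its transform is the familiar rate function $\lambda\ell(q/\lambda)=q\log(q/\lambda)-q+\lambda$. The sketch below (illustrative only, with an arbitrary rate) checks this numerically:

```python
import math

lam = 2.0  # arbitrary Poisson rate for the sketch

def H(p):
    """Hamiltonian of a rate-lam Poisson stream: lam * (e^p - 1)."""
    return lam * (math.exp(p) - 1.0)

def L_numeric(q, lo=-10.0, hi=10.0, steps=20000):
    """Legendre-Fenchel transform sup_p { p*q - H(p) } via grid search."""
    return max(p * q - H(p)
               for p in (lo + i * (hi - lo) / steps for i in range(steps + 1)))

def L_closed(q):
    """Closed form: q log(q/lam) - q + lam, i.e. lam * ell(q/lam)."""
    return q * math.log(q / lam) - q + lam
```

The transform vanishes at $q=\lambda$ (no cost for the typical rate) and grows for atypical rates, which is the shape exploited by the large deviation bounds of [5].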
In the rest of this section we prove Proposition 4.2.
$$S_{(x,q)}=\Big\{\big((m_J),\xi\big):\xi\in\Xi(x),\ \sum_J\xi_J\big(\bar\lambda^J_i-U_i(J)\bar\mu^J_i\big)=q_i\ \ \forall i\Big\}.$$
The collection $\Xi$ consists of all possible normalized weights on the collection of all facets. If $x$ belongs to a particular facet $J$, then $\Xi(x)$ includes those members of $\Xi$ which assign nonzero weights only to the facets whose closure includes $J$. $S_{(x,q)}$ is the collection of pairs of rates and weights such that the speed resulting from the weighted service allocation matches $q$.
Proof. First, by the definition of $L$ and $H$ and using Lemma A.2 in the appendix, where the second equality follows by directly solving both optimization problems. We use the following representation of $l$, from Theorem 3.1 of [5], where the infimum ranges over all maps $J\mapsto(q_J,\xi_J)$. Using the expression for $L$ above,
$$l(x,q)=\inf\Big\{\,\cdot\,:\sum_J\xi_Jq_J=q,\ \xi\in\Xi(x)\Big\}.$$
Hence, by restricting the minimizing set for the variable $(m_J)$, $l(x,q)$ is bounded above by a certain quantity. In order to prove the lemma, it remains to show that $l(x,q)$ is also bounded below by the same quantity. Given $x$, $(\xi_J)$ and $(m_J)$, define, with the convention $0/0=0$, the averaged perturbed rates $\bar\lambda$ and $\bar\mu$. Since $\omega$ is convex, we have, by changing the order of summation on the l.h.s. and using Jensen's inequality, the required lower bound. Thus the result follows from the above and (4.4).
Proof of Proposition 4.2. Consider the following system of equations for $(m,\xi)$. By a standard argument based on a measurable selection result such as [6], one can show that a solution exists. Thus, using Lemma 4.3, the inequality to be proved is $A_{U^*}\le V^*$. Using the expression (4.2) for $A_U$, and (4.7), it will follow if we show the inequality (4.10). It thus remains to prove (4.10). We do this by arguing that if $\varphi\in A$ and $(\xi,m)\in S^*_\varphi$, then $u^*(t)$ satisfies (3.4), (3.5) for a.e. $t$. Recall that $U^*(x)$ is defined, for $x\in S^n$, in (2.6). For facets $J$, $U^*(J)$ is defined, via the association of $x$ with a facet to which it belongs, as in (4.11). Consider the case $\varphi_1=\varphi_1(t)>0$. In this case, by the definition of $I$, $1\notin I(\varphi)$. Consequently, 1 is not a member of any subset of $I(\varphi)$, namely it is not a member of any $J\in\bar I(\varphi)$. Since (4.6) holds, $\xi\in\Xi(\varphi)$ (where $\xi=\xi(t)$, $\varphi=\varphi(t)$, and this is valid for a.e. $t$). Thus, by the definition of $\Xi$, $\xi$ charges only facets $J\in\bar I(\varphi)$; in particular, it charges only facets $J$ with $1\notin J$. By (4.9) and (4.11), it follows that $u^*_1=u^*_1(t)=1$. This shows that the first line of (3.4) is valid for a.e. $t$.
Fix an arbitrary $\bar m\in\bar M$ and set $\bar u=\alpha^*[\bar m]$. To prove the proposition it suffices to show that $C(\bar u,\bar m)\le WT$. Since $\psi_i(s)\ge0$ for all $s$, we have $\varphi(T)=\psi(T)$. Thus $C(\bar u,\bar m)$ is given by $C(\bar u,\bar m)=\int_0^T\sum_iC_i(\bar u(t),\bar m(t))\,dt$. By (3.4) and (3.5), for each $t$, $\bar u(t)$ satisfies the hypotheses of Lemma 3.3, with data $\bar m(t)$ and $r=1$. Hence $C(\bar u,\bar m)\le WT$, which completes the proof.