Dispatching fixed-sized jobs with multiple deadlines to parallel heterogeneous servers

We study the M/D/1 queue when jobs have firm deadlines for waiting (or sojourn) time. If a deadline is not met, a job-specific deadline violation cost is incurred. We derive explicit value functions for this M/D/1 queue that enable the development of efficient cost-aware dispatching policies to parallel servers. The performance of the resulting dispatching policies is evaluated by means of simulations. © 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
In the dispatching problem, each arriving job is routed to one of the available servers immediately upon arrival. Even though a single fast server would often be preferred, the parallel servers are needed to match increasing capacity demands. Moreover, short latency, in the absence of preemptive scheduling, requires parallel servers.
In this paper, we consider a cost structure based on (firm) deadlines. Each job has a certain deadline for the maximum waiting time it can tolerate. If this waiting time is exceeded, a deadline violation cost is incurred, but the job must still be served. This cost structure stems from quality-of-experience metrics, where customers observe a good service level whenever the waiting time is "short", but as soon as a given customer-specific threshold is exceeded, the observed service quality drops. That is, the tails of the response time distribution are one of the most crucial performance measures [1]. Similarly, service level agreements (SLAs) are often defined in terms of acceptable waiting times [2].
This basic setting has been studied recently in [3] in the context of M/G/1 queues. However, the results given there are either asymptotic or in the form of differential equations. In contrast, here we derive exact closed-form expressions (that satisfy the aforementioned differential equations and asymptotic behavior). More specifically, the main contributions of this paper are the first exact results for the value function and admission cost for the M/D/1 queue subject to a general deadline-based cost structure. Even though the service times are assumed to be fixed, the deadlines and their violation costs can vary according to some probability distributions. Moreover, there can be multiple deadlines with added cost for each deadline that is violated.
The approach itself is general, and traditionally the objective is the minimization of the mean sojourn time (see, e.g., [4-6]), possibly combined with the energy consumption (see, e.g., [7,8]). The value function for M/G/1-FCFS then enjoys elementary closed-form expressions. However, e.g., the processor sharing (PS) scheduling makes the situation more complex and exact results are available only for M/D/1-PS and M/M/1-PS [4,9]. The approach lends itself also to minimization of blocked jobs in loss systems [10].

Basic model and notation
The basic model for a single M/D/1-FCFS queue with deadlines is as follows. We let λ denote the arrival rate and d the constant service time of a job in the M/D/1 queue, so that the offered load is ρ = λd. Jobs whose waiting time in queue, W, reaches time τ, referred to as the deadline, incur a unit cost. This is equivalent to having a deadline τ + d for the sojourn time. We assume that ρ < 1 for stability, and that the deadline is positive, τ > 0, to be meaningful. The mean cost rate is

$$ r = \lambda\, P\{W \ge \tau\}. \tag{1} $$

In general, the distribution of the waiting time cannot be expressed in simple terms, but instead in the form of the Laplace-Stieltjes Transform (LST) [11] or an infinite sum involving convolutions [12]. However, for the M/D/1 queue the waiting time distribution is available [13-15]:

$$ P\{W \le t\} = (1-\rho) \sum_{i=0}^{\lfloor t/d \rfloor} \frac{\big(\lambda(id - t)\big)^{i}}{i!}\, e^{-\lambda(id - t)}. \tag{2} $$

In the general case, we have multiple classes of jobs, each with its own arrival rate λ_i, target deadline τ_i and i.i.d. deadline violation cost H_i. The total arrival rate is λ = ∑_i λ_i, and the stability requirement is that λd = ρ < 1. The mean cost rate in this case is

$$ r = \sum_i \lambda_i\, E[H_i]\, P\{W \ge \tau_i\}. $$

Our first task is to derive the so-called value function with respect to the deadline cost structure. Formally, the value function is defined as

$$ v(u) = \lim_{t \to \infty} \big( E[V(u,t)] - r\,t \big), $$

where u is the current backlog (unfinished work) in the queue, and the random variable V(u, t) denotes the deadline violation costs during time (0, t) when the system is initially in state u. Given ρ < 1, the M/D/1 queue is stable, the system is ergodic, and the above limit is well-defined. (In fact, the limit is finite and well-defined also when ρ ≥ 1 and the system is unstable.)
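As an illustration, the waiting time distribution and the mean cost rate can be evaluated numerically. The following Python sketch assumes the classical finite-sum form of the M/D/1 waiting time distribution, P{W ≤ t} = (1 − ρ) ∑_{i=0}^{⌊t/d⌋} (λ(id − t))^i / i! · e^{−λ(id − t)}, and the cost rate r = λ P{W ≥ τ}. The function names and parameter values are illustrative choices, not taken from the paper.

```python
import math

def md1_waiting_cdf(t, lam, d):
    """P{W <= t} for an M/D/1-FCFS queue (classical finite-sum formula).

    Assumes rho = lam * d < 1. The sum alternates in sign, so it is
    numerically reliable only for moderate values of t / d.
    """
    if t < 0:
        return 0.0
    rho = lam * d
    total = 0.0
    for i in range(int(math.floor(t / d)) + 1):
        x = lam * (i * d - t)          # non-positive for every term of the sum
        total += x ** i / math.factorial(i) * math.exp(-x)
    return (1.0 - rho) * total

def mean_cost_rate(lam, d, tau, cost=1.0):
    """Single class: r = lam * cost * P{W >= tau}, with tau > 0."""
    return lam * cost * (1.0 - md1_waiting_cdf(tau, lam, d))

def mean_cost_rate_multiclass(lams, d, taus, mean_costs):
    """Multi-class: r = sum_i lam_i * E[H_i] * P{W >= tau_i}."""
    lam = sum(lams)                    # all classes share one FCFS queue
    return sum(l * h * (1.0 - md1_waiting_cdf(tau, lam, d))
               for l, tau, h in zip(lams, taus, mean_costs))

if __name__ == "__main__":
    for lam in (0.2, 0.6, 0.8):
        print(lam, mean_cost_rate(lam, d=1.0, tau=2.5))
```

Because W has an atom only at zero, P{W ≥ τ} and P{W > τ} coincide for τ > 0, which is why the sketch uses 1 − P{W ≤ τ}.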
The M/G/1 queue has been analyzed in [3] in the context of the basic (single-class) cost structure. In particular, it is shown that the value function is a linear function of u for u > τ, and for 0 ≤ u ≤ τ, v(u) satisfies an integro-differential equation that can be solved numerically. Moreover, explicit results are given for M/G/1 when (i) τ < X and the load ρ < 1, and when (ii) τ ≫ X and ρ → 1 (the heavy-traffic regime), where X denotes the (random) service time. These two results naturally also hold for the corresponding M/D/1 queues.
In contrast, we analyze the general case when ρ < 1 and τ is arbitrary, and obtain an explicit closed-form expression for the value function. Moreover, we give the value function for the general multi-class case, and experiment with various dynamic dispatching policies obtained through one policy improvement step.

M/D/1 with single deadline
In this section, we assume a single deadline τ that applies to all jobs and a unit deadline violation cost, h = 1. These results are later generalized to multiple job classes with distinct deadlines in Section 4.
From [3], we know that the value function for u > τ is a linear function of u, i.e.,

$$ v(u) - v(\tau) = \frac{\lambda - r}{1 - \rho}\,(u - \tau), \qquad u \ge \tau. \tag{3} $$

In general, the value function satisfies the following differential equation,

$$ v'(u) = \lambda\,\mathbf{1}(u \ge \tau) - r + \lambda\, E\big[v(u + X) - v(u)\big], \tag{4} $$

where X denotes the random i.i.d. service time, and 1(u ≥ τ) is 1 if the condition is true and otherwise zero. Additionally, v′(0) = 0. These equations can be solved numerically as discussed in [3]. However, exact closed-form results have not been available. For M/D/1, the differential equation (4) simplifies:

$$ v'(u) = \lambda\,\mathbf{1}(u \ge \tau) - r + \lambda\,\big(v(u + d) - v(u)\big). \tag{5} $$

In general, the mean cost rate r follows from the boundary condition v′(0) = 0. However, with M/D/1 we can use (2).
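The numerical route mentioned above can be sketched as follows. The code assumes that the M/D/1 equation reads v′(u) = λ·1(u ≥ τ) − r + λ(v(u + d) − v(u)), that the tail is linear with slope (λ − r)/(1 − ρ), and that r = λ P{W ≥ τ} with the classical finite-sum M/D/1 waiting time distribution. It then integrates backwards from u = τ with explicit Euler steps. The grid, the helper names and the step size are assumptions made for this illustration, not the authors' method.

```python
import math

def md1_waiting_cdf(t, lam, d):
    """P{W <= t} for M/D/1 (classical finite-sum formula), rho < 1."""
    total = 0.0
    for i in range(int(math.floor(t / d)) + 1):
        x = lam * (i * d - t)
        total += x ** i / math.factorial(i) * math.exp(-x)
    return (1.0 - lam * d) * total

def solve_value_function(lam, d, tau, steps_per_d=10000):
    """Integrate v'(u) = -r + lam * (v(u + d) - v(u)) backwards from u = tau.

    Returns (h, w, slope, r), where w[k] approximates v(tau - k*h) - v(tau)
    and slope is the slope of the linear tail for u >= tau.
    Assumes tau is (close to) a multiple of h = d / steps_per_d.
    """
    r = lam * (1.0 - md1_waiting_cdf(tau, lam, d))   # mean cost rate
    slope = (lam - r) / (1.0 - lam * d)              # assumed tail slope
    m = steps_per_d
    h = d / m
    n_steps = int(round(tau / h))
    w = [0.0] * (n_steps + 1)                        # w[0] = v(tau) - v(tau) = 0
    for k in range(n_steps):
        # v(u + d) at u = tau - k*h: linear tail if u + d >= tau, else stored value
        v_shift = slope * (m - k) * h if k <= m else w[k - m]
        deriv = -r + lam * (v_shift - w[k])          # indicator is 0 for u < tau
        w[k + 1] = w[k] - h * deriv                  # explicit Euler step towards u = 0
    return h, w, slope, r

def admission_cost_empty(lam, d, tau, steps_per_d=10000):
    """Return (v(d) - v(0), r / lam); the two should agree if v'(0) = 0."""
    h, w, slope, r = solve_value_function(lam, d, tau, steps_per_d)
    m, k0 = steps_per_d, len(w) - 1                  # index k0 corresponds to u = 0
    v_d = w[k0 - m] if k0 >= m else slope * (m - k0) * h
    return v_d - w[k0], r / lam

if __name__ == "__main__":
    print(admission_cost_empty(0.6, 1.0, 2.5))
```

The final comparison is a consistency check: if the boundary condition v′(0) = 0 holds, the numerically obtained v(d) − v(0) should match r/λ = P{W ≥ τ} up to the discretization error.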

Value function: single job class
In this section, we derive the solution of (5) for u ≤ τ. As (1) and (2) give the mean cost rate r for M/D/1, the value function for the tail u ≥ τ is known from (3). Besides, we know from [3] that (5) expresses v′(u) as a function of v(u + t) with t ≥ 0. Since v(u) is known for u ≥ τ, v(u) can be solved backwards starting from u = τ. First, define n ≜ ⌈τ/d⌉, i.e., n is the finite positive integer corresponding to the number of intervals of length d that fit below the threshold τ. In other words, n is the number of jobs required to "fill" an empty system to the level where a deadline violation would occur if a new job arrives. We consider sub-intervals I_0, I_1, . . ., I_n, where I_0 covers the tail u ≥ τ and I_i, i = 1, . . ., n, is the ith interval of length d below τ (truncated at u = 0); see Fig. 1. According to (5), v(u) in I_i for i = 1, . . ., n depends only on v(u) in I_{i−1}. For i = 0, . . ., n, let ṽ_i(u) denote the function equal to the value function in the ith interval, i.e., ṽ_i(u) = v(u) − v(τ) for u ∈ I_i, and satisfying (6) for u ≥ 0 (cf. analytic continuation). The ṽ_i(u) are characteristic functions for the value function of the M/D/1 queue w.r.t. the deadline. They can be determined from (6) together with the boundary conditions (7). Then, we could obtain r from ṽ′_n(0) = 0, if we did not already have it from (2). The summation in (2) suggests considering the differences ṽ_{i+1}(u) − ṽ_i(u). Define y_0(u) = 0 and, for i = 1, . . ., n, let y_i(u) = ṽ_i(u) − ṽ_{i−1}(u). In view of (6) and (7), the y_i satisfy the differential equation (9). The next result derives expressions for the y_i functions.
Proof. Since the solution of the homogeneous differential equation y′_i(u) + λ y_i(u) = 0 is e^{−λu}, the solution of (9) takes the form c_i(u) e^{−λu}, where c_i is determined by (12) and (13). We proceed by induction: assuming that (11) holds for a given k, (12) and (13) yield the corresponding expression for k + 1, for k = 1, . . ., n − 1. Thus (11) holds by induction on k. □
We find an explicit result for the value function.
Theorem 2. The value function for an M/D/1 queue with respect to a deadline at time τ with a unit violation cost is given by (16).
Proof. Follows directly from Lemma 1. □
Note that, in accordance with (3), v(u) in (16) reduces to a linear function when u ≥ τ and m = 0. Moreover, the latter sum is approximately e^{z_i(u)} when i is large, and thus replacing m with any m* less than m yields an approximation for v(u).
Alternatively, (16) can also be written in terms of the incomplete gamma function Γ(a, z); see (17). Given the value function, we can write down the admission cost a(u) = 1(u ≥ τ) + v(u + d) − v(u), whose explicit form is (18). Note that the first summation in (18) is a constant (cf. the linear term). Recall that v(u_2) − v(u_1) corresponds to the expected difference in the number of deadline violations between a system that has an initial backlog of u_2 and a system that is initially in state u_1. Similarly, the admission cost a(u) tells us the expected increase in the number of deadline violations if a job is admitted to the system currently in state u, including the cost for the job itself.
In the general case, for the M/G/1 queue with several reasonable cost structures, including deadline violations and latency, the mean admission cost to an empty system satisfies the simple identity E[a(0)] = r/λ, where r denotes the corresponding mean cost rate (e.g., λ P{W ≥ τ} or λ E[T]).
We can verify that this holds also for the M/D/1 queue with the deadline cost structure, i.e., (18) at u = 0 reduces to a(0) = P{W ≥ τ}, ∀ τ ≥ 0. As v(u) for the M/D/1 queue, given in (16), is strictly increasing and convex, a(u) is an increasing function of u. Moreover, for u < τ, a(u) = v(u + d) − v(u), and hence bounds for a(u) follow from the convexity of v.

Numerical example
Let d = 1 and τ = 2.5 so that n = 3. The corresponding value function and admission cost are illustrated in Fig. 2 for λ ∈ {0.2, 0.6, 0.8}. The value function is smooth (except at u = τ), whereas the admission cost behaves quite differently. For example, the unit cost due to the immediate cost of a deadline violation when u ≥ τ shows clearly.

M/D/1 with multiple job classes
In this section, we extend the system model and consider the multi-class scenario, where all jobs have the same fixed service time d, but their deadlines and deadline violation costs can vary. More specifically, we assume k job classes such that class i jobs have deadline τ_i (from the arrival time) and each violation for class i jobs costs H_i. The corresponding (Poisson) arrival rates are λ_1, . . ., λ_k, and are such that λd = ρ < 1, where λ = ∑_i λ_i (i.e., a stable system). For convenience, we further define p_i = λ_i/λ. Let v_i(u) denote the value function of a system with arrival rate λ and deadline τ_i. As v_i(u) − v_i(0) corresponds to how many more jobs on average exceed the deadline τ_i if the initial backlog is u instead of zero, on average p_i (v_i(u) − v_i(0)) of them belong to class i (superposition of Poisson arrival processes), and, as class i violations cost H_i, we have

$$ v(u) - v(0) = \sum_{i=1}^{k} p_i\, E[H_i]\, \big(v_i(u) - v_i(0)\big), $$

where each v_i(u) − v_i(0) is given by (16) with (λ, τ_i). Note that this is valid because all job classes have the same service time d and are treated the same way under FCFS, and we can also assume that costs are paid upon arrival.
Similarly, the admission cost to the system can be written out, where the immediate cost is included only for the class of the arriving job. That is, in the notation of (18), the admission cost of a class j job with violation cost h to an M/D/1 queue in state u is

$$ a_j(u; h) = h\,\mathbf{1}(u \ge \tau_j) + \sum_{i=1}^{k} p_i\, E[H_i]\, \big(v_i(u + d) - v_i(u)\big). $$

Note that h can be replaced with E[H_j] if the violation cost of the given job is unknown.
We note that, without any technical difficulties, we can extend the model so that each job class can also have several deadlines with arbitrary violation costs. That is, a penalty is paid for each deadline that is violated for the same job. Then we can approximate any cost structure based on the waiting and/or sojourn times. For example, a cost structure for a single class with violation costs h_i = h and deadlines τ_i = ih, where i = 0, 1, . . ., converges, when h → 0, to the cost structure where each job incurs a cost equal to its waiting time. The resulting objective is the (mean) waiting time. However, for clarity of presentation we omit such examples.
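The convergence claimed in the last example can be checked with a few lines of Python. The sketch below assumes, as in the basic model, that a deadline τ_i = ih is violated when the waiting time reaches it (W ≥ τ_i) and that each violation costs h. The function name and the sample waiting time are illustrative.

```python
import math

def staircase_cost(w, h):
    """Total cost of a job with waiting time w when every deadline
    tau_i = i*h (i = 0, 1, 2, ...) costs h once it is reached."""
    violated = math.floor(w / h) + 1   # deadlines 0, h, ..., floor(w/h)*h
    return h * violated

if __name__ == "__main__":
    w = 2.37
    for h in (1.0, 0.1, 0.01, 0.001):
        # the staircase cost lies between w and w + h, so it tends to w
        print(h, staircase_cost(w, h))
```

The deadline at τ_0 = 0 is always violated and contributes the extra h, which vanishes in the limit.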

Parallel servers
In this section, we consider a dispatching system with parallel servers, as illustrated in Fig. 3. First we develop efficient dispatching policies based on the new results given in the previous sections, and study them analytically. Then we evaluate our heuristics numerically through simulations.

Model and reference policies
We consider the following model for a multi-server system:
• m parallel servers with service times d_i, i = 1, . . ., m.
The offered load to the system is ρ_tot = ∑_i λ_i / ∑_i 1/d_i, which is assumed to be less than one. We consider the following heuristic dispatching policies:
• Random split (RND) routes a job to Server i with probability p_i. The splitting probabilities p_i are (typically) chosen so that the offered load ρ_tot is balanced among the m servers, ρ_i = ρ_j.
• Class-specific split (CIQ, class-is-queue) routes class i jobs to server i. Hence, we assume that m = s.
• Least-work-left (LWL) routes a job to the server with the smallest backlog. Let α(u_1, . . ., u_m, j) denote the chosen server for class j in state u_1, . . ., u_m. Then α(u_1, . . ., u_m, j) = arg min_i u_i. Ties can be resolved in an arbitrary fashion.
• Dead-k, introduced in [3], is like LWL, but server i = 1, . . ., k is excluded if u_i > τ. Hence, the backlog in the first k servers is rarely above the deadline threshold τ. (This assumes a common deadline, τ_i = τ, ∀i.)
Note that RND and CIQ are static policies, i.e., their actions are independent of the system's state. We develop new policies based on the value functions derived in the previous section by carrying out one policy improvement step. More specifically, the standard procedure (see, e.g., [3,5,10,16]) is as follows:
1. Choose a static policy α_0, e.g., RND or CIQ.
2. With α_0, the system decomposes and each server behaves as an independent M/D/1 queue.
3. Compute the value functions for each M/D/1 queue.
4. Carry out the policy improvement step: route a class j job to the server with the smallest admission cost, α(u_1, . . ., u_m, j) = arg min_i a_i(u_i, j), where a_i(u_i, j) denotes the admission cost of a class j job to server i under α_0.
We refer to these policies as FPI (first-policy-iteration).
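The four steps above can be sketched in Python as follows. The dispatcher itself is generic: it only needs one admission-cost function per server. As a stand-in, the example plugs in the well-known admission cost of M/G/1-FCFS with respect to the mean waiting time, a(u) = u + λd(2u + d)/(2(1 − ρ)), instead of the deadline-based cost (18); this substitution, the function names and the parameter values are assumptions made for illustration only.

```python
from typing import Callable, Sequence

def lwl_dispatch(backlogs: Sequence[float]) -> int:
    """Least-work-left: index of the server with the smallest backlog."""
    return min(range(len(backlogs)), key=lambda i: backlogs[i])

def fpi_dispatch(backlogs: Sequence[float],
                 admission_costs: Sequence[Callable[[float], float]]) -> int:
    """Policy improvement step: server with the smallest admission cost a_i(u_i)."""
    return min(range(len(backlogs)),
               key=lambda i: admission_costs[i](backlogs[i]))

def waiting_time_admission_cost(lam_i: float, d_i: float) -> Callable[[float], float]:
    """Stand-in cost (mean waiting time, NOT the deadline cost of the paper).

    Server i receives Poisson arrivals at rate lam_i under the static policy
    and has fixed service time d_i, so rho_i = lam_i * d_i < 1.
    """
    rho = lam_i * d_i
    def a(u: float) -> float:
        # own waiting time u plus the extra waiting caused to later arrivals
        return u + lam_i * d_i * (2.0 * u + d_i) / (2.0 * (1.0 - rho))
    return a

if __name__ == "__main__":
    lam, p = 1.0, 0.8                       # steps 1-2: RND split, two servers
    d = (1.0, 4.0)
    costs = [waiting_time_admission_cost(lam * p, d[0]),        # step 3
             waiting_time_admission_cost(lam * (1 - p), d[1])]
    for state in [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]:          # step 4
        print(state, "LWL ->", lwl_dispatch(state), "FPI ->", fpi_dispatch(state, costs))
```

In state (1, 0) the two rules disagree: LWL picks the empty slow server, whereas the improved policy still prefers the faster server. Replacing the stand-in with a deadline-based admission cost changes only the callables passed to `fpi_dispatch`.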

Example #1: Slow server scenario
Let us start with a simple example of a single class and two heterogeneous servers, where Server 1 is faster than Server 2 (cf. the slow-server problem). The corresponding processing times are d_1 = 1 and d_2 = 4 time units, and the target deadline is τ = 4. The static split is defined by the probability p of routing a job to Server 1, so the server-specific arrival rates are λp and λ(1−p), respectively. Fig. 4 shows the split probabilities for three static policies: the load balancing split (p_ρ = 4/5), the optimal split w.r.t. mean latency (p_W), and the optimal split w.r.t. deadline violations (p_D). For s servers, the optimal split w.r.t. mean latency p_W is obtained by solving Problem (20) (cf. the PK mean waiting time formula), where p = (p_1, . . ., p_s) with p_i ∈ P_i = [0, p̄_i], d_i denotes the service time of server i, and p̄_i = min(1, 1/(λd_i)) > 0 (i = 1, . . ., s). In our example, s = 2 and d = (1, 4). Similarly, the optimal split p_D w.r.t. deadline violations is obtained from Problem (21), where G(λ, d_i) = P{W > τ} is given in (2). Problems (20) and (21) are separable convex programs which can be solved efficiently in parallel. Methods of solution based on dual decomposition are supplied in the Appendix.
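For the two-server example, the deadline-optimal split can also be approximated by a direct one-dimensional search. The sketch below assumes that the objective is the overall violation probability p·G(λp, d_1) + (1 − p)·G(λ(1 − p), d_2), with G evaluated from the classical finite-sum M/D/1 waiting time distribution. The plain grid search is a simple stand-in for the dual-decomposition method of the Appendix, and all names are illustrative.

```python
import math

def md1_violation_prob(lam, d, tau):
    """G(lam, d) = P{W > tau} for M/D/1 with rho = lam * d < 1."""
    total = 0.0
    for i in range(int(math.floor(tau / d)) + 1):
        x = lam * (i * d - tau)
        total += x ** i / math.factorial(i) * math.exp(-x)
    return 1.0 - (1.0 - lam * d) * total

def split_cost(p, lam, d1, d2, tau):
    """Fraction of violating jobs when Server 1 receives a job with probability p."""
    return (p * md1_violation_prob(lam * p, d1, tau)
            + (1.0 - p) * md1_violation_prob(lam * (1.0 - p), d2, tau))

def best_split(lam, d1, d2, tau, grid=1000):
    """Grid search over the splits that keep both servers stable."""
    lo = max(0.0, 1.0 - 1.0 / (lam * d2))   # below this, Server 2 is overloaded
    hi = min(1.0, 1.0 / (lam * d1))         # above this, Server 1 is overloaded
    candidates = [lo + (hi - lo) * k / grid for k in range(1, grid)]
    return min(candidates, key=lambda p: split_cost(p, lam, d1, d2, tau))

if __name__ == "__main__":
    lam, d1, d2, tau = 1.0, 1.0, 4.0, 4.0
    p_d = best_split(lam, d1, d2, tau)
    print("approx. p_D:", p_d, "cost:", split_cost(p_d, lam, d1, d2, tau))
    print("load balancing p = 0.8, cost:", split_cost(0.8, lam, d1, d2, tau))
```

A grid search does not scale to many servers, which is where a separable formulation such as the one in the Appendix becomes useful.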
Fig. 5 depicts the relative performance of the static policies. The y-axis is the deadline violation probability with the given policy divided by the corresponding probability with the optimal split p_D. We have also included the performance with a single fast server with d = 4/5 for comparison.
With dynamic policies the routing decision depends on the state of the system, i.e., the backlogs (u_1, u_2). LWL sends a job to Server 1 if u_1 ≤ u_2 and otherwise to Server 2. The Dead-1 policy is like LWL, but never sends a job to Server 1 if its backlog is more than τ. Finally, we can also carry out one policy improvement round, e.g., from the load-balancing random split RND; we refer to this policy as FPI. The Dead-1 and FPI policies are illustrated in Fig. 6 for λ = 1 so that ρ = 0.8. The x-axis corresponds to the backlog in Server 1, and the y-axis to the backlog in Server 2. Consequently, at the top-left corner, all policies route a job to Server 1, and at the bottom-right corner to Server 2. The black dots correspond to routing to Server 1 under FPI, whereas the switch-over curve of Dead-1 is drawn explicitly. For clarity, we have omitted LWL, whose switch-over curve is simply y = x. For reference, we have also included the switch-over curve of the optimal dispatching policy when the objective is to minimize the mean latency [17]. We can see that both FPI and Dead-1 "protect" the faster Server 1 from overload. This, however, may cause Server 2 to become unstable when ρ is high, as noted in [3].
Numerical results with dynamic policies LWL, FPI and Dead-1 are depicted in Fig. 7. We have also included the load-balancing RND for reference. It can be seen that initially LWL does a good job, but at higher loads it fails badly. FPI and Dead-1 have the best performance, as expected.

Example #2: High priority jobs
Suppose next that we have two job classes and two identical servers. The parameters of the job classes are given in Table 1.
Here Class 1 corresponds to high priority customers that have both more stringent deadlines and larger deadline violation penalties.

Class-specific heuristics
Here we consider two heuristic static policies: (i) random 50:50 split (RND), and (ii) assigning Class i jobs to Server i (CIQ). Interestingly, both have the same performance, yet they offer two quite different starting points for the policy improvement step. In particular, a deviation from CIQ means that one is determining the states in which one should assign a Class 1 job to Server 2, and vice versa. Moreover, as the servers are identical, we have one more trick at our disposal: at any moment, we can renumber (interchange) the servers and effectively jump to another state. Given we know the value functions, we can choose the renumbering such that the expected future costs, assuming the given static policy, are minimized. This yields a new policy, FPI-CIQ+S, defined by choosing the renumbering π with the smallest expected future costs, where π iterates over all permutations of (1, . . ., n). The renumbering trick can be applied to any subset of identical servers.
Moreover, the resulting dispatching policy is symmetric. It is intuitively clear that the optimal dispatching policy must possess the same symmetry.
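The renumbering trick can be sketched as a search over permutations. The code below assumes that the expected future cost of a state under the static policy is the sum of the server-specific value functions, so that the best renumbering π minimizes ∑_i v_i(u_{π(i)}). The quadratic value functions in the example are hypothetical placeholders, not the M/D/1 value functions of Section 4.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

def best_renumbering(backlogs: Sequence[float],
                     value_functions: Sequence[Callable[[float], float]]
                     ) -> Tuple[int, ...]:
    """Return pi minimizing sum_i v_i(backlogs[pi[i]]).

    pi[i] is the physical server that takes over role i (e.g. "the queue
    of class i" under CIQ). Only valid when the servers are identical.
    """
    n = len(backlogs)
    return min(permutations(range(n)),
               key=lambda pi: sum(value_functions[i](backlogs[pi[i]])
                                  for i in range(n)))

if __name__ == "__main__":
    # Hypothetical value functions: role 0 (high priority) is more sensitive
    # to backlog than role 1 (low priority).
    v = [lambda u: 3.0 * u * u, lambda u: u * u]
    print(best_renumbering((2.0, 0.5), v))   # the short queue takes role 0
```

Enumerating all permutations costs n! evaluations, which is harmless for the two-server example but would call for an assignment-problem solver with many identical servers.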

Resulting policies
The resulting policies are illustrated in Fig. 8 for λ = 1, i.e., ρ = 0.5. The upper row corresponds to the decisions for high priority jobs (class 1), and the lower row corresponds to low priority jobs (class 2). The light red solid lines indicate the threshold τ_i for class i. Below the horizontal line, the backlog in Server 2 is less than τ_i, and to the left of the vertical line the backlog in Server 1 is less than τ_i.
In the first column, we have the least-work-left (LWL) policy, which chooses Server 1 when u_1 < u_2 (bottom right triangle), and Server 2 when u_1 > u_2. Ties, when u_1 = u_2, can be resolved in arbitrary fashion in this case.
The second column corresponds to policy iteration when the starting point is (uniform) RND, yielding FPI-RND. In this case, the value functions for both servers are identical, and one obtains LWL (given the ties, indicated with light gray dots, are resolved in favor of the shorter queue).
The third column has FPI-CIQ, which seems to be quite a sensible policy that "protects" Server 1 in order to minimize the deadline violations of high priority jobs. For example, whenever Server 2 can satisfy a deadline, a job of any class is routed there. Moreover, the low priority jobs can enter Server 1 only if their own server is "full" and Server 1 has no or very little work in the queue.
In the fourth column, we have a scheme combining policy iteration and renumbering, FPI-CIQ+S. This policy is, by construction, symmetric. In our case, the high priority jobs are assigned to the longer queue when both or neither queue can satisfy the target deadline, otherwise to the queue that can satisfy the deadline (the shorter queue). Low priority jobs, on the other hand, are typically routed to the longer queue, except when one queue is too long and the other very short or empty. All these properties are quite reasonable.
Fig. 9. Simulation results with dynamic policies in scenario #2 (vertical axis: mean cost per job).

Simulation results
Next we will evaluate the performance of the different dispatching policies. Fig. 9 depicts the simulation results (in log scale on the vertical axis) in this rather complex scenario. On the x-axis, we vary the offered load ρ from ρ = 0.27 to ρ = 0.99.
The y-axis corresponds to the average cost per job, i.e., we divide the accumulated costs by the number of jobs. Both RND and CIQ have the same rather poor performance, as expected. First policy iteration on CIQ reduces the deadline violation costs substantially (with or without switch). The best performance for this scenario is achieved after first policy iteration of the RND policy, which behaves like the LWL policy, as predicted by Fig. 8.
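A minimal simulation of this kind can be sketched as follows. It tracks only the backlogs at arrival instants and counts the jobs whose waiting time reaches the deadline. This is not the simulator used for Fig. 9: the single job class, the parameter values (two identical servers with d = 1, τ = 2, λ = 1.2) and the function names are assumptions chosen for illustration, since the parameters of Table 1 are not reproduced here.

```python
import random

def simulate(policy, lam, d, tau, n_jobs=100_000, seed=1):
    """Fraction of jobs whose waiting time reaches tau.

    d[i] is the fixed service time of server i; policy(u, rng) returns
    the index of the chosen server given the current backlogs u.
    """
    rng = random.Random(seed)
    u = [0.0] * len(d)
    violations = 0
    for _ in range(n_jobs):
        dt = rng.expovariate(lam)              # time since the previous arrival
        u = [max(0.0, x - dt) for x in u]      # servers work off their backlog
        i = policy(u, rng)
        if u[i] >= tau:                        # waiting time equals backlog (FCFS)
            violations += 1
        u[i] += d[i]
    return violations / n_jobs

def rnd(u, rng):
    """Uniform random split."""
    return rng.randrange(len(u))

def lwl(u, rng):
    """Least-work-left."""
    return min(range(len(u)), key=lambda i: u[i])

if __name__ == "__main__":
    for name, policy in (("RND", rnd), ("LWL", lwl)):
        print(name, simulate(policy, lam=1.2, d=[1.0, 1.0], tau=2.0))
```

Other policies, such as FPI with class-specific admission costs, fit the same interface once the arriving job's class is drawn and passed to the policy.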

Summary
Past work has given explicit forms for the value function with respect to the deadline cost structure only for specific cases: (i) in the heavy-traffic regime as ρ ↑ 1, and (ii) when all service times are larger than the (single) deadline. In the heavy-traffic regime, the value function for M/G/1 (with large deadline) is quadratic. When the deadline is smaller than the service time, the value function includes an exponential term.
In this paper, we give the first exact expression for the value function with respect to (possibly multiple) deadlines for a single server queue under arbitrary load. To this end, we assume a Poisson arrival process and a fixed service time, i.e., the M/D/1 queue. The basic result takes the form of a double sum with a finite number of terms. This result is then generalized for the M/D/1 queue with multiple job classes, each having its own target deadline and violation cost. The availability of the value function enables policy iteration when developing cost-aware dispatching strategies for parallel servers, making these results immediately useful.

Optimal split with respect to deadline violations. An analogous algorithm (omitted for concision) can be designed for Problem (21) by following a similar procedure. Although the complex structure of G makes it difficult to derive closed-form expressions for the primal minima (x⋆_1, . . ., x⋆_s), the latter may be computed numerically using Newton's method. The derivative of the dual function follows from Danskin's theorem [24], i.e., g′_i(y) = x⋆_i(y), whereas the second derivative of g can be inferred using the implicit function theorem.
A specificity of (21) is that the problem becomes ill-conditioned as soon as one server offers very short service times relative to the deadline. In that case, the dispatching policy p_D tends to dispatch practically all the jobs to such a server.

Fig. 2. Value function and the corresponding admission costs for an M/D/1 queue.

Fig. 8. Routing a job to the "high-priority" Server 1 with FPI based on CIQ and RND. Black dots correspond to Server 1, white dots to Server 2, and gray dots mean a tie.