Optimal Multiserver Scheduling with Unknown Job Sizes in Heavy Traffic

We consider scheduling to minimize mean response time of the M/G/k queue with unknown job sizes. In the single-server case, the optimal policy is the Gittins policy, but it is not known whether Gittins or any other policy is optimal in the multiserver case. Exactly analyzing the M/G/k under any scheduling policy is intractable, and Gittins is a particularly complicated policy that is hard to analyze even in the single-server case. In this work we introduce monotonic Gittins (M-Gittins), a new variation of the Gittins policy, and show that it minimizes mean response time in the heavy-traffic M/G/k for a wide class of finite-variance job size distributions. We also show that the monotonic shortest expected remaining processing time (M-SERPT) policy, which is simpler than M-Gittins, is a 2-approximation for mean response time in the heavy traffic M/G/k under similar conditions. These results constitute the most general optimality results to date for the M/G/k with unknown job sizes. Our techniques build upon work by Grosof et al., who study simple policies, such as SRPT, in the M/G/k; Bansal et al., Kamphorst and Zwart, and Lin et al., who analyze mean response time scaling of simple policies in the heavy-traffic M/G/1; and Aalto et al. and Scully et al., who characterize and analyze the Gittins policy in the M/G/1.


INTRODUCTION
Scheduling to minimize mean response time 1 of the M/G/k queue is an important problem in queueing theory. The single-server k = 1 case has been well studied. If the scheduler has access to each job's exact size, the shortest remaining processing time (SRPT) policy is easily shown to be optimal [23]. If the scheduler does not know job sizes, which is very often the case in practical systems, then a more complex policy called the Gittins policy is known to be optimal [3,4,10]. The Gittins policy tailors its priority scheme to the job size distribution, and it takes a simple form in certain special cases. For example, for distributions with decreasing hazard rate (DHR), Gittins becomes the foreground-background (FB) policy, 2 so FB is optimal in the M/G/1 for DHR job size distributions [3,4,9].
In contrast to the M/G/1, the M/G/k with k ≥ 2 has resisted exact analysis, even for very simple scheduling policies. As such, much less is known about minimizing mean response time in the M/G/k, with the only nontrivial results holding under heavy traffic. 3 For known job sizes, recent work by Grosof et al. [11] shows that a multiserver analogue of SRPT is optimal in the heavy-traffic M/G/k. For unknown job sizes, Grosof et al. [11] address only the case of DHR job size distributions, showing that a multiserver analogue of FB is optimal in the heavy-traffic M/G/k. 4 But in general, optimal scheduling is an open problem for unknown job sizes, even in heavy traffic, We therefore ask: What scheduling policy minimizes mean response time in the heavy-traffic M/G/k with unknown job sizes and general job size distribution?
This is a very difficult question. In order to answer it, we draw upon several recent lines of work in scheduling theory.
• As part of their heavy-traffic optimality proofs, Grosof et al. [11] use a tagged job method to stochastically bound M/G/k response time under each of SRPT and FB relative to M/G/1 response time (Fig. 2.1) under the same policy. • Lin et al. [18] and Kamphorst and Zwart [14] characterize the heavy-traffic scaling of M/G/1 mean response time under SRPT and FB, respectively. • Scully et al. [27] show that a policy called monotonic shortest expected remaining processing time (M-SERPT), which is considerably simpler than Gittins, has M/G/1 mean response time within a constant factor of that of Gittins. While these prior results do not answer the question on their own, together they suggest a plan of attack for proving optimality in the heavy-traffic M/G/k.
When searching for a policy to minimize mean response time, a natural candidate is a multiserver analogue of Gittins. As a first step, one might hope to use the tagged job method of Grosof et al. [11] to stochastically bound M/G/k response time under Gittins relative to M/G/1 response time. Unfortunately, the tagged job method does not apply to multiserver Gittins, because it relies on both stochastic and worst-case properties of the scheduling policy, whereas Gittins has poor worst-case properties.
One of our key ideas is to introduce a new variant of Gittins, called monotonic Gittins (M-Gittins), that has better worst-case properties than Gittins while maintaining similar stochastic properties. This allows us to generalize the tagged job method [11] to M-Gittins, thus bounding its M/G/k response time relative to its M/G/1 response time.
Our M/G/k analysis of M-Gittins reduces the question of whether M-Gittins is optimal in the heavy-traffic M/G/k to analyzing the heavy-traffic scaling of M-Gittins's M/G/1 mean response time. However, there are no heavy-traffic scaling results for the M/G/1 under policies other than SRPT [18], FB [14], first-come, first served (FCFS) [16,17], and a small number of other simple policies [5,8]. To remedy this, we derive heavy-traffic scaling results for M-Gittins in the M/G/1. It turns out that analyzing M-Gittins directly is very difficult. Fortunately, M-Gittins has a simpler cousin, M-SERPT, which Scully et al. [27] introduce and analyze. We analyze M-SERPT in heavy traffic as a key stepping stone in our heavy-traffic analysis of M- Gittins. This paper makes the following contributions: • We introduce the M-Gittins policy and prove that it minimizes mean response time in the heavy-traffic M/G/k for a large class of finite-variance job size distributions (Theorem 3.1). • We also prove that the simple and practical M-SERPT policy is a 2-approximation for mean response time in the heavy-traffic M/G/k for a large class of finite-variance job size distributions (Theorem 3.2). • We characterize the heavy-traffic scaling of mean response time in the M/G/1 under Gittins, M-Gittins, and M-SERPT (Theorem 3.3). Section 3 formally states these results and compares them to prior work. Their proofs rely on a large collection of intermediate results, which we outline in detail in Section 4 and prove in Sections 5-7.

PRELIMINARIES
We consider an M/G/k queue with arrival rate λ and job size distribution X . Each of the k servers has speed 1/k, so regardless of the number of servers, the total service rate is 1 and the system load is system load is ρ = λE [X ]. This allows us to easily compare the M/G/k system to a single-server M/G/1 system, as illustrated in Fig. 2.1. We assume a preempt-resume model with no preemption overhead. This means that the single-server M/G/1 system can simulate any policy for the M/G/k system by time-sharing between k jobs. Throughout this paper we consider the ρ → 1 or heavy-traffic limit. This is the λ → 1/E[X ] limit with the job size distribution X and number of servers k held constant.
We write F for the cumulative distribution function of X and F (x) = 1 − F (x) for its tail. We assume that X has a continuous, piecewise-monotonic 5 hazard rate .
We also frequently work with the expected remaining size of a job at age a, which is E[X −a | X > a]. We assume it, too, is continuous and piecewise-monotonic as a function of a.
The above assumptions on hazard rate and expected remaining size are not restrictive and serve primarily to simplify presentation. It is very likely that our proofs can be generalized to relax them.

SOAP Policies and Rank Functions
All of the scheduling policies considered in this work are in the class of SOAP policies [26], generalized to a multiserver setting. In a single-server setting, a SOAP policy π is specified by a rank function r π : R + → R which maps a job's age, namely the amount of service it has received so far, to its rank, or priority level. Single-server SOAP policies work by always serving the job of minimal rank, breaking ties in FCFS fashion. 6 As an example, FB is a SOAP policy with r FB (a) = a. Because lower age corresponds to lower rank, under FB, the server prioritizes the job of least age. 7 We define multiserver SOAP policies in much the same way as the single-server case. The only difference is that the system can serve up to k jobs, so a multiserver SOAP policy works as follows: • If there are at most k jobs in the system, serve all of them.
• If there are more than k jobs in the system, serve the k jobs of minimal rank, breaking ties in FCFS fashion. 5 A function is piecewise-monotonic if, roughly speaking, it switches between increasing and decreasing finitely many times in any compact interval. 6 The full SOAP class allows a job's rank to depend on both its age and its "static" characteristics, such as its size or class, but we do not use this generality in this paper. 7

.2. Rank Function Examples
We often compare the k-server variant of a policy π to its single-server analogue. When it is necessary to distinguish between them, we write π -k for the k-server version of a policy, so π -1 is the single-server version. We write T π -k x for the size-conditional response time distribution of jobs of size x under π -k, and we write T π -k for the overall response time distribution.
There are four main policies we consider in this work: SERPT, M-SERPT, Gittins, and M-Gittins. None of these policies require knowledge of job sizes, but each uses the job size distribution to tune its rank function. As an example, Fig. 2.2 shows the four rank functions for a bounded distribution with nonmonotonic hazard rate.
Definition 2.1. The shortest expected remaining processing time (SERPT) policy is the SOAP policy with rank function .
As a reminder, lower rank means better priority, so, as hinted by its name, SERPT prioritizes the job of least expected remaining size.
Definition 2.2. The monotonic SERPT (M-SERPT) policy is the SOAP policy with monotonic rank function Definition 2.3. The Gittins policy is the SOAP policy with rank function . Definition 2.4. The monotonic Gittins (M-Gittins) policy is the SOAP policy with monotonic rank function The M-Gittins and M-SERPT policies, which both have monotonic rank functions, are the primary focus of this paper. Some of our intermediate results apply more broadly to any policy with a monotonic rank function.
Definition 2.5. A SOAP policy π is monotonic if its rank function is nondecreasing, meaning r π (a) ≤ r π (b) for all ages a < b. 8 Figure 2.2 shows the SERPT, M-SERPT, Gittins, and M-Gittins rank functions for a bounded distribution with nonmonotonic hazard rate. Notice that SERPT and Gittins are not monotonic. This makes it hard to analyze their M/G/k response time (Appendix A). In contrast, the M-SERPT and M-Gittins are monotonic: their rank functions alternate between constant regions and strictly increasing regions.
While the rank functions of Gittins and SERPT may not be monotonic, they are still well behaved under our assumptions on the job size distribution. L 2.6. Under the assumption that the job size distribution X has continuous and piecewisemonotonic hazard rate and expected remaining size functions, each of r SERPT , r M-SERPT , r Gittins , and r M-Gittins is continuous and piecewise-monotonic.

P
. It suffices to prove the claims for r SERPT and r Gittins . The claim for r SERPT is exactly our assumption on expected remaining size, and the claim for r Gittins is a known result [4, Theorem 1].

Job Size Distribution Classes
There are several classes of job size distributions we consider in this paper. We first briefly describe each class, then give the formal definitions.
-We focus especially on the OR(−∞, −2) subclass, all members of which have finite variance. • The MDA(Λ) class (Definition 2.12) contains, roughly speaking, distributions with smooth tails that are lighter than Pareto tails. It includes, among others, exponential, normal, lognormal, Weibull, and Gamma distributions. • The QDHR and QIMRL classes (Definitions 2.8 and 2.9) are relaxations of the well-known decreasing hazard rate (DHR) and increasing mean residual lifetime (IMRL) classes [1-4, 9, 21, 22, 28]. QDHR contains distributions whose hazard rate is roughly decreasing with age, even if it is not perfectly monotonic, and QIMRL contains distributions with roughly increasing expected remaining size.
-We focus especially on the subclasses MDA(Λ) ∩ QDHR and MDA(Λ) ∩ QIMRL. • The ENBUE class (Definition 2.10) is a relaxation of the well-known new better than used in expectation (NBUE) class [3,4,28]. 9 It contains distributions whose expected remaining size reaches a global maximum at some age.
-We focus especially on the Bounded subclass, which contains all bounded distributions. 8 The nonincreasing case is less interesting, because all nonincreasing rank functions encode FCFS. 9 Because the NBUE terminology originates in reliability analysis, the word "better" here means "longer".
We write OR(−β 0 , −α 0 ) for the set of O-regularly varying functions where the exponents α and β above may be chosen such that α 0 < α ≤ β < β 0 . 10 Abusing terminology slightly, we say a job size distribution X is O-regularly varying if its tail F is O-regularly varying, and OR(−β 0 , −α 0 ) also denotes the set of job size distributions with tail in OR(−β 0 , −α 0 ). Definition 2.8. A job size distribution is in the quasi-decreasing hazard rate class, denoted QDHR, if there exist a strictly increasing function m : R + → R + , an exponent γ ≥ 1, and constants C 0 , x 0 > 0 such that for all x > x 0 , Definition 2.9. A job size distribution is in the quasi-increasing mean residual lifetime class, denoted QIMRL, if there exist a strictly increasing function m : R + → R + , an exponent γ ≥ 1, and constants C 0 , x 0 > 0 such that for all x > x 0 , Definition 2.10. A job size distribution is in the eventually new better than used in expectation class, denoted ENBUE, if there exists an age a * ≥ 0 at which a job's expected remaining size reaches a global maximum, meaning that for all x a * , Definition 2.11. A job size distribution is in the bounded class, denoted Bounded, if there exists

MAIN RESULTS
We now present our main results, explaining how they relate to prior work in Section 3.1 We begin with our heavy-traffic M/G/k optimality result.
In such cases, M-Gittins-k is optimal for mean response time in heavy traffic.
The M-Gittins policy is based on the Gittins policy, which is somewhat complex to describe and compute. Fortunately, the M-SERPT policy, which can be much simpler to compute [27], also performs well in the heavy-traffic M/G/k.
In such cases, M-SERPT-k is a 2-approximation for mean response time in heavy traffic.
Theorems 3.1 and 3.2 apply to a broad class of finite-variance job size distributions. Roughly speaking, OR(−∞, −2) covers heavy-tailed distributions, and MDA(Λ) covers non-heavy-tailed distributions that are unbounded (Section 2.2). Assuming membership in these sets is standard for heavy-traffic analysis [14]. The main restriction the results impose is on MDA(Λ) distributions, for which we additionally require membership in QDHR or QIMRL. While slightly relaxing this restriction is possible, 11 removing it entirely appears to be very difficult (Section 8).
A key step in the proofs of Theorems 3.1 and 3.2 is analyzing M-Gittins and M-SERPT in the heavy-traffic M/G/1. This analysis is itself a new result of independent interest. Notably, it extends to ordinary Gittins in addition to M-Gittins, thus characterizing the optimal heavy-traffic scaling attainable by any scheduling policy.
where F −1 e is the inverse of the tail of the excess of X, namely

Relationship to Prior Work
Theorem 3.1 is the first result proving optimality of a scheduling policy in the heavy-traffic M/G/k with unknown job sizes and general job size distribution. As mentioned in Section 1, the only prior results of this type were shown by Grosof et al. [11], who prove similar results in two cases: SRPT and related policies in the case of known job sizes, and FB in the case of unknown job sizes with a decreasing hazard rate (DHR) job size distribution. To get a better sense of the class of distributions Theorem 3.1 applies to, we compare it to the classes of distributions featured in the prior results for SRPT and FB: • SRPT was shown to be optimal in the heavy-traffic M/G/k for job size distributions whose tail has upper Matuszewska index less than −2 [11, Theorem 6.1], which corresponds to satisfying the upper bound in Definition 2.7 for some α > 2. This is somewhat broader than the precondition of Theorem 3.1, though it is still limited to finite-variance distributions.
-Given that SRPT is designed for known job sizes while M-Gittins is designed for unknown job sizes, Theorem 3.1 complements the prior SRPT results.
• FB was shown to be optimal in the heavy-traffic M/G/k for job size distributions in the class DHR∩(OR(−∞, −2)∪MDA(Λ)) [11,Theorem 7.13]. 12 The DHR class is much more restrictive than QDHR, so this is much narrower than the precondition of Theorem 3.1.
-Given that FB is equivalent to M-Gittins in the DHR case [3,4], Theorem 3.1 subsumes the prior FB results.
There is another result that follows from two prior works that complements Theorem 3.1, although to the best of our knowledge it has never been explicitly stated. Köllerström [16,17] shows that under FCFS, the mean response times in the M/G/1 and M/G/k converge. This means that if Gittins and M-Gittins happen to be equivalent to FCFS for a given job size distribution, then FCFS minimizes mean response time in the heavy-traffic M/G/k. Aalto et al. [3,4] show this occurs exactly for job size distributions in the new better than used in expectation (NBUE) class, which includes some distributions that Theorem 3.1 does not cover.
Theorem 3.2 shows that a simple scheduling policy, namely M-SERPT, has mean response time within a constant factor of optimal in the heavy-traffic M/G/k with unknown job sizes and general job size distribution. Specifically, we show M-SERPT is a 2-approximation. This complements the result of Scully et al. [27], who show that in the M/G/1, M-SERPT is a 5-approximation for M/G/1 mean response time at all loads. Our result is more general in that it applies to multiserver systems, and it is tighter in that it shows a smaller approximation ratio, but it is less general in that it applies only to heavy traffic. The techniques we introduce could be useful for tightening the upper bound on M-SERPT's M/G/1 approximation ratio, which is conjectured to be 2 [27]. There are three other policies whose heavy-traffic scaling has been characterized: SRPT, FB, and a policy called randomized multilevel feedback (RMLF) [6,13]. We compare Theorem 3.3 to each of these prior results.
Lin et al. [18] study SRPT in heavy traffic. They show that for unbounded job size distributions with upper Matuszewska index less than −2, . This is similar to the second expression in Theorem 3.3, except it is missing an application of r M-SERPT . We will show r M-SERPT (a) = Θ(a) if X ∈ OR(−∞, −2) (Theorem 6.2), implying Gittins has the same heavy-traffic scaling as SRPT in that case. A similar observation holds for the OR(−2, −1) [20] or X ∈ ENBUE, implying Gittins has worse heavy-traffic scaling than SRPT in those cases. Kamphorst and Zwart [14] study FB in heavy traffic. They show that if X ∈ OR(−2, −1), then FB has the same heavy-traffic scaling as SRPT, which matches our result for Gittins in Theorem 3.3.
. This is similar to the second expression in Theorem 3.3, except it replaces r M-SERPT with r SERPT , which pinpoints exactly where FB's heavy-traffic scaling is suboptimal, seeing as r M-SERPT is a version of r SERPT that never decreases.
Bansal et al. [5] study RMLF in heavy traffic. They show that if E[X α ] < ∞ for some α > 2, then Because Gittins minimizes M/G/1 mean response time, this serves as an upper bound on the heavytraffic scaling of Gittins. However, as previously discussed when comparing Theorem 3.3 to prior results on SRPT, there are cases where Gittins matches the heavy-traffic scaling of SRPT, so our result is a tighter bound. With that said, requiring E[X α ] < ∞ for some α > 2 is more lenient than the precondition of Theorem 3.3, so there are still instances where (3.1) is the best known bound on Gittins's heavy-traffic scaling.

TECHNICAL OVERVIEW
In this section we walk through our proofs of the main results. Our main goal is to find a policy π with optimal mean response time in the ρ → 1 limit. In particular, it suffices to show that The only existing technique for proving a bound like (4.1) is the M/G/k tagged job method of Grosof et al. [11]. In general tagged job methods work as follows [11,12,15,19,[24][25][26]29]: one focuses on a "tagged" job throughout its time in the system, tracking how much each other job delays . The amount of time for which another job can delay is called the relevant work due to that other job. The specific M/G/k tagged job method [11] works by relating the amount of relevant work in an M/G/k under π -k to the amount of relevant work in an M/G/1 under π -1.
As a first approach, we might try to prove (4.1) for the case where π is Gittins using the M/G/k tagged job method. However, this method has only been applied to simple policies such as SRPT and FB, so it is not clear whether it can be applied to more complex policies, such as Gittins.
Unfortunately, the M/G/k tagged job method turns out not to work for Gittins. Under a SOAP policy with a nonmonotonic rank function, which often is the case for Gittins, a given job might in the worst case contribute more relevant work in an M/G/k than in an M/G/1 (Appendix A). This breaks the correspondence that Grosof et al. [11] rely on.
Our key insight is that we can generalize the M/G/k tagged job method to more complex SOAP policies as long as those policies have monotonic rank functions. This motivates the use of the M-Gittins policy, which is the monotonic cousin of Gittins.
In Section 5 we bound M/G/k response time under any monotonic SOAP policy π in terms of M/G/1 quantities. Specifically, we show in Theorem 5.1 that where the quantities on the right hand side, defined formally in Section 4.1, can be thought of as follows: • Q π -1 and R π -1 are distributions called waiting time and residence time, respectively [26]. Response time in the M/G/1 is the sum of waiting time and residence time. • S π -1 is a new distribution we call inflated residence time, which is similar to residence time but longer. Proving (4.2) is the first major stepping stone to proving Theorem 3.1, because it reduces an M/G/k analysis to an M/G/1 analysis. We see that only the E[R π -1 ] and E[S π -1 ] coefficients depend on k. To prove Theorem 3.1, we must show that the E[Q M-Gittins-1 ] term dominates in the ρ → 1 limit.
In the remainder of this section, our goal is to bound E[Q π -1 ], E[R π -1 ], and E[S π -1 ], where π is either M-Gittins or M-SERPT. We begin in Section 4.1 by explaining in more detail the concepts of relevant work and of waiting, residence, and inflated residence time. In doing so, we introduce Recall that the tagged job method works by focusing on the journey of a "tagged" job through the system. Roughly speaking, the relevant work due to any other job is the amount of time by which that job delays 's departure. A key insight from the M/G/1 SOAP analysis [26] is that to figure out how much another job delays , we need to look not at 's current rank but at its worst future rank. This is because even if has priority over another job at first, if 's rank increases as it ages during service, the other job might get priority.
Suppose that has size x. Under a monotonic SOAP policy π , such as M-Gittins or M-SERPT, the worst future rank will have is always the rank it will have just before completion, namely r π (x). The amount of relevant work due to another job ′ is the amount of time ′ is served while is in the system until ′ either completes or reaches rank r π (x). Due to the FCFS tiebreaking rule (Section 2.1), exactly what "reaches" means depends on when ′ arrives.
• New jobs, those that arrive after , contribute relevant work until they first have rank greater than or equal to r π (x). This occurs at a specific age called the new job age cutoff, denoted π x . • Old jobs, those that arrive before , contribute relevant work until they first have rank strictly greater than r π (x). This occurs at a specific age called the old job age cutoff, denoted z π x . Figure 4.1 illustrates the new job and old job age cutoffs π x and z π x , which are formally defined below. 13 Roughly speaking, • if r π is increasing at x, then π x = x = z π x ; and • if r π is constant at x, then π x and z π x are the endpoints of the constant region containing x. As Fig. 4.1 illustrates, we always have Definition 4.1. Let π be a monotonic SOAP policy. The new job age cutoff and old job age cutoff of size x are, respectively, When the policy in question is unambiguous, we drop the superscript π .
One can use new job and old job age cutoffs to write M/G/1 mean response time under a monotonic SOAP policy [27]. As a first step, we write M/G/1 response time T π -1 as a sum of two parts, called waiting time Q π -1 and residence time R π -1 [26]: We define waiting and residence times formally in Section 5. For now, we just need to know that their means can be written in terms of π x and z π x . Specifically, Scully et al. [27,Propositions 4.7 and 4.8] show where ρ and τ are defined as The significance of (4.2) is that it expresses M/G/k response time in terms of waiting and residence times, which are M/G/1 quantities. It also features a third quantity called inflated residence time S π -1 . We define inflated residence time formally in Section 5. For now, we just need to know that its mean, can be written in terms of π x and z π x . Note that

Bounding New and Old Age Cutoffs
Recall that proving our main results rests on characterizing the heavy-traffic scaling of where π is either M-Gittins or M-SERPT. As we see in (4.4) and (4.6), both π x and z π x feature prominently in the formulas of This means the first step of characterizing the heavy-traffic scaling of E[Q π ], E[R π ], and E[S π ] is understanding π x and z π x . We prove bounds on π x and z π x for a wide class of job size distributions. Table 4.1 summarizes these results, which are proven in Section 6.

Characterizing Heavy Traffic Scaling
Armed with bounds on age cutoffs, we are ready to characterize heavy-traffic scaling of mean waiting, residence, and inflated residence times. This is the subject of Section 7, in which These bounds on π x and z π x are critical for characterizing heavy-traffic scaling of for some δ > 0 Theorems 7.9 and 7.11 O(− log(1 − ρ)) Theorems 7.9 and 7.12 ) for all ε > 0 Theorems 7.10 and 7.11 for all ε > 0 Theorems 7.10 and 7.12   14 , with the latter sometimes requiring an additional condition. Specifically,

From Intermediate Results to Main Results
We now prove our main results. The proofs of Theorems 3.1 and 3.2 both follow the same three main steps: 14 That is, for all the classes in Table 4.2 except OR(−2, −1).
• The results in Table 4.
1. An M/G/1 can simulate any M/G/k policy by sharing the server, so the fact that Gittins minimizes M/G/1 mean response time means Theorems 7.5 and 7.9-7.12 imply that the second term vanishes in the ρ → 1 limit. A result of Scully et al. [27,Proposition 4.7] implies implying the desired result.
Theorems 7.5, 7.9 and 7.10 imply that the second term vanishes in the ρ → 1 limit. Scully which combines with (4.7) to imply the desired result.
To prove Theorem 3.3, we simply combine the results in Table 4.2. P T 3.3. We examine each case in turn.

M/G/k RESPONSE TIME BOUND
This section bounds M/G/k mean response time under any monotonic SOAP policy π . The notation used in Theorem 5.1 below is summarized in and therefore π -k k-server version of SOAP policy π Section 2.1 ρ(a), τ (a) functions of moments of min{X , a} (4.5) π x , z π x new job and old job age cutoffs Definition 4.1 T π -k response time under π -k Section 2.1 Q π -1 waiting time under π -1 (4.4) R π -1 residence time under π -1 (4.4) S π -1 inflated residence time under π -1 (4.6) Additionally, T π -k x is size-conditional response time for size x , and similarly for Q π -1 x , R π -1 x , and S π -1 x .

P
. In order to bound M/G/k mean response time, we use a tagged job method in the style of Grosof et al. [11], but we generalize it to allow an arbitrary monotonic SOAP policy π . We consider an arbitrary "tagged" job of size x arriving to a steady-state system. Our goal is to analyze the distribution of 's response time.
The first step is a shift in perspective: instead of thinking about time passing, we reason in terms of work completed. Since each of the k servers works at rate 1/k, the system has the capacity to perform work at rate 1. While is in the system, servers sometimes perform work and are sometimes left idle. This means 's response time is the sum of • the amount of work completed while is in the system and • the amount of work "wasted", meaning service capacity left idle, while is in the system. We bound 's response time by bounding the total amount of work above. We do so by dividing it into several pieces: • Tagged work: the work of itself.
• Virtual work: work on jobs prioritized behind , plus wasted work due to servers left idle.
• Relevant work: work on jobs prioritized ahead of . We divide this into two subcategories: -Old relevant work: relevant work on old jobs, namely those present when arrives.
-New relevant work: relevant work on new jobs, namely those that arrive after .
For the first two categories, we have the same simple bound as Grosof et al. [11]: tagged work and virtual work add up to at most kx. This is because tagged work is 's size x, and the scheduling policy ensures that a server only performs virtual work while is in service at another server. However, bounding the two relevant work categories is more complicated than in Grosof et al. [11].
We begin by asking: what rank must a job have to contribute to relevant work? Note that the job will never have rank greater than its rank upon completion, r π (x), since π is a monotonic policy. As a result, new relevant work can only be performed on jobs with rank strictly less than r π (x), and old relevant work can only be performed on jobs with rank less than or equal to r π (x). We can put this in terms of the age cutoffs defined in Definition 4.1: • new relevant work can only be performed on jobs up to age π x , and • old relevant work can only be performed on jobs up to age z π x . In the rest of this proof, x and z x refer to π x and z π x , respectively. To help us bound the amount of old relevant work performed while is in the system, we define a new concept: the amount of relevant work in the M/G/k system under π . Definition 5.2. Let RelWork π -k x (t) denote the amount of work in the M/G/k at time t which is relevant to a job of size x: where x ′ is the size of job ′ and a ′ (t) is its age at time t. We write RelWork π -k x for the steady state distribution of the amount of relevant work in the M/G/k system.
Since is a Poisson arrival, RelWork π -k x is the distribution of the amount of relevant work in the system when arrives. That amount is an upper bound on the amount of old relevant work that will be performed while is in the system.
To bound new relevant work, note that if a job ′ of size x ′ arrives while is in the system, then ′ contributes at most min{x ′ , x } new relevant work. As a result, new relevant work can be upper bounded by considering a transformed M/G/1 system in which the job size distribution is The amount of new relevant work that arrives to our real system is upper bounded by the total amount of work that arrives to the transformed system. Let B x (w) be the length of a busy period in the transformed M/G/1 system started by an initial amount of work w. If we set w equal to the amount of tagged, virtual and old relevant work, then B x (w) − w upper bounds the amount of new relevant work.
Combining our bounds yields x ) Lemma 5.3, proven later in this section, bounds RelWork π -k x in terms of RelWork π -1 x and z x , implying

2) Taking expectations gives us
Because π -1 is work conserving with respect to relevant work, the Pollaczek-Khinchine formula tells us which completes the proof of (5.1).
To connect (5.1) to the quantities E[Q π ], E[R π ], and E[S π ], we rewrite (5.2) as where all of the relevant busy periods are independent. Prior work on SOAP policies [26,27] gives names to some of the distributions on the right-hand side. 16 • The size-conditional waiting time for size x is a random variable Q π -1 x with distribution Q π -1 x ), and waiting time is Q π -1 = st Q π -1 X . • The size-conditional residence time for size x is a random variable R π -1 x with distribution R π -1 x = st B x (x), and residence time is R π -1 = st R π -1 X .
• As there is no concise name for B x (z x ) in prior work, we define size-conditional inflated residence time for size x to be a random variable S π -1 x with distribution S π -1 x = st B x (z x ), and we define inflated residence time to be S π -1 = st S π -1 X . With these definitions in place, (5.3) gives us so the result follows by taking the expectation of T π -k = st T π -k X . Theorem 5.1 applies only to monotonic SOAP policies. It is tempting to try to apply the same technique to SOAP policies with nonmonotonic rank functions, but as we discuss in Appendix A, the argument does not readily generalize.
The proof of Theorem 5.1 assumes a bound on RelWork π -k x . We prove the bound in the following lemma, which generalizes a similar lemma of Grosof et al. [11,Lemma 7.10].
x for all times t, and therefore . Throughout this proof, z x refers to z π x . We consider a pair of coupled systems with the same arrival sequence: • System 1, an M/G/1 using π -1; and • System k, an M/G/k using π -k. Our approach is to bound the difference in relevant work between Systems 1 and k at any time t.
Call a job relevant if it has age less than z x . These are the only jobs that contribute relevant work. To bound ∆ x (t), we divide times t into two types of intervals: • few-jobs intervals, during which there are fewer than k relevant jobs in System k; and • many-jobs intervals, during which there are at least k relevant jobs in System k. Note that both types of intervals are defined based on System k alone, so System 1 may or may not have relevant jobs during either type of interval.
Any time t is in either a few-jobs interval or a many-jobs interval. If t is in a few-jobs interval, the argument is simple: there are at most k − 1 relevant jobs in System k at time t, so ∆ x (t) ≤ RelWork π -k x (t) ≤ (k − 1)z x Suppose instead that t is in a many-jobs interval. Let s ≤ t be the start of the many-jobs interval containing t. We will show We begin by showing ∆ x (t) ≤ ∆ x (s). Note that arrivals do not affect ∆ x , because the two systems experience the same arrivals and have the same definition of relevant work. Next, note that service to irrelevant jobs does not affect ∆ x , because irrelevant jobs never become relevant under π , since π is a monotonic policy. In fact, the only way that ∆ x changes over a many-jobs period is due to service to relevant jobs. System k serves relevant jobs on all k servers throughout a many-jobs period, completing relevant work at rate 1. System 1 may or may not serve relevant jobs during a many-jobs period, so it completes relevant work at rate at most 1. This means ∆ x (t) ≤ ∆ x (s), as desired.
All that remains is to show that ∆ x (s) ≤ (k − 1)z x . Recall that s is the start of a many-jobs interval. Many-jobs intervals cannot start due to irrelevant jobs becoming relevant, because π is a monotonic policy. This means each many-jobs interval starts due to a relevant job arriving while System k has k − 1 relevant jobs. Relevant jobs arriving do not change ∆ x , as discussed above. This means ∆ x (s) = ∆ x (s − ), where s − is the instant before the arrival that starts the many-jobs interval. But s − is in a few-jobs interval, so

RANK FUNCTION BOUNDS
We now have a bound on M/G/k mean response time under monotonic SOAP policies π , including M-Gittins and M-SERPT. The bound (Theorem 5.1) is expressed in terms of E[Q π -1 ], E[R π -1 ], and E[S π -1 ], quantities which in turn are expressed in terms of the new job and old job age cutoffs π x and z π x . In order to prove optimality of M-Gittins in the heavy-traffic M/G/k, we need to understand the heavy-traffic behavior of E[Q π -1 ], E[R π -1 ], and E[S π -1 ], which, as we will see in Section 7, boils down to understanding the behavior of π x and z π x in the x → ∞ limit. This section is thus devoted to asymptotically bounding the new job and old job age cutoffs, and more generally the rank functions, of M-Gittins and M-SERPT.
Recall from Definition 2.2 that SERPT's rank function is used to define M-SERPT's. The following lemma shows that the two rank functions are equal at the new job and old job age cutoffs, and similarly for Gittins and M-Gittins.
and r SERPT (a) = Ω(a) follows similarly. This implies so the result follows from Lemma 6.1. The result follows by plugging in a = x and a = z x and applying Lemma 6.1.

Bounds on the M-Gi ins Rank Function
In this section we show two bounds on M-Gittins The proofs of these bounds are more involved than their M-SERPT counterparts from Section 6.1. The most important component is the following definition, which helps us better understand the M-Gittins rank function and relate it to the simpler M-SERPT rank function.
Definition 6.6. The time per completion over the age interval (a, b] is 17 . 17 Our time per completion function is the reciprocal of what Aalto et al. [3,4] call the efficiency function.
We extend this definition to the b → a and b → ∞ limits: We can write the rank functions of SERPT, M-SERPT, Gittins, and M-Gittins in terms of η as η(b, c).
Armed with Definition 6.6 and (6.1), we are ready to prove Theorems 6.4 and 6.5. The former proof relies on some technical lemmas that we defer to Section 6.3.
We will set C 0 ≥ 2, which covers the z x ≤ 2 x case. The rest of the proof is thus devoted to the z x > 2 x case. Our approach is to show there exist C 1 , C 2 such that for all x > x 0 , We begin with the upper bound on r Gittins ( x ). By Lemma 6.1, we have r Gittins ( x ) = r M-Gittins ( x ) for all sizes x, and by (6.1), we have r M-Gittins (a) ≤ r M-SERPT (a) for all ages a. Combining these observations with Theorem 6.2 implies r Gittins ( x ) = O( x ) and thereby implies the desired upper bound from (6.2). 18 We now turn to the lower bound on r Gittins ( x ). This requires Lemmas 6.7 and 6.8, which are facts about η that we prove in Section 6.3. Combining Lemma 6.7 with (6.1) and the fact that we are in the z x > 2 x case gives us By Lemma 6.8, there exist C 2 , x 2 such that for all x with z x /2 > x 2 , implying the desired lower bound from (6.2). . Because X ∈ QDHR with exponent γ , there exists a strictly increasing function m : R + → R + such that for all sizes x, We have r M-Gittins ( x ) ≤ 1/h( x ) by (6.1), and Lemma 6.1 implies r Gittins (z x ) = r M-Gittins ( x ), so It remains only to lower bound r Gittins (z x ). We do so using the observation that for any age a, where the first inequality follows from viewing the ratio of integrals as a weighted average. Plugging in a = z x implies m(z x ) ≤ m(O( γ x )), so the result follows because m is strictly increasing.

Time per Completion Lemmas
L 6.7. For all sizes x and ages a, if x < a < z x , then

P . A property of the Gittins index [10, Lemma 2.2] implies 19
In particular, for any a z x , Plugging in d = x , e = a, and f = z x and applying (6.3) yields η( x , z x ) ≥ η(a, z x ), as desired.
Because X ∈ OR(−∞, −1), there exist β > 1 and C 0 , x 0 > 0 such that for all t > a > x 0 , . 19 The proof given by Gittins et al. [10] is in a discrete setting, but essentially the same proof carries over to our continuous setting.
For all b > a > x 0 , we have We now consider two cases: and therefore In the second case β − 1 ∈ (0, 1), so we can use the identity which holds for any u > 0. Substituting u = a/b gives us b a −(β −1) and therefore η(a, b) ≥ C 0 a 1 − a b .

HEAVY-TRAFFIC SCALING OF M/G/1 WAITING AND RESIDENCE TIMES
In this section we characterize the heavy-traffic scaling of mean waiting, residence, and inflated residence times, which are the M/G/1 quantities that appear Theorem 5.1. Because M-SERPT is a simpler policy than M-Gittins, our approach is to first study M-SERPT's heavy-traffic scaling (Sections 7.2 and 7.3) then show that the results extend to M-Gittins (Section 7.4).

Key Parts of Waiting and Residence Time Formulas
Before starting the heavy-traffic analyses of M-Gittins and M-SERPT, we use this section to introduce some new notation. Let Definition 7.1.
The key M/G/1 response time quantities, or simply "key quantities", of a monotonic SOAP policy π are the following: dx, dx, dx, When the policy in question is unambiguous, we drop the superscript π .
In Theorems B.1-B.3 (Appendix B) we show that for any monotonic SOAP policy π , Bounding mean waiting, residence, and inflated residence times thus amounts to bounding the key quantities.
For the most of the rest of this section we focus on the case where π is M-SERPT, deferring the M-Gittins case to Section 7.4. Until then, x , z x , and the key quantities are understood to have an implicit superscript M-SERPT.
The most important step of bounding the key quantities is bounding H ρ ( x ) and H ρ (z x ). As a first step, we bound H ρ (x). Let be the tail of the excess of X . We can write ρ(x) as This means that for all ε ∈ [0, 1], we have where . This bound is useful because it separates H ρ (x)'s dependence on x and ρ: the numerator depends only on x, and the denominator depends only on ρ. We will typically choose ε to be either 0 or arbitrarily small. Having bounded H ρ (x) in (7.3), we now turn to bounding H ρ ( x ) and H ρ (z x ). We first observe that, recalling the definition of r SERPT from Section 2.1, Lemma 6.1 and the monotonicity of r M-SERPT imply Combining this with (7.3) yields bounds on H ρ ( x ) and H ρ (z x ), though the bounds still have F ( x ) and F (z x ) terms. To better understand H ρ ( x ) and H ρ (z x ), we need to use arguments that depends on what class of distributions contains X . We do this over the course of the next few sections.

Infinite-Variance Job Size Distributions
In this section we study the heavy-traffic scaling of M-SERPT's waiting, residence, and inflated residence times for infinite-variance job size distributions, specifically those in OR(−2, −1). With that said, many of the intermediate results we prove will also be useful for the finite-variance OR(−∞, −2) case (Section 7.3).
Suppose that X ∈ OR(−∞, −1). Combining Theorem 6.2 and (7.4) gives us This alone is enough to bound all of the key quantities except I Q .
Our approach is to use the fact that, by (4.5), It therefore suffices to show that the integrands of the key quantities are all O(H ρ (x)). It suffices to consider II Q , II S , and III S because II R = II S and III R ≤ III S . We begin by showing that III S 's integrand is O(H ρ (x)). By (7.5) and the fact that X ∈ OR(−∞, −1), we have F ( x ) = F (Θ(x)) = Θ(F (x)), which yields This implies the desired bound for III S and III R . We show II S 's integrand is O(H ρ (x)) by applying (7.3) with ε = 0 and (7.5): This implies the desired bound for II S and II R . A very similar computation bounds II Q .
It remains only to characterize the heavy-traffic scaling of I Q . Treating the OR(−∞, −2) case requires some additional care, so we defer it to Section 7.3, focusing on the OR(−2, −1) case for now. The first step is to bound τ (x).
We now have bounds on every term in I Q 's integrand, allowing us to bound I Q and thereby mean response time.

Finite-Variance Job Size Distributions
We now turn to finite-variance job size distributions, specifically those in OR(−∞, −2), MDA(Λ), and ENBUE. We begin with the simplest case, which is ENBUE.
T 7.5. If X ∈ ENBUE, then in the ρ → 1 limit, and therefore If additionally X ∈ Bounded, then in the ρ → 1 limit,

P
. Let x max be the supremum of X 's support, so we may have x max = ∞. Because X ∈ ENBUE, there exists age a * < x max such that This means • x ≤ a * for all sizes x, • z x ≤ a * for all sizes x ≤ a * , and • z x = x max for all sizes x > a * .
If additionally X ∈ Bounded, then x max < ∞, so We now turn to the OR(−∞, −2) and MDA(Λ) cases, which require the following technical lemma. One implication of Lemma 7.6 is that if X ∈ MDA(Λ), then We are now ready to tackle the OR(−∞, −2) and MDA(Λ) cases. As in Section 7.2, we begin by bounding the five key quantities other than I Q . Lemma 7.2 does so for OR(−∞, −2), and the following lemma does so for MDA(Λ).
If additionally X ∈ MDA(Λ) ∩ (QDHR ∪ QIMRL), then . Our overall approach is to use (7.3) on each key quantity to bound it by an expression of the form The challenge is then to show that the integral converges for arbitrarily small ε > 0.
It remains only to characterize the heavy-traffic scaling of I Q . .

P
. Because E[X 2 ] < ∞, we have τ (x) = Θ(1), so by (7.3) and (7.4), For the lower bound, we integrate up to Using this fact along with the monotonicity of r M-SERPT yields .
For the upper bound, we split the integration region at F −1 e (1 − ρ): The second part of the integral is simple to bound using the monotonicity of r M-SERPT : [by (4.5), (7.2)] .
To bound the first part of the integral, we change variables to u = 1/F e (x): where L M-SERPT is as in Lemma 7. Having characterized the heavy-traffic scaling of all the key quantities, the main heavy-traffic results for OR(−∞, −2) and MDA(Λ) follow easily. T 7.9. If X ∈ OR(−∞, −2), then in the ρ → 1 limit, and therefore . P . After applying Lemmas 7.2 and 7.8, it remains only to show I Q = Ω((1 − ρ) −δ ). Using L M-SERPT from Lemma 7.6, we can rewrite Lemma 7.8 as and therefore .

Extending Heavy-Traffic Analysis from M-SERPT to Gi ins and M-Gi ins
Having characterized heavy-traffic scaling under M-SERPT, we now do the same for Gittins and M-Gittins. Our first result shows that the mean waiting and residence times of Gittins and M-Gittins have the same heavy-traffic scaling as that of M-SERPT. Note that the precondition holds for all of the job size distributions we consider in Section 7.3. 20 T 7.11. In the ρ → 1 limit,  P . The proof is very similar to the proofs of analogous results for M-SERPT (Theorems 7.5, 7.9 and 7.10), so we just describe the differences.
• If X ∈ OR(−∞, −2), we follow the same proof as Theorem 7.9 and the lemmas it requires, except we use Theorem 6.4 to bound M-Gittins • If X ∈ Bounded, we follow the same proof as Theorem 7.5, except we use a result of Aalto et al. [4,Proposition 9] to justify the existence of the critical age a * .

CONCLUSION
We study optimal scheduling in the M/G/k to minimize mean response time. This problem is solved by the Gittins policy for the single-server k = 1 case but was previously open for the much more difficult multiserver case. We introduce A natural question to ask is whether the conditions under which we prove M-Gittins's optimality can be relaxed, particularly the QDHR and Bounded assumptions. The difficulty lies in the fact that for some job size distributions, the bound in Theorem 5.1 is not strong enough because inflated residence time is infinite. It is possible that the techniques used by Köllerström [16,17] to analyze the heavy-traffic M/G/k under FCFS could be helpful, seeing as FCFS has infinite inflated residence time.
Another major open question is analyzing the performance of M-Gittins outside of the heavytraffic limit. In the single-server case, one can generalize the techniques of Scully et al. [27] to show that M-Gittins is a 3-approximation for M/G/1 mean response time at all loads. However, the multiserver case remains open. We now reconsider the same example from Fig. A.1 but with k ≥ 2 servers. The key difference is that because there are multiple servers, ′ can receive service even while has better rank because and ′ can occupy different servers simultaneously. This means ′ no longer gets stuck at age π x . In particular, if reaches age c and ′ passes age b, then ′ contributes relevant work between ages b and c. Therefore, s π -k x = c > s π -1 x for k ≥ 2. The bound in Theorem 5.1 follows from assuming that every new job ′ will contribute relevant work until it completes or reaches age s π -k x . This is a worst-case estimate, because the tagged job might complete before ′ completes or reaches age s π -k x . When π is monotonic, we have s π -k x = s π -1 x , so this overestimate is tight enough to compare the mean response times under π -k and π -1. However, when π is nonmonotonic, it may be that s π -k x > s π -1 x , as explained above, so we do not obtain a tight comparison between the π -k and π -1 systems. This suggests generalizing Theorem 5.1 to nonmonotonic SOAP policies requires not relying as heavily on worst-case quantities like s π -k x .

B NEW FORMULAS FOR MEAN WAITING AND RESIDENCE TIMES
In this appendix we prove the following new formulas for mean waiting, residence, and inflated residence times.
T B.1. Under any monotonic SOAP policy π, Under any monotonic SOAP policy π, dx .
Proving these results requires new technical machinery for, roughly speaking, performing integration by parts on expressions involving π x and z π x , such as those in (4.4). Appendix B.1 introduces the general technical machinery, which Appendix B.2 then applies to prove the above results.
Throughout this appendix, ∂ denotes the derivative operator, and [t 1 , . . . , t n → RHS] denotes the function that maps variables t 1 , . . . , t n to expression RHS.

B.1 Integration by Parts with Hills and Valleys
Definition B.4. A hill-valley partition of R + is a sequence For compactness, we write x = (x) and z x = z(x).
It is simple to check that for any monotonic SOAP policy π , the pair π , z π (Definition 4.1) is a hill-valley pair.
Definition B.6. For functions Φ : R + → R + , we define the difference ratio operator ∆ as follows: where ∂ is the derivative operator. Similarly, for functions with multiple arguments, ∆ i is a version of ∆ that works on the ith argument: Like ∂, it is easily seen that ∆ is a linear operator. When applied to polynomials, ∆ elegantly generalizes ∂. For example, The ∆ operator also obeys various chain-rule-like identities. We highlight the two we use below.
P . If u = v, this is the multivariable chain rule. If u v, The most important result of this appendix is the following lemma, which formulates a version of integration by parts that works for hill-valley pairs despite their discontinuity. L B.9. Let , z be a hill-valley pair, Φ : R 3 + → R be differentiable, P : R + → R be differentiable, and P(x) = c − P(x) for some c ∈ R. If For each hill (v, u], Summing the hill and valley expressions over all hills and valleys, most of the non-integral terms cancel out, and the two that remain are 0 by assumption: Our final two lemmas show that integrals of ∆ applications can sometimes be turned into integrals of ∂ applications.
L B.10. Let , z be a hill-valley pair and Φ : R 3 + → R + be differentiable with respect to its second argument. Then For each hill (v, u], Summing the hill and valley expressions over all hills and valleys yields the desired result.
L B.11. Let , z be a hill-valley pair and both Φ : R 3 + → R and Ψ : R + → R be differentiable.

B.2 Proofs of New Formulas
We now apply the theory developed in Appendix B.1 to prove Theorems B.1-B.3. Throughout the proofs, x and z x refer to π x and z π x , respectively. Recall that , z form a hill-valley pair (Definition B.5) under any monotonic SOAP policy π . [by Lem. B.10] which equals the desired result by (4.5).

P T B.2.
We compute  which equals the desired result by (4.5).